I want to print the text inside each pair of <en> tags whose attribute is x='PERS'. I tried the code below, but the output was not what I wanted.

XML Sample

<Text>
<PHRASE>
<en x='PERS'> John </en>
<V> Went </V>
<prep> to </prep>
<V> meet </V>
<en x='PERS'> Alex </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Mark </en>
<V> lives </V>
<prep> in </prep>
<en x='LOC'> Florida </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Nick </en>
<V> visited</V>
<en x='PERS'> Anna </en>
</PHRASE>
</TEXT>

I want the output John-Alex, Nick-Anna, but I got Mark-Mark. In other words, I only want to print the PERS names when two of them appear in the same phrase.

This is the code I wrote, using ElementTree:

import xml.etree.ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
print("------------------------PERS-PERS-------------------------------")
PERS_PERScount=0
for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'PERS' in ens and 'PERS' in ens:
        print("PERS is: {}, PERS is: {} /".format(ens["PERS"], ens["PERS"]))
        #print(ens["ORG"])
        #print(ens["PERS"])
        PERS_PERScount = PERS_PERScount + 1
print("Number of PERS-PERS relation", PERS_PERScount)

I am not sure whether the problem is in the print call, the if condition, or both.

  • Your opening "Text" tag does not match the closing one (written 'TEXT'). Make them identical to avoid a parse error. Commented Mar 4, 2016 at 22:38

2 Answers

1

You can add a simple check so that the counter is incremented and the names are printed only when a phrase contains exactly two en elements whose x attribute equals "PERS" (a pair):

import xml.etree.ElementTree as ET

root = ET.parse('output.xml').getroot()
PERS_PERScount = 0

for phrase in root.findall('./PHRASE'):
    # get the inner text of all elements whose `x` attribute equals `"PERS"`
    names = [p.text.strip() for p in phrase.findall('./en[@x="PERS"]')]

    # if there are exactly 2 of them, increment the counter and print
    if len(names) == 2:
        PERS_PERScount += 1
        print('-'.join(names))

print("Number of PERS-PERS relation: ", PERS_PERScount)

Output:

John-Alex
Nick-Anna
Number of PERS-PERS relation:  2
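
For context on why the code in the question printed the same name twice: a dict comprehension keyed on the x attribute can hold only one value per key, so the second PERS in a phrase overwrites the first. A minimal sketch of the difference, assuming the sample is parsed from a string with matching root tags (SAMPLE, as_dict and as_list are names made up for this illustration):

```python
import xml.etree.ElementTree as ET

# Sample from the question, with the root tags made to match (assumption).
SAMPLE = """<Text>
<PHRASE>
<en x='PERS'> John </en>
<V> Went </V>
<prep> to </prep>
<V> meet </V>
<en x='PERS'> Alex </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Mark </en>
<V> lives </V>
<prep> in </prep>
<en x='LOC'> Florida </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Nick </en>
<V> visited</V>
<en x='PERS'> Anna </en>
</PHRASE>
</Text>"""

root = ET.fromstring(SAMPLE)
first_phrase = root.find('PHRASE')

# Dict keyed by attribute value: the second 'PERS' overwrites the first.
as_dict = {en.get('x'): en.text.strip() for en in first_phrase.findall('en')}

# List: both names survive, in document order.
as_list = [en.text.strip() for en in first_phrase.findall("en[@x='PERS']")]

print(as_dict)  # {'PERS': 'Alex'}
print(as_list)  # ['John', 'Alex']
```

Collecting the names in a list is what lets the len(names) == 2 check above work at all; with the dict there is never more than one PERS entry to count.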

0

This:

#!/usr/bin/env python3

import xml.etree.ElementTree as ET

tree = ET.parse('output.xml')

root = tree.getroot()

print("------------------------PERS-PERS-------------------------------")

for phrase in root:
    if phrase.tag == 'PHRASE':
        # collect the text of every <en x='PERS'> child of this phrase
        collected_names = []
        for elt in phrase:
            if elt.tag == 'en' and elt.get('x') == 'PERS':
                collected_names.append(elt.text)
        # print the first two names when the phrase has at least two
        if len(collected_names) >= 2:
            print(collected_names[0] + " - " + collected_names[1])

will output:

$ ./test_script
------------------------PERS-PERS-------------------------------
 John  -  Alex 
 Nick  -  Anna 

but I'm not sure it's exactly the way you want it.
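
If the surrounding spaces in that output are unwanted, the names can be stripped before joining. A rough sketch under a few assumptions: first_two_pers is a hypothetical helper name, the XML is a shortened inline version of the sample, and an empty en element (where .text is None) is treated as an empty string:

```python
import xml.etree.ElementTree as ET

def first_two_pers(root):
    """Return 'A - B' for each PHRASE holding at least two PERS entities."""
    pairs = []
    for phrase in root.iter('PHRASE'):
        # (en.text or '') guards against elements with no text at all
        names = [(en.text or '').strip()
                 for en in phrase.findall('en')
                 if en.get('x') == 'PERS']
        if len(names) >= 2:
            pairs.append(names[0] + ' - ' + names[1])
    return pairs

# Shortened inline sample (assumption, not the full file from the question).
doc = ET.fromstring(
    "<Text>"
    "<PHRASE><en x='PERS'> John </en><V> Went </V><en x='PERS'> Alex </en></PHRASE>"
    "<PHRASE><en x='PERS'> Mark </en><en x='LOC'> Florida </en></PHRASE>"
    "</Text>"
)
print(first_two_pers(doc))  # ['John - Alex']
```

Like the loop above, this keeps only the first two names when a phrase has more than two PERS entities; the question does not say what should happen in that case.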
