
I have a database in XML like the one below, and I'm trying to parse it with Python 2.7:

    <team>
        <generator>
            <team_name>TeamMaster</team_name>
            <team_year>2000</team_year>
            <team_city>NewYork</team_city>
        </generator>
        <players>
            <definition name="John V." number="4" age="25">
                <criteria position="fow" side="right">
                    <criterion website="www.johnV.com" version="1" result="true"/>
                </criteria>
                <object debut="2003" version="3" flag="complete">
                    <history item_ref="team34"/>
                    <history item_ref="mainteam"/>
                </object>
            </definition>
            <definition name="Emma" number="2" age="19">
                <criteria position="mid" side="left">
                    <criterion website="www.emma.net" version="7" result="true"/>
                </criteria>
                <object debut="2008" version="1" flag="complete">
                    <history item_ref="newteam"/>
                    <history item_ref="youngteam"/>
                    <history item_ref="oldteam"/>
                </object>
            </definition>
        </players>
    </team>

With this small script I can easily parse the first part, "generator", of my XML, since I know every element it contains:

    from xml.dom.minidom import parseString

    # placeholders, filled in after parsing
    mydb = {
        "team_name": None,
        "team_year": None,
        "team_city": None,
    }

    # read the whole file; "with" closes it even if reading fails
    with open('mydb.xml', 'r') as f:
        data = f.read()
    dom = parseString(data)
    # retrieve the first xml tag (<tag>data</tag>) that the parser finds with name tagName:
    xmlTag = dom.getElementsByTagName('team_name')[0].toxml()
    # strip off the tag (<tag>data</tag>  --->   data):
    xmlData = xmlTag.replace('<team_name>', '').replace('</team_name>', '')

    mydb["team_name"] = xmlData  # TeamMaster
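A minimal sketch of a less fragile way to get the text with minidom, assuming the element names from the sample XML: instead of serializing the node with toxml() and stripping the tags with replace(), it joins the node's text children. The helper `node_text` is hypothetical, written only for this example, and the XML is inlined so the snippet runs on its own.

```python
from __future__ import print_function

from xml.dom.minidom import parseString

# Inline copy of the "generator" part of the sample XML, so the sketch is
# self-contained; in practice you would read mydb.xml instead.
XML = """<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
</team>"""


def node_text(dom, tag_name):
    """Return the text of the first element named tag_name, or None."""
    nodes = dom.getElementsByTagName(tag_name)
    if not nodes:
        return None
    # Join the element's text-node children instead of stripping tags from toxml().
    return "".join(child.data for child in nodes[0].childNodes
                   if child.nodeType == child.TEXT_NODE)


dom = parseString(XML)
mydb = {}
for tag in ("team_name", "team_year", "team_city"):
    mydb[tag] = node_text(dom, tag)

print(mydb["team_name"])  # prints TeamMaster
```

Because the tag names live in one tuple, adding another known field only means adding its name there.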

But my real problem came when I tried to parse the "players" elements, where attributes appear in "definition" and an unknown number of elements appear in "history". Is there another module that would help me with this better than minidom?

Maybe this can assist you: XML Parsing with Python and minidom ("getElementsByTagName is recursive, you'll get all descendents with a matching tagName."). Commented Apr 22, 2014 at 11:13
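Following that hint, a minimal minidom sketch for the "players" part, assuming the structure of the sample: getAttribute() reads the attributes of each definition, and calling getElementsByTagName('history') on a single definition returns however many history elements that player has. It uses a well-formed copy of the sample (every `<object>` element closed); the dictionary keys are made up for this example.

```python
from __future__ import print_function

from xml.dom.minidom import parseString

# Well-formed copy of the "players" part of the sample XML.
XML = """<team>
  <players>
    <definition name="John V." number="4" age="25">
      <criteria position="fow" side="right">
        <criterion website="www.johnV.com" version="1" result="true"/>
      </criteria>
      <object debut="2003" version="3" flag="complete">
        <history item_ref="team34"/>
        <history item_ref="mainteam"/>
      </object>
    </definition>
    <definition name="Emma" number="2" age="19">
      <criteria position="mid" side="left">
        <criterion website="www.emma.net" version="7" result="true"/>
      </criteria>
      <object debut="2008" version="1" flag="complete">
        <history item_ref="newteam"/>
        <history item_ref="youngteam"/>
        <history item_ref="oldteam"/>
      </object>
    </definition>
  </players>
</team>"""

dom = parseString(XML)
players = []
for definition in dom.getElementsByTagName("definition"):
    criteria = definition.getElementsByTagName("criteria")[0]
    players.append({
        # getAttribute() returns the value as a string ("" if it is absent).
        "name": definition.getAttribute("name"),
        "number": int(definition.getAttribute("number")),
        "age": int(definition.getAttribute("age")),
        "position": criteria.getAttribute("position"),
        # Searching below this definition only collects its own <history>
        # elements, however many there are.
        "history": [h.getAttribute("item_ref")
                    for h in definition.getElementsByTagName("history")],
    })

for player in players:
    print(player["name"], player["history"])
```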

2 Answers


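Better to use xml.etree.ElementTree; it has a more Pythonic syntax. Get the text of team_name with root.findtext('generator/team_name') (findtext() follows the path you give it from root, so a bare 'team_name' finds nothing because team_name sits inside generator), or iterate over all definitions with root.iterfind('players/definition').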
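A sketch of the ElementTree approach on the same sample, assuming the layout shown in the question (the XML is inlined, with every `<object>` element closed). `attrib` holds each definition's attributes as a dict, and iterfind('object/history') yields however many history elements a player has; the shape of the `players` list is only an illustration.

```python
from __future__ import print_function

import xml.etree.ElementTree as ET

XML = """<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
            <criteria position="fow" side="right">
                <criterion website="www.johnV.com" version="1" result="true"/>
            </criteria>
            <object debut="2003" version="3" flag="complete">
                <history item_ref="team34"/>
                <history item_ref="mainteam"/>
            </object>
        </definition>
        <definition name="Emma" number="2" age="19">
            <criteria position="mid" side="left">
                <criterion website="www.emma.net" version="7" result="true"/>
            </criteria>
            <object debut="2008" version="1" flag="complete">
                <history item_ref="newteam"/>
                <history item_ref="youngteam"/>
                <history item_ref="oldteam"/>
            </object>
        </definition>
    </players>
</team>"""

root = ET.fromstring(XML)  # the <team> element

# The path is relative to root, so it must go through <generator>.
team_name = root.findtext("generator/team_name")

players = []
for definition in root.iterfind("players/definition"):
    players.append({
        "attributes": dict(definition.attrib),  # name, number, age as strings
        "position": definition.find("criteria").get("position"),
        "history": [h.get("item_ref")
                    for h in definition.iterfind("object/history")],
    })

for player in players:
    print(player["attributes"]["name"], player["history"])
```

iterfind() yields matches lazily; findall() takes the same path and returns a list if you prefer one.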


You can use either ElementTree or BeautifulSoup's XML parser. I have created a repo showing how to use these XML parsers here: XML Parsers Collection

Code snippet below:

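    import xml.etree.ElementTree as ET

    def xml_parser(xml_file, tag):
        # Stand-in for the helper in the linked repo (assumed behaviour):
        # parse the file and return every element named `tag`.
        return ET.parse(xml_file).getroot().iter(tag)

    # Get the <user> elements from the XML file.
    users = xml_parser(users_file, 'user')

    # Iterate through the matching elements.
    for user in users:
        print(user.find('country').text)
        print(user.find('city').text)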
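One caveat with the snippet above: if a user has no `<city>` child, user.find('city') returns None and `.text` raises AttributeError. A sketch using findtext() with a default instead; the users XML here is hypothetical, made up only to match the element names in the snippet.

```python
from __future__ import print_function

import xml.etree.ElementTree as ET

# Hypothetical users data; the second user has no <city> element.
XML = """<users>
    <user><country>Spain</country><city>Madrid</city></user>
    <user><country>Japan</country></user>
</users>"""

root = ET.fromstring(XML)
rows = []
for user in root.iter("user"):
    # findtext() returns `default` when the child is missing,
    # where find(...).text would raise AttributeError.
    rows.append((user.findtext("country"),
                 user.findtext("city", default="unknown")))

for country, city in rows:
    print(country, city)
```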
