Processing data from XML tags in python

Question

I am trying to extract data from an XML document using Python.

The tool I am currently trying, which seems like a stable choice, is lxml.

The issue I'm having is that the tutorials and questions I have come across all assume the format of the XML document is as follows:

<note> 
   <to>Tove</to> 
   <from>Jani</from> 
   <heading>Reminder</heading> 
   <body>Don't forget me this weekend!</body> 
</note>

Here the values are the text content of the XML tags.

However, the document I am trying to extract from stores its values in attributes of the tags, like so:

<note> 
   <to id="16" name="Tove"/>
   <from id="341" name="Jani"/>
   <heading id="1" name="Reminder"/> 
   <body id="2" name="Don't forget me this weekend!"/> 
</note>

This is what I have tried with lxml:

import lxml.etree

xml_file = lxml.etree.parse("test.xml")

notes = xml_file.xpath("//note")

for note in notes:
    note_id = note.find("id").text
    print(note_id)

This just returns "None".

I have since found that .text is what gets the data from inside an XML tag. However, I simply can't find how to get the data from the attributes shown above.

Could anyone point me in the right direction?

bluszcz · Accepted Answer · 2017-10-25 20:54:16Z

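To access the attributes, use each element's attrib property, which behaves like a dict:

import lxml.etree

xml_file = lxml.etree.parse("test.xml")

notes = xml_file.xpath("//note")

for note in notes:
    # attrib maps attribute names to values; dict() gives a plain copy for printing
    print([dict(x.attrib) for x in note.getchildren()])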

More reading: http://lxml.de/tutorial.html#elements-carry-attributes-as-a-dict
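
To read one specific value rather than dumping every attribute, a minimal sketch along these lines may help. It parses the sample from the question out of a string (with the missing "=" in the from tag restored) instead of reading test.xml, and note_fields is a hypothetical helper name, not part of lxml. Only calls from the ElementTree API that lxml implements are used, so the standard-library module is imported as a fallback if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback: the standard library offers the same ElementTree calls
    import xml.etree.ElementTree as etree

# Sample document from the question, with the missing "=" in <from> restored.
SAMPLE = """<note>
   <to id="16" name="Tove"/>
   <from id="341" name="Jani"/>
   <heading id="1" name="Reminder"/>
   <body id="2" name="Don't forget me this weekend!"/>
</note>"""


def note_fields(note):
    """Hypothetical helper: map each child tag to a plain dict of its attributes."""
    return {child.tag: dict(child.attrib) for child in note}


root = etree.fromstring(SAMPLE)

# iter("note") yields every <note> element, including the root if it matches.
for note in root.iter("note"):
    fields = note_fields(note)
    print(fields["to"]["name"])         # attribute lookup via the dict
    print(note.find("from").get("id"))  # .get() returns None if the attribute is absent
    print(note.find("id"))              # None: "id" is an attribute, not a child element
```

The last line shows why the attempt in the question fails: find("id") looks for a child element named id, and there is none, so it returns None.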

edited Oct 25, 2017 at 20:54

answered Oct 25, 2017 at 20:43

4 Comments

drew Over a year ago

I am trying this and I get the error: 'NoneType' object has no attribute 'attrib'

bluszcz Over a year ago

Right, I made a mistake - I fixed the code to show how to get attributes for the tag "to".

bluszcz Over a year ago

Actually I have made another edit - you can iterate over children using getchildren and then get attributes.

drew Over a year ago

Ah, brilliant, that has worked. You should leave the original edit in your answer as well for anyone who may stumble upon this post, as that was very helpful, as is this edit with the getchildren function. +1
