I am trying to extract data from an XML document using Python.
The tool I'm currently trying, which seems like a stable choice, is lxml.
The issue I'm having is that the tutorials and questions I have come across all assume the XML document is formatted as follows:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
That is, with the values inside the XML tags.
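For reference, this is roughly how the tutorials read that format. It is only a minimal sketch: the inline string and the names TEXT_STYLE, recipient and body are made up for illustration, and the standard-library fallback import is an assumption added so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this question
except ImportError:
    # Assumption: the standard library offers the same calls used below
    import xml.etree.ElementTree as etree

# Inline copy of the tutorial-style document
TEXT_STYLE = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = etree.fromstring(TEXT_STYLE)

# .text is the text between an element's opening and closing tags
recipient = root.find("to").text

# findtext() combines find() and .text in one call
body = root.findtext("body")

print(recipient, body)
```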
However, the document I am trying to extract from has its values inside attributes of the tags, like so:
<note>
<to id="16" name="Tove"/>
<from id="341" name="Jani"/>
<heading id="1" name="Reminder"/>
<body id="2" name="Don't forget me this weekend!"/>
</note>
This is what I have tried in lxml:
import lxml.etree

xml_file = lxml.etree.parse("test.xml")
notes = xml_file.xpath("//note")
for note in notes:
    note_id = note.find("id").text
    print(note_id)
This just returns "None".
I have since found that .text only returns the text between an element's opening and closing tags. However, I can't find how to get the data from the attributes shown above.
Could anyone point me in the right direction?
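For what it's worth, values such as id and name in the second document are XML attributes rather than element text, and lxml elements expose attributes through .get() and the dict-like .attrib. Below is a minimal sketch of that approach, not a definitive solution. SAMPLE and read_note are names made up for illustration, the sample is held in an inline string rather than test.xml, and the standard-library fallback import is an assumption added so the snippet runs without lxml installed.

```python
try:
    from lxml import etree  # the library used in this question
except ImportError:
    # Assumption: the standard library offers the same calls used below
    import xml.etree.ElementTree as etree

# Inline copy of the attribute-style document, with name="Jani" written in full
SAMPLE = """<note>
<to id="16" name="Tove"/>
<from id="341" name="Jani"/>
<heading id="1" name="Reminder"/>
<body id="2" name="Don't forget me this weekend!"/>
</note>"""


def read_note(xml_text):
    """Return {tag: (id, name)} for each child of the <note> root."""
    root = etree.fromstring(xml_text)
    result = {}
    for child in root:
        # .get() returns the attribute value, or None if it is missing
        result[child.tag] = (child.get("id"), child.get("name"))
    return result


fields = read_note(SAMPLE)
for tag, (item_id, name) in fields.items():
    print(tag, item_id, name)
```

Note that id is an attribute of the child elements, not a child element of note, which is why find("id") has nothing to return. Attribute values always come back as strings, so an id would need int() if a number is wanted.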