I am trying to extract some data from a bunch of XML files. The problem is that the files do not all share exactly the same structure, so simply iterating over the children and extracting the values is difficult.

Is there a getElementByTag() method in Python for such XML documents? I have seen that such a method is available to C# and C++ users, but I couldn't find anything for Python.

Any help will be much appreciated!

1 Answer

Yes. The standard-library package xml.etree provides built-in XML functionality (it is also available in Python 2).

The method you are looking for is findall.

For example:

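import xml.etree.ElementTree as ET
# fromstring() parses the XML text and returns the root Element
tree = ET.fromstring(some_xml_data)
# './/name' matches <name> elements at any depth below the root
all_name_elements = tree.findall('.//name')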

With:

In [1]: some_xml_data = "<help><person><name>dean</name></person></help>"

I get the following:

In [10]: tree.findall(".//name")
Out[10]: [<Element 'name' at 0x7ff921edd390>]
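
Because './/name' matches at any depth, the same call should work on files whose structure differs. Here is a minimal sketch, assuming two made-up documents (doc_a, doc_b and the helper names_in are hypothetical names used only for illustration) that nest <name> at different levels:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: <name> sits at a different depth in each.
doc_a = "<help><person><name>dean</name></person></help>"
doc_b = (
    "<help>"
    "<group><person><name>ann</name></person></group>"
    "<name>bob</name>"
    "</help>"
)

def names_in(xml_text):
    """Return the text of every <name> element, at any depth below the root."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall(".//name")]

print(names_in(doc_a))  # ['dean']
print(names_in(doc_b))  # ['ann', 'bob']
```

If the root element itself might be a <name>, root.iter("name") is an alternative: it walks the root and everything below it.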

8 Comments

findall only searches at the children level. However, I was looking for something that goes all the way to the bottom of the tree.
If you use findall on the root element of the tree, it searches all subelements. You can also use it on the ElementTree object, instead of the root element, and then it also searches the root.
That does not work for me. It only searches the child level and nothing below that. Also, your syntax is incorrect in the answer you posted. Thanks!
@codepi You're right. Got it wrong. I edited with a fix.
@DeanFenster I believe the correct syntax should be ".//name" in order to get any element named "name". "*/name" will only return grandchildren of the element.
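
To make the difference between the path forms discussed in these comments concrete, here is a small sketch. The sample document and the helper texts are invented for illustration; only findall and the path syntax come from xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with <name> at three different depths.
root = ET.fromstring(
    "<help>"
    "<name>child</name>"
    "<person><name>grandchild</name></person>"
    "<group><person><name>deeper</name></person></group>"
    "</help>"
)

def texts(path):
    """Return the text of each element matched by the given path."""
    return [elem.text for elem in root.findall(path)]

print(texts("name"))     # direct children only: ['child']
print(texts("*/name"))   # grandchildren only: ['grandchild']
print(texts(".//name"))  # any depth: ['child', 'grandchild', 'deeper']
```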