
Is there a way to ignore the XML namespace in tag names in elementtree.ElementTree?

I try to print all technicalContact tags:

for item in root.getiterator(tag='{http://www.example.com}technicalContact'):
    print item.tag, item.text

And I get something like:

{http://www.example.com}technicalContact [email protected]

But what I really want is:

technicalContact [email protected]

Is there a way to display only the suffix (sans xmlns), or better - iterate over the elements without explicitly stating xmlns?


2 Answers


You can define a generator to recursively search through your element tree in order to find tags which end with the appropriate tag name. For example, something like this:

def get_element_by_tag(element, tag):
    if element.tag.endswith(tag):
        yield element
    for child in element:
        for g in get_element_by_tag(child, tag):
            yield g

This just checks for tags which end with tag, i.e. ignoring any leading namespace. You can then iterate over any tag you want as follows:

for item in get_element_by_tag(elementtree, 'technicalContact'):
    ...

This generator in action:

>>> import xml.etree.ElementTree as etree
>>> xml_str = """<root xmlns="http://www.example.com">
... <technicalContact>Test1</technicalContact>
... <technicalContact>Test2</technicalContact>
... </root>
... """
>>> xml_etree = etree.fromstring(xml_str)
>>> for item in get_element_by_tag(xml_etree, 'technicalContact'):
...     print item.tag, item.text
... 
{http://www.example.com}technicalContact Test1
{http://www.example.com}technicalContact Test2
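
One caveat with a bare endswith check is that it also matches any longer tag name that happens to end in the same characters (for example a hypothetical nontechnicalContact element). A minimal sketch of a stricter variant, written for Python 3 and assuming the standard-library xml.etree.ElementTree, might look like this. The function name get_elements_by_local_name and the extra nontechnicalContact element are made up for illustration:

```python
import xml.etree.ElementTree as ET


def get_elements_by_local_name(element, name):
    """Yield element and all descendants whose tag, ignoring namespace, is name."""
    for node in element.iter():
        tag = node.tag
        # Comments and processing instructions carry a non-string tag; skip them.
        if not isinstance(tag, str):
            continue
        # Namespaced tags look like '{uri}name', so require the closing brace
        # right before the name instead of accepting any suffix match.
        if tag == name or tag.endswith('}' + name):
            yield node


xml_str = """<root xmlns="http://www.example.com">
<technicalContact>Test1</technicalContact>
<nontechnicalContact>Other</nontechnicalContact>
<technicalContact>Test2</technicalContact>
</root>"""

root = ET.fromstring(xml_str)
for item in get_elements_by_local_name(root, 'technicalContact'):
    print(item.tag, item.text)
```

This version walks the tree with Element.iter() rather than recursing by hand. On Python 3.8 and later, ElementTree's path syntax also accepts a {*} namespace wildcard, so root.findall('.//{*}technicalContact') is another option worth trying.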

1 Comment

Hopefully the above answers the question. A difference I have noticed is that item in the generator example does not have a next method. Still, other than this it behaves in the same (similar?) way to etree.getiterator.

I always end up using something like:

item.tag.split("}")[1][0:]
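
Note that split("}")[1] raises an IndexError when a tag has no namespace at all. A small sketch of a more forgiving helper, assuming Python 3 and the standard-library xml.etree.ElementTree (strip_namespace is a hypothetical name, not part of any library):

```python
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Return tag without a leading '{uri}' part; plain tags pass through unchanged."""
    # rsplit yields a one-element list when there is no '}', so [-1] is always safe.
    return tag.rsplit('}', 1)[-1]


root = ET.fromstring(
    '<root xmlns="http://www.example.com">'
    '<technicalContact>Test1</technicalContact>'
    '</root>'
)
child = root[0]
print(strip_namespace(child.tag), child.text)  # technicalContact Test1
print(strip_namespace('plainTag'))             # plainTag
```

This only changes what gets displayed; the iteration itself still sees the fully qualified tag names.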

3 Comments

It does not address the iterator issue - I still have to iterate over the full tag name.
I am not aware of any of the different XML handlers for Python that do that. With lxml you could apply an XSLT to the XML before you parse it.
The [0:] is pointless. If you are trying to get a copy of it so as not to change the original you can simply do [:]. Or, if that isn't a problem, just remove the [0:] altogether.
