Python: Ignore xmlns in elementtree.ElementTree

Question

Is there a way to ignore the XML namespace in tag names in elementtree.ElementTree?

I try to print all technicalContact tags:

for item in root.getiterator(tag='{http://www.example.com}technicalContact'):
        print item.tag, item.text

And I get something like:

{http://www.example.com}technicalContact [email protected]

But what I really want is:

technicalContact [email protected]

Is there a way to display only the suffix (sans xmlns), or better - iterate over the elements without explicitly stating xmlns?

See my answer under stackoverflow.com/a/25920989/2593383 for a more general solution.
– nonagon, Commented Sep 18, 2014 at 19:39
See also: Python ElementTree module: How to ignore the namespace of XML files
– milahu, Commented Jul 3, 2023 at 6:49

Chris · Accepted Answer · 2012-06-27 13:25:56Z

8

You can define a generator to recursively search through your element tree in order to find tags which end with the appropriate tag name. For example, something like this:

def get_element_by_tag(element, tag):
    if element.tag.endswith(tag):
        yield element
    for child in element:
        for g in get_element_by_tag(child, tag):
            yield g

This just checks for tags which end with tag, i.e. ignoring any leading namespace. You can then iterate over any tag you want as follows:

for item in get_element_by_tag(elementtree, 'technicalContact'):
    ...

This generator in action:

>>> import xml.etree.ElementTree as etree
>>> xml_str = """<root xmlns="http://www.example.com">
... <technicalContact>Test1</technicalContact>
... <technicalContact>Test2</technicalContact>
... </root>
... """
>>> xml_etree = etree.fromstring(xml_str)
>>> for item in get_element_by_tag(xml_etree, 'technicalContact'):
...     print item.tag, item.text
... 
{http://www.example.com}technicalContact Test1
{http://www.example.com}technicalContact Test2
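
One caveat with the endswith check: it also matches any tag whose name merely ends in the same characters. A stricter variant could require that the match starts right after the closing brace of the namespace. The following is only a sketch, written for Python 3; the helper name iter_by_local_name and the nontechnicalContact element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def iter_by_local_name(element, name):
    """Yield element and its descendants whose tag, minus any namespace, equals name."""
    # ElementTree spells a namespaced tag as '{uri}name', so an exact match
    # is either the bare name or something ending in '}' + name.
    suffix = '}' + name
    for node in element.iter():
        tag = node.tag
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(tag, str) and (tag == name or tag.endswith(suffix)):
            yield node


xml_str = """<root xmlns="http://www.example.com">
<technicalContact>Test1</technicalContact>
<nontechnicalContact>Other</nontechnicalContact>
<technicalContact>Test2</technicalContact>
</root>"""

root = ET.fromstring(xml_str)
matches = [item.text for item in iter_by_local_name(root, 'technicalContact')]
print(matches)
```

With a plain endswith(tag) test, the hypothetical nontechnicalContact element would be yielded as well; here it is skipped.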

answered Jun 27, 2012 at 13:25


1 Comment

Chris Over a year ago

Hopefully the above answers the question. A difference I have noticed is that item in the generator example does not have a next method. Still, other than this it behaves in the same (similar?) way to etree.getiterator.

lebox · Accepted Answer · 2012-06-27 13:09:04Z

1

I always end up using something like:

item.tag.split("}")[1][0:]
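
Note that indexing with [1] raises an IndexError for a tag that has no namespace at all. A slightly more defensive version could use rpartition instead. This is a minimal sketch for Python 3; strip_namespace is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Return the part of tag after '}', or tag unchanged if it has no namespace."""
    # rpartition returns ('', '', tag) when '}' is absent, so index 2 is always safe.
    return tag.rpartition('}')[2]


root = ET.fromstring(
    '<root xmlns="http://www.example.com">'
    '<technicalContact>Test1</technicalContact>'
    '</root>'
)

# The search still needs the fully qualified name; only the display is shortened.
for item in root.iter('{http://www.example.com}technicalContact'):
    print(strip_namespace(item.tag), item.text)
```

This only covers the display half of the question; the iteration still names the namespace explicitly.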

edited Jun 27, 2012 at 13:09

answered Jun 27, 2012 at 13:00


3 Comments

Adam Matan Over a year ago

It does not address the iterator issue - I still have to iterate over the full tag name.

lebox Over a year ago

I am not aware of any of the different XML handlers for Python that do that. With lxml you could run an XSLT over the XML before you parse it.

C0deH4cker Over a year ago

The [0:] is pointless. If you are trying to get a copy of it so as not to change the original you can simply do [:]. Or, if that isn't a problem, just remove the [0:] altogether.
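
On the iterator issue raised above: for readers on Python 3.8 or later, the standard library's xml.etree.ElementTree accepts a {*} namespace wildcard in the paths passed to find(), findall() and iterfind(), which may avoid spelling out the xmlns. A small sketch under that assumption, reusing the sample document from the first answer:

```python
import xml.etree.ElementTree as ET

xml_str = """<root xmlns="http://www.example.com">
<technicalContact>Test1</technicalContact>
<technicalContact>Test2</technicalContact>
</root>"""

root = ET.fromstring(xml_str)

# './/' searches all descendants of root; '{*}' matches any namespace, or none.
# The wildcard needs Python 3.8+; older versions will not find these elements.
contacts = root.findall('.//{*}technicalContact')
texts = [contact.text for contact in contacts]
print(texts)
```

The returned elements still carry the full {http://www.example.com}technicalContact tag, so the namespace has to be stripped separately if only the short name should be printed.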
