I have XML like this:

<?xml version="1.0" ?>
<iq id="123" to="test" type="result">
    <query xmlns="jabber:iq:roster">
        <item jid="foo" subscription="both"/>
        <item jid="bar" subscription="both"/>
    </query>
</iq>

I would like to parse the jid attribute of each item into an array. I thought something like this would work:

import xml.etree.ElementTree as ET

myarr = []

xml = '<?xml version="1.0" ?><iq id="123" to="test" type="result"><query xmlns="jabber:iq:roster"><item jid="foo" subscription="both"/><item jid="bar" subscription="both"/></query></iq>'

root = ET.fromstring(xml)

for item in root.findall('query'):
    t = item.get('jid')
    myarr.append(t)
    print (t)

2 Answers

You need to handle namespaces. One option would be to paste the namespace into the XPath expression:

for item in root.findall('.//{%(ns)s}query/{%(ns)s}item' % {'ns': 'jabber:iq:roster'}):
    t = item.attrib.get('jid')
    myarr.append(t)
    print (t)

Prints:

foo
bar
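
As a possible alternative to formatting the namespace URI into the path by hand, findall also accepts a mapping from prefix to namespace URI as its second argument. A minimal sketch, assuming the same XML as in the question; the prefix r and the names ns and jids are arbitrary choices made here for illustration:

```python
import xml.etree.ElementTree as ET

xml = ('<?xml version="1.0" ?>'
       '<iq id="123" to="test" type="result">'
       '<query xmlns="jabber:iq:roster">'
       '<item jid="foo" subscription="both"/>'
       '<item jid="bar" subscription="both"/>'
       '</query></iq>')

root = ET.fromstring(xml)

# Map a prefix of our own choosing to the namespace URI used in the document.
ns = {'r': 'jabber:iq:roster'}

# The prefix in the path is expanded to {jabber:iq:roster} during the search.
jids = [item.get('jid') for item in root.findall('.//r:query/r:item', ns)]
print(jids)
```

The jid attribute carries no prefix in the document, so it is not in any namespace and can be read with a plain get('jid').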

2 Comments

While it's less general, the iterator could also be simplified to: root.findall('.//{jabber:iq:roster}item')
Thanks, I had no experience with namespaces in XML before. Now, it's quite clear

I endorse @alecxe's approach, which I will label "handle the namespaces." That is the most general and correct approach. Unfortunately, namespaces are often ugly and wordy, and they needlessly complicate XPath expressions.

For the many simple cases where namespaces are an artifact of the XML world's desire for über-precision and not truly necessary to identify the nodes in a document, a simpler "eliminate the namespaces" alternative allows more concise searches. The key routine is:

def strip_namespaces(tree):
    """
    Strip the namespaces from an ElementTree in order to make
    processing easier. Adapted from @nonagon's answer
    at http://stackoverflow.com/a/25920989/240490
    """
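    for el in tree.iter():
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # strip namespaces
        # Iterate over a copy so the attribute dict can be changed safely
        for k, v in list(el.attrib.items()):
            if '}' in k:
                newkey = k.split('}', 1)[1]
                el.attrib[newkey] = v
                # Remove only the namespaced key that was just renamed
                del el.attrib[k]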
    return tree

Then the program continues much as before, but without worrying about those pesky namespaces:

root = ET.fromstring(xml)
strip_namespaces(root)

for item in root.findall('.//item'):
    t = item.attrib.get('jid')
    myarr.append(t)
    print (t)

This approach is not suitable if you need to modify the ElementTree and re-emit XML, because the namespace information is discarded. If you're just trying to deconstruct the tree and grab data from it, it works well.
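
If the tree must stay intact, one option that avoids mutating it is the {*} wildcard in the search path, which matches a tag in any namespace. This is a sketch that assumes Python 3.8 or later, where ElementTree added that wildcard; the name jids is chosen here for illustration:

```python
import xml.etree.ElementTree as ET

xml = ('<?xml version="1.0" ?>'
       '<iq id="123" to="test" type="result">'
       '<query xmlns="jabber:iq:roster">'
       '<item jid="foo" subscription="both"/>'
       '<item jid="bar" subscription="both"/>'
       '</query></iq>')

root = ET.fromstring(xml)

# {*}item matches any element whose local name is "item", whatever its namespace
# (requires Python 3.8+).
jids = [item.get('jid') for item in root.findall('.//{*}item')]
print(jids)

# The tags keep their namespaces, so the tree can still be serialized faithfully.
print(root[0].tag)
```

Like stripping namespaces, this ignores which namespace an element belongs to, so it fits only when local names are unambiguous in the document.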
