Parsing XML using etree in Python

Question

I have looked at the documentation and other similar questions and can't work out what's going wrong here!

I want to use the XML output from an API.

I have XML that looks a bit like this:

<response>
<lst></lst>
<result>
    <doc>
        <str name ="pa">1234</str>
        <str name ="et">Title 1</str>
        <str name ="pb">Publisher 1</str>
        <str name ="ur">http://www.exampleone.com</str>
    </doc>
    <doc>
        <str name ="pa">5678</str>
        <str name ="et">Title 2</str>
        <str name ="pb">Publisher 2</str>
        <str name ="ur">http://www.exampletwo.com</str>
    </doc>
</result>

I want to get the "pa" for each doc element.

This is the code I am using, but I get nothing:

import requests
import xml.etree.ElementTree as ET

r = requests.get("api url goes here")

tree = ET.fromstring(r.content)

for doc in tree.findall("doc"):
    pan = doc.find('pa').text
    print pan

What am I doing wrong?

alecxe · Accepted Answer · 2015-03-27 16:59:26Z

3

doc.find('pa') would search for the pa element, which doesn't exist.

Instead, you need to look for the str element whose name attribute equals pa:

doc.find('.//str[@name="pa"]')

Demo:

>>> for doc in tree.findall("doc"):
...     pan = doc.find('.//str[@name="pa"]').text
...     print pan
... 
1234
5678
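
One thing to watch with this lookup: find returns None when nothing matches, so calling .text on the result raises AttributeError. A minimal sketch of a guarded version follows. The helper name field and the one-element sample are made up for illustration and are not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Hypothetical single <doc> element, shaped like the sample in the question.
doc = ET.fromstring(
    '<doc><str name="pa">1234</str><str name="et">Title 1</str></doc>'
)

# 'pa' is an attribute value, not a tag name, so a bare tag search finds nothing.
print(doc.find("pa"))

def field(element, name):
    """Return the text of the <str> child whose name attribute equals name, or None."""
    node = element.find(f'str[@name="{name}"]')
    # Compare against None explicitly: an Element without children is falsy.
    return node.text if node is not None else None

print(field(doc, "pa"))       # text of the matching <str>
print(field(doc, "missing"))  # None instead of an AttributeError
```

The explicit "is not None" check matters because ElementTree elements with no child elements evaluate as false in a boolean context.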

6 Comments

Abbie Over a year ago

Thanks for the suggestion. I think I tried this before. I have tried it again and I still don't get any results.

alecxe Over a year ago

@Abbie well, it worked for me using the provided example. I suspect the XML you've provided is not what you actually get from the API request.

rparent Over a year ago

In the xml sample that you provided in your question, you are missing a </response> at the end.

alecxe Over a year ago

@rparent if it were really missing, ElementTree would raise an error while parsing the XML.
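
A quick way to check this claim, as a sketch with a made-up truncated string standing in for the response:

```python
import xml.etree.ElementTree as ET

# Hypothetical input with the closing </response> tag left off.
truncated = "<response><lst></lst><result></result>"

try:
    ET.fromstring(truncated)
    parsed = True
except ET.ParseError as exc:
    # The parser rejects the incomplete document instead of returning a tree.
    parsed = False
    print("parse failed:", exc)
```

So if the parse succeeds and the loop simply prints nothing, the real response is well-formed and the problem lies in the search path.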

alecxe Over a year ago

My guess is that it's either a different response retrieved from the API endpoint, or there are namespaces defined.
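
To illustrate the namespace case, here is a sketch that assumes the response declares a default namespace. The URI http://example.com/ns and the prefix x are invented for the example; the real API may use a different namespace or none at all:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced response; the namespace URI is made up.
NAMESPACED = """<response xmlns="http://example.com/ns">
<result>
    <doc><str name="pa">1234</str></doc>
</result>
</response>"""

root = ET.fromstring(NAMESPACED)
print(root.tag)  # the tag now carries the namespace in braces

# An unqualified name does not match namespaced elements.
plain = root.findall(".//doc")

# Map an arbitrary prefix to the namespace URI and qualify every tag name.
ns = {"x": "http://example.com/ns"}
qualified = root.findall(".//x:doc", ns)
values = [d.find('x:str[@name="pa"]', ns).text for d in qualified]
print(values)
```

Printing root.tag on the real response is a cheap way to see whether a namespace is involved.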

chapelo · Accepted Answer · 2015-03-27 17:49:38Z

0

This should work...

import xml.etree.ElementTree as ET

resp = '''<response><lst></lst><result><doc>
            <str name ="pa">1234</str>
            <str name ="et">Title 1</str>
            <str name ="pb">Publisher 1</str>
            <str name ="ur">http://www.exampleone.com</str>
          </doc>
          <doc>
            <str name ="pa">5678</str>
            <str name ="et">Title 2</str>
            <str name ="pb">Publisher 2</str>
            <str name ="ur">http://www.exampletwo.com</str>
          </doc></result></response>'''

tree = ET.fromstring(resp)

for pan in tree.findall('.//str[@name="pa"]'):
    print(pan.text)

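Your code needs only minor changes to work. findall("doc") matches only direct children of the element it is called on, and the doc elements are nested inside result, so either start from result or search all descendants. The pa lookup also has to select on the name attribute: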

Either:

for doc in tree[1].findall("doc"):
    pan = doc.find('str[@name="pa"]').text
    print(pan)

Or

for doc in tree.findall(".//doc"):
    pan = doc.find('str[@name="pa"]').text
    print(pan)
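
The difference between the two variants comes down to how the search path is resolved. The sketch below compares the paths side by side, assuming the sample from the question with the closing </response> tag added (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Sample from the question, condensed, with </response> added.
root = ET.fromstring(
    "<response><lst></lst><result>"
    '<doc><str name="pa">1234</str></doc>'
    '<doc><str name="pa">5678</str></doc>'
    "</result></response>"
)

direct = root.findall("doc")           # direct children of <response> only
by_path = root.findall("result/doc")   # explicit path through <result>
anywhere = root.findall(".//doc")      # every <doc> descendant

print(len(direct), len(by_path), len(anywhere))

# root[1] is positional: it is <result> only because <lst> comes first.
print(root[1].tag)
```

Indexing with tree[1] depends on the order of the root's children, so root.find("result") or the "result/doc" path may be a safer choice if the API ever adds or reorders elements.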

1 Comment

Abbie Over a year ago

Thank you, I used the second example you gave and it produced the result I was after.
