
I have looked at the documentation and other similar questions and can't work out what's going wrong here!

I want to use the XML output from an API.

I have XML that looks a bit like this:

<response>
<lst></lst>
<result>
    <doc>
        <str name ="pa">1234</str>
        <str name ="et">Title 1</str>
        <str name ="pb">Publisher 1</str>
        <str name ="ur">http://www.exampleone.com</str>
    </doc>
    <doc>
        <str name ="pa">5678</str>
        <str name ="et">Title 2</str>
        <str name ="pb">Publisher 2</str>
        <str name ="ur">http://www.exampletwo.com</str>
    </doc>
</result>

I want to get the "pa" for each doc element.

This is the code I am using, but it prints nothing:

import requests
import xml.etree.ElementTree as ET

r = requests.get("api url goes here")

tree = ET.fromstring(r.content)

for doc in tree.findall("doc"):
    pan = doc.find('pa').text
    print pan

What am I doing wrong?

2 Answers


doc.find('pa') would search for the pa element, which doesn't exist.

Instead, you need to look for the str element whose name attribute is equal to pa:

doc.find('.//str[@name="pa"]')

Demo:

>>> for doc in tree.findall("doc"):
...     pan = doc.find('.//str[@name="pa"]').text
...     print pan
... 
1234
5678
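
The difference between the two lookups can be sketched as follows. This is only an illustration: it uses a shortened inline copy of the sample from the question (with a closing </response> added so that it parses), and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample XML, closed with </response>.
SAMPLE = """<response>
<lst></lst>
<result>
    <doc>
        <str name="pa">1234</str>
        <str name="et">Title 1</str>
    </doc>
    <doc>
        <str name="pa">5678</str>
        <str name="et">Title 2</str>
    </doc>
</result>
</response>"""

root = ET.fromstring(SAMPLE)
first_doc = root.find(".//doc")

# A bare name is treated as a tag name, and there is no <pa> element.
by_tag = first_doc.find("pa")

# The [@name="pa"] predicate selects the <str> whose name attribute is "pa".
by_attribute = first_doc.find('str[@name="pa"]')

print(by_tag)             # None
print(by_attribute.text)  # 1234
```

Because find() returns None when nothing matches, calling .text on the result of doc.find('pa') would raise an AttributeError for any doc that is actually reached.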

6 Comments

Thanks for the suggestion. I think I tried this before. I have tried it again and I still don't get any results.
@Abbie well, it worked for me using the provided example. I suspect the XML you've provided is not what you actually get from the API request.
In the XML sample that you provided in your question, you are missing a </response> at the end.
@rparent if it were really missing, ElementTree would raise an error while parsing the XML.
My guess is that either the API endpoint returns a different response, or there are namespaces defined.
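
One way to test the namespace guess is sketched below. The namespace URI http://example.com/ns and the "api" prefix are invented for illustration; nothing in the question confirms that the real response declares a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical response with a default namespace (the URI is an assumption).
NAMESPACED = """<response xmlns="http://example.com/ns">
<result>
    <doc><str name="pa">1234</str></doc>
    <doc><str name="pa">5678</str></doc>
</result>
</response>"""

root = ET.fromstring(NAMESPACED)

# ElementTree stores namespaced tags as "{uri}local", so the root tag
# shows whether a namespace is in play.
print(root.tag)  # {http://example.com/ns}response

# Unqualified names no longer match anything.
print(root.findall(".//doc"))  # []

# Map a prefix of your choice to the URI and qualify each tag in the path.
ns = {"api": "http://example.com/ns"}
values = [el.text for el in root.findall('.//api:doc/api:str[@name="pa"]', ns)]
print(values)  # ['1234', '5678']
```

If printing the root tag of the real response shows no braces, namespaces are not the cause and the problem lies elsewhere.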

This should work...

import xml.etree.ElementTree as ET

resp = '''<response><lst></lst><result><doc>
            <str name ="pa">1234</str>
            <str name ="et">Title 1</str>
            <str name ="pb">Publisher 1</str>
            <str name ="ur">http://www.exampleone.com</str>
          </doc>
          <doc>
            <str name ="pa">5678</str>
            <str name ="et">Title 2</str>
            <str name ="pb">Publisher 2</str>
            <str name ="ur">http://www.exampletwo.com</str>
          </doc></result></response>'''

tree = ET.fromstring(resp)

for pan in tree.findall('.//str[@name="pa"]'):
    print(pan.text)

Your own code needs only minor changes to work:

Either:

for doc in tree[1].findall("doc"):
    pan = doc.find('str[@name="pa"]').text
    print(pan)

Or:

for doc in tree.findall(".//doc"):
    pan = doc.find('str[@name="pa"]').text
    print(pan)
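
Both variants work for the same reason: a plain path such as "doc" is matched only against the direct children of the element it is called on, and here the doc elements sit one level down, under result. A rough sketch of this, wrapped in a helper, follows. The name extract_field and the extra doc without a "pa" entry are made up for the example.

```python
import xml.etree.ElementTree as ET


def extract_field(xml_data, name):
    """Return the text of the <str name=...> entry of every <doc> that has one."""
    root = ET.fromstring(xml_data)
    values = []
    # ".//doc" searches all descendants, so the <result> wrapper does not matter.
    for doc in root.findall(".//doc"):
        field = doc.find('str[@name="%s"]' % name)
        # find() returns None when a <doc> lacks the field; skip it
        # instead of failing on .text.
        if field is not None:
            values.append(field.text)
    return values


sample = b"""<response><lst></lst><result>
<doc><str name="pa">1234</str><str name="et">Title 1</str></doc>
<doc><str name="et">No identifier here</str></doc>
<doc><str name="pa">5678</str></doc>
</result></response>"""

root = ET.fromstring(sample)
print(len(root.findall("doc")))     # 0: <doc> is not a direct child of <response>
print(len(root.findall(".//doc")))  # 3
print(extract_field(sample, "pa"))  # ['1234', '5678']
```

With the code from the question, the same helper could be called as extract_field(r.content, "pa").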

1 Comment

Thank you, I used the second example you gave and it produced the result I was after.
