Using the Python lxml library, I'm trying to parse an XML document like the following:

<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:ax23="http://metadata.itis_service.itis.usgs.gov/xsd" xmlns:ax26="http://itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
<ax21:scientificNames xsi:type="ax21:SvcScientificName">
<ax21:tsn>26339</ax21:tsn>
<ax21:author>L.</ax21:author>
<ax21:combinedName>Vicia faba</ax21:combinedName>
<ax21:kingdom>Plantae</ax21:kingdom>
<ax21:unitInd1 xsi:nil="true" />
<ax21:unitInd2 xsi:nil="true" />
<ax21:unitInd3 xsi:nil="true" />
<ax21:unitInd4 xsi:nil="true" />
<ax21:unitName1>Vicia</ax21:unitName1>
<ax21:unitName2>faba</ax21:unitName2>
<ax21:unitName3 xsi:nil="true" />
<ax21:unitName4 xsi:nil="true" />
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>

Specifically, I want to get the value of the "ax21:tsn" element (in this case, the integer 26339).

I tried the answers from here and here, without success. Here is my code:

import lxml.etree as ET

tree = ET.parse("sample.xml")
#print(ET.tostring(tree))

namespaces = {'ax21': 'http://data.itis_service.itis.usgs.gov/xsd'} 
tsn = tree.find('scientificNames/tsn', namespaces)
print(tsn)

It just prints None. Is there a really intelligent way of doing this using XPath?

1 Answer

Two problems:

  1. scientificNames is not a direct child of the root element; it is a grandchild.

  2. You need to use the ax21 prefix on each element name in the XPath expression; passing the namespaces dict alone is not enough.

The following works:

tsn = tree.find('.//ax21:scientificNames/ax21:tsn', namespaces)

Or simply:

tsn = tree.find('.//ax21:tsn', namespaces)
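
To see both problems and the fix side by side, here is a minimal sketch. It assumes a trimmed copy of the sample document inlined as a string (instead of reading sample.xml), and it falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since both accept the same find()/findall() arguments used here. The variable names missing, tsn_element and all_tsn are only illustrative.

```python
try:
    import lxml.etree as ET  # the library used in the question
except ImportError:
    # Fallback (assumption): the standard library exposes the same find()/findall() calls
    import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question
XML = """<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
<ax21:scientificNames xsi:type="ax21:SvcScientificName">
<ax21:tsn>26339</ax21:tsn>
<ax21:combinedName>Vicia faba</ax21:combinedName>
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>"""

root = ET.fromstring(XML)
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# Original attempt: no prefix, and the path only looks at direct children of the root
missing = root.find("scientificNames/tsn", namespaces)

# Prefixed name plus .// to search all descendants
tsn_element = root.find(".//ax21:tsn", namespaces)
tsn = int(tsn_element.text)  # element text is a string, so convert explicitly

# findall() takes the same path syntax but returns a list of elements
all_tsn = [int(e.text) for e in root.findall(".//ax21:scientificNames/ax21:tsn", namespaces)]

print(missing, tsn, all_tsn)  # None 26339 [26339]
```

Note that find() gives back an element (or None when nothing matches), so the integer comes from its .text attribute.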

4 Comments

But it seems it does not work for "findall()".
Not sure what you mean. Note that findall returns a list while find returns a single element.
This returns an empty list: common_names = tree.findall(".//ax21:commonNames:ax21:commonName", namespaces)
Please post a new question.
