Parsing XML data within tags using lxml in Python

Question

My question is about how to get information stored in the attributes of a self-closing tag (one with no separate closing tag). Here's the relevant XML:

<?xml version="1.0" encoding="UTF-8"?>
<uws:job>  
<uws:results>
    <uws:result id="2014-03-03T15:42:31:1337" xlink:href="http://www.cosmosim.org/query/index/stream/table/2014-03-03T15%3A42%3A31%3A1337/format/csv" xlink:type="simple"/>
</uws:results>
</uws:job>

I'm looking to extract the xlink:href URL here. As you can see, the uws:result tag is self-closing. Additionally, the 'uws:' namespace prefix makes the tags a bit tricky to handle in Python. Here's what I've tried so far:

from lxml import etree
root = etree.fromstring(xmlresponse.content)
url = root.find('{*}results').text

Where xmlresponse.content is the xml data to be parsed. What this returns is

'\n    '

which indicates that it's only finding whitespace (a newline and indentation), whereas what I'm really after is contained in a tag inside the results tag. Any ideas would be greatly appreciated.

Provide the first xml declaration line of the xml.
– alecxe, Commented Mar 3, 2014 at 19:14

Added the declaration statement.
– astromax, Commented Mar 3, 2014 at 19:38

Corley Brigman · Accepted Answer · 2014-03-03 19:28:04Z

You were close, but you read the wrong thing from the wrong node: the URL is an attribute of the uws:result element, which is a child of uws:results, and not the text of uws:results. Instead of

url = root.find('{*}results').text

you really want

url = root.find('{*}results/{*}result').get('attribname', 'value_to_return_if_not_present')

or

url = root.find('{*}results/{*}result').attrib['attribname']

(which raises a KeyError if the attribute is not present).

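Because the attribute itself is namespaced (xlink:href), you will probably need to look it up with the {ns}attrib syntax too, where ns is the full namespace URI that the xlink prefix is bound to, rather than the prefix itself.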

You can also print the element's attrib dictionary and copy the exact attribute name from it.
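A minimal sketch of that lookup, under a few assumptions: the sample document below adds the xmlns declarations that the snippet in the question omits, the xlink prefix is taken to be bound to the standard XLink namespace URI, and the UWS namespace URI is an assumption rather than something stated in the question. The fallback import exists only so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml; '{*}' wildcards need Python 3.8+ here.
    import xml.etree.ElementTree as etree

XLINK_NS = "http://www.w3.org/1999/xlink"  # standard XLink namespace URI

# Hypothetical stand-in for xmlresponse.content. The xmlns declarations are added
# so the document parses; the UWS namespace URI is an assumption.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink">
<uws:results>
    <uws:result id="2014-03-03T15:42:31:1337" xlink:href="http://www.cosmosim.org/query/index/stream/table/2014-03-03T15%3A42%3A31%3A1337/format/csv" xlink:type="simple"/>
</uws:results>
</uws:job>"""

root = etree.fromstring(xml_bytes)

# Step down from <uws:job> to <uws:results> to <uws:result>, ignoring namespaces.
result = root.find("{*}results/{*}result")

# Dump the attributes to see the exact (namespace-qualified) names.
attribs = dict(result.attrib)
print(attribs)

# Namespaced attributes are keyed as '{namespace-uri}localname'.
url = result.get("{%s}href" % XLINK_NS)
print(url)

# get() returns the supplied default when the attribute is absent.
fallback = result.get("no-such-attribute", "not found")
print(fallback)
```

The printed dictionary shows the href key in its '{namespace-uri}href' form, which is the string to pass to get() or attrib[...].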

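text is the character data between an element's start tag and its first child (or its end tag, if it has no children). For uws:results that is only the newline and indentation before uws:result, which is why you got '\n    '. Such whitespace normally carries no data, but it is preserved both for spacing (such as etree's indent) and for some special cases like mixed content.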
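To illustrate where the whitespace comes from, here is a small sketch on a simplified, namespace-free stand-in for the document. The element names mirror the question, but the href value is made up for the example. As before, the fallback import is only there so the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml.
    import xml.etree.ElementTree as etree

# Hypothetical, simplified document: one parent with a single self-closing child.
doc = b"""<results>
    <result href="http://example.org/data.csv"/>
</results>"""

root = etree.fromstring(doc)
result = root.find("result")

# Text between <results> and its first child: a newline plus four spaces.
print(repr(root.text))

# A self-closing element has no text at all.
print(repr(result.text))

# Text after <result/> and before </results>: a single newline.
print(repr(result.tail))

# The actual data lives in the attribute.
print(result.get("href"))
```

So .text on the parent only ever sees the indentation, while the value of interest has to be read from the child's attributes.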
