Parsing XML using Python / ElementTree

Question

The XML I need to search declares a default namespace but uses no namespace prefixes:

    <WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>

        <LA_URL>http://192.168.8.33/license/rightsmanager.asmx</LA_URL>
        <LUI_URL>http://192.168.8.33/license/rightsmanager.asmx</LUI_URL>

        <DS_ID></DS_ID>
        <KID></KID>
        <CHECKSUM></CHECKSUM>

    </DATA>
</WRMHEADER>

I'd like to read the values of various fields, e.g. DATA/PROTECTINFO/KEYLEN.

from xml.etree import ElementTree as ET

# sMyXml holds the XML document shown above
root    = ET.fromstring(sMyXml)
keylen  = root.findall('./DATA/PROTECTINFO/KEYLEN')

print(root)
print(keylen)

This code prints the following:

<Element {http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}WRMHEADER at 0x7f2a7c35be60>
[]

For this query, root.find returns None and root.findall returns []. I've been unable to specify a default namespace. Is there a way to query these values? Thanks.

Padraic Cunningham · Accepted Answer · 2016-06-21 11:54:38Z

1

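The xmlns attribute on the root element puts every element in the document into that namespace, so each tag in the path has to be qualified. Create a namespace dict and prefix each tag: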

x = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>

        <LA_URL>http://192.168.8.33/license/rightsmanager.asmx</LA_URL>
        <LUI_URL>http://192.168.8.33/license/rightsmanager.asmx</LUI_URL>

        <DS_ID></DS_ID>
        <KID></KID>
        <CHECKSUM></CHECKSUM>

    </DATA>
</WRMHEADER>"""
from xml.etree import ElementTree as ET

root = ET.fromstring(x)
# Map an arbitrary prefix to the document's default namespace URI
ns = {"wrm": "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"}
data = root.findall('wrm:DATA', ns)

print(root)
print(data)

Now you should get something like:

<Element '{http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}WRMHEADER' at 0x7fd0a30d45d0>
[<Element '{http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}DATA' at 0x7fd0a30d4610>]

To get /DATA/PROTECTINFO/KEYLEN:

In [17]: root = ET.fromstring(x)

In [18]: ns = {"wrm":"http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"} 
In [19]: root.find('wrm:DATA/wrm:PROTECTINFO/wrm:KEYLEN', ns).text
Out[19]: '16'
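
As an alternative to the prefix dict, ElementTree also accepts the namespace URI written inline in braces, as in {uri}TAG. The sketch below assumes a shortened copy of the header from the question; the names xml_text, NS_URI and the helper q are made up for illustration and are not part of ElementTree.

```python
from xml.etree import ElementTree as ET

# Shortened copy of the header from the question (assumption: same structure).
xml_text = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>
        <LA_URL>http://192.168.8.33/license/rightsmanager.asmx</LA_URL>
    </DATA>
</WRMHEADER>"""

NS_URI = "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"


def q(path):
    """Hypothetical helper: qualify each step of a slash-separated path with NS_URI."""
    return "/".join("{%s}%s" % (NS_URI, step) for step in path.split("/"))


root = ET.fromstring(xml_text)

# The default namespace is inherited, so every child tag carries the URI.
child_tags = [child.tag for child in root]

# An unqualified path matches nothing; the qualified one finds the element.
unqualified = root.find("DATA/PROTECTINFO/KEYLEN")
keylen = root.findtext(q("DATA/PROTECTINFO/KEYLEN"))
algid = root.findtext(q("DATA/PROTECTINFO/ALGID"))
la_url = root.findtext(q("DATA/LA_URL"))

print(child_tags)
print(unqualified, keylen, algid, la_url)
```

The dict form is usually easier to read; the brace form may be handy when a path is built programmatically.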

edited Jun 21, 2016 at 11:54

answered Jun 21, 2016 at 11:48

1 Comment

Padraic Cunningham Over a year ago

No worries. If you are doing a lot of work with XML in Python, you might find lxml useful: lxml.de

Ebrahim Jakoet · Accepted Answer · 2016-06-21 12:50:58Z

1

I'm wondering if this will also work. Please post your comments on pros and cons of this approach.

import xml.dom.minidom

# Parse the XML file with the minidom parser
DOMTree = xml.dom.minidom.parse("xmlquestion.xml")
tn = DOMTree.documentElement
print(tn.namespaceURI)

# getElementsByTagName matches the tag name as written, so no namespace prefix is needed
data = tn.getElementsByTagName('DATA')[0]
protectinfo = data.getElementsByTagName('PROTECTINFO')[0]
keylen = protectinfo.getElementsByTagName('KEYLEN')[0]
print(keylen.childNodes[0].data)

Output:

http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader
16

answered Jun 21, 2016 at 12:50

1 Comment

stack user Over a year ago

That's great. I had to modify it slightly to import parseString, as my data came from a network request. I was just looking for a quick way to validate the XML content. I wanted to go with ET since it appears to be more widely used, but I found this problem frustrating: the documentation seemed lacking for what looks like such a fundamental issue.
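
For data that arrives as a string, as described in the comment above, a minidom version might look like the following sketch. It assumes a shortened copy of the header from the question; xml_text and NS_URI are illustrative names. getElementsByTagNameNS is used here so that the namespace is checked as well as the tag name.

```python
from xml.dom.minidom import parseString

# Shortened copy of the header from the question (assumption: same structure).
xml_text = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>
        <KID></KID>
    </DATA>
</WRMHEADER>"""

NS_URI = "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"

# parseString takes the document text directly instead of a file name.
dom = parseString(xml_text)
root = dom.documentElement

# Match on namespace URI plus local name rather than on the raw tag name.
keylen_node = root.getElementsByTagNameNS(NS_URI, "KEYLEN")[0]
keylen = keylen_node.firstChild.data

# Empty elements such as <KID></KID> have no text child, so guard before reading .data.
kid_node = root.getElementsByTagNameNS(NS_URI, "KID")[0]
kid = kid_node.firstChild.data if kid_node.firstChild is not None else ""

print(root.namespaceURI)
print(keylen)
print(repr(kid))
```

Both getElementsByTagName and getElementsByTagNameNS search all descendants, so unlike an ElementTree path they do not by themselves pin down where in the tree the element sits.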
