The XML I need to search declares a default namespace but does not use any namespace prefixes:

    <WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>

        <LA_URL>http://192.168.8.33/license/rightsmanager.asmx</LA_URL>
        <LUI_URL>http://192.168.8.33/license/rightsmanager.asmx</LUI_URL>

        <DS_ID></DS_ID>
        <KID></KID>
        <CHECKSUM></CHECKSUM>

    </DATA>
</WRMHEADER>

I'd like to read the values for various fields, e.g. data/protectinfo/keylen etc.

import xml.etree.ElementTree as ET

# sMyXml holds the XML document shown above
root    = ET.fromstring(sMyXml)
keylen  = root.findall('./DATA/PROTECTINFO/KEYLEN')

print(root)
print(keylen)

This code prints the following:

<Element {http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}WRMHEADER at 0x7f2a7c35be60>
[]

root.find returns None and root.findall returns [] for this query. I've been unable to specify a default namespace. Is there a way to query these values? Thanks.

2 Answers

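The default namespace applies to every element in the document, so each tag name in the path has to be namespace-qualified. Create a namespace dict and pass it to find/findall: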

x = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>

        <LA_URL>http://192.168.8.33/license/rightsmanager.asmx</LA_URL>
        <LUI_URL>http://192.168.8.33/license/rightsmanager.asmx</LUI_URL>

        <DS_ID></DS_ID>
        <KID></KID>
        <CHECKSUM></CHECKSUM>

    </DATA>
</WRMHEADER>"""
from xml.etree import ElementTree as ET

root = ET.fromstring(x)
# Map a prefix of our own choosing to the document's default namespace URI
ns = {"wrm": "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"}
data = root.findall('wrm:DATA', ns)

print(root)
print(data)

Now you should get something like:

<Element '{http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}WRMHEADER' at 0x7fd0a30d45d0>
[<Element '{http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader}DATA' at 0x7fd0a30d4610>]
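
The braces in that output are how ElementTree stores namespaced tags: {namespace-uri}localname. As a rough sketch of an alternative to the prefix dict, the same notation can be written straight into the search path. Note that qualified_path is a hypothetical helper written for this example, not part of ElementTree, and the XML is a trimmed copy of the header from the question.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the header from the question
xml_text = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>
    </DATA>
</WRMHEADER>"""

NS_URI = "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"


def qualified_path(path, uri=NS_URI):
    """Hypothetical helper: turn 'A/B/C' into '{uri}A/{uri}B/{uri}C'."""
    return "/".join("{%s}%s" % (uri, step) for step in path.split("/"))


root = ET.fromstring(xml_text)

# Every tag is stored with the namespace URI in braces in front of it
print(root.tag)

keylen = root.find(qualified_path("DATA/PROTECTINFO/KEYLEN")).text
algid = root.find(qualified_path("DATA/PROTECTINFO/ALGID")).text
print(keylen, algid)
```

This is more verbose than the prefix dict, but it needs no extra argument and makes it obvious why the unqualified path in the question matched nothing.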

To get /DATA/PROTECTINFO/KEYLEN:

In [17]: root = ET.fromstring(x)

In [18]: ns = {"wrm":"http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"} 
In [19]: root.find('wrm:DATA/wrm:PROTECTINFO/wrm:KEYLEN', ns).text
Out[19]: '16'
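
If you are on Python 3.8 or newer, there are two shorter spellings that may be worth trying. This is a sketch under that version assumption (older interpreters will not accept either form), again using a trimmed copy of the question's XML. An empty-string key in the namespace dict acts as the default namespace for unprefixed names in the path, and {*} matches a tag in any namespace.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the header from the question
xml_text = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>
    </DATA>
</WRMHEADER>"""

NS_URI = "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"

root = ET.fromstring(xml_text)

# Option 1 (Python 3.8+): the '' key moves every unprefixed name in the
# path into the given namespace, i.e. it behaves like a default namespace.
keylen_default = root.find("DATA/PROTECTINFO/KEYLEN", {"": NS_URI}).text

# Option 2 (Python 3.8+): {*} matches the tag regardless of its namespace.
keylen_wildcard = root.find("{*}DATA/{*}PROTECTINFO/{*}KEYLEN").text

print(keylen_default, keylen_wildcard)
```

The wildcard form is the least strict of the two: it would also match a KEYLEN element from an unrelated namespace, so the empty-prefix form is probably the safer choice when validating content.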

1 Comment

No worries. If you are doing a lot of work with XML in Python, you might find lxml useful: lxml.de

I'm wondering if this will also work. Please post your comments on pros and cons of this approach.

import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("xmlquestion.xml")
tn = DOMTree.documentElement
print(tn.namespaceURI)
# print(tn.childNodes)

# getElementsByTagName matches on the tag name as written in the document,
# so the default namespace does not need to be spelled out here
data = tn.getElementsByTagName('DATA')[0]
protectinfo = data.getElementsByTagName('PROTECTINFO')[0]
keylen = protectinfo.getElementsByTagName('KEYLEN')[0]
print(keylen.childNodes[0].data)

http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader
16
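
On the pros and cons: getElementsByTagName ignores the namespace entirely and searches all descendants rather than a fixed path, which is convenient but less strict. A possible namespace-aware variant, sketched here with parseString on an in-memory string (a trimmed copy of the question's XML, assumed to stand in for data that did not come from a file), uses getElementsByTagNameNS so that only elements in the PlayReady namespace match.

```python
from xml.dom.minidom import parseString

NS_URI = "http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader"

# Trimmed copy of the header from the question
xml_text = """<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">
    <DATA>
        <PROTECTINFO>
            <KEYLEN>16</KEYLEN>
            <ALGID>AESCTR</ALGID>
        </PROTECTINFO>
    </DATA>
</WRMHEADER>"""

dom = parseString(xml_text)
root = dom.documentElement

# Namespace-aware lookup: matches on (namespace URI, local name)
keylen_nodes = root.getElementsByTagNameNS(NS_URI, "KEYLEN")
keylen = keylen_nodes[0].firstChild.data

# The same local name in a different namespace is not matched
other = root.getElementsByTagNameNS("urn:example:other", "KEYLEN")

print(keylen, len(other))
```

Either minidom call still returns every matching descendant, so if the same tag name could appear in more than one place, walking down level by level as in the code above remains the more precise option.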

1 Comment

That's great. I had to modify it slightly to import parseString, since my data came from a network request. I was just looking for a quick way to validate the XML content. I wanted to go with ET since it appears to be more widely used, but I found this problem frustrating: the documentation seemed lacking for such a fundamental issue.
