How do I pull data from XML document from between XML tags in Django/Python?

Question

I have an external XML file that I am loading in my views.py file:

import urllib2
from xml.dom import minidom

from django.shortcuts import render_to_response

def test(request):
    url = urllib2.urlopen("http://someurl.com?xml")
    dom = minidom.parse(url)

    groups = dom.getElementsByTagName("group")

    deal_holder = []

    # Iterate over each DOM group element:
    for group in groups:
        # Iterate over each child node
        for groupChild in group.childNodes:
            deal_holder.append(groupChild)

    return render_to_response('folder/test.html', {'deal_holder': deal_holder})

This is what the loaded XML file looks like:

<page>
    <site>
        <siteid>25550</siteid>
        <sitename>
            <![CDATA[ Some Text Here ]]>
        </sitename>
        <sitelink>
            http://somelinkehere.com
        </sitelink>
        <timezone>
            <![CDATA[ Pacific Time ]]>
        </timezone>
    </site>
    <groups>
        <enablefeaturedgroup>OFF</enablefeaturedgroup>
        <group>
            <groupid>467246</groupid>
            <groupname>
                <![CDATA[ Today's Deal ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname>
                <![CDATA[ Past Deals ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
    </groups>
</page>

The problem is that all of the examples I've seen use code like mine, but their XML stores the data in attributes, for example <weather:forecast day="Wed" date="14 Sep 2011" low="56" high="72" text="AM Clouds/PM Sun" code="30"/>, so they can read values such as day="Wed", date="14 Sep 2011", and low="56". The information I want to retrieve is between the tags, as in <siteid>25550</siteid>.

Any advice or info would be greatly appreciated.

Uku Loskit · Accepted Answer · 2011-09-14 19:07:32Z

2

Using minidom is quite similar to working with the DOM in JavaScript.

from xml.dom import minidom
from StringIO import StringIO
a = """<page>
    <site>
        <siteid>25550</siteid>
        <sitename>
            <![CDATA[ Some Text Here ]]>
        </sitename>
        <sitelink>
            http://somelinkehere.com
        </sitelink>
        <timezone>
            <![CDATA[ Pacific Time ]]>
        </timezone>
    </site>
    <groups>
        <enablefeaturedgroup>OFF</enablefeaturedgroup>
        <group>
            <groupid>467246</groupid>
            <groupname>
                <![CDATA[ Today's Deal ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname>
                <![CDATA[ Past Deals ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
    </groups>
</page>
"""
tree = minidom.parse(StringIO(a))
groups = tree.getElementsByTagName("group")

Using StringIO is not required if you are using urllib, because minidom's parse method expects a file-like object, and urllib.urlopen returns just that.
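A minimal sketch of the two entry points, written for Python 3 (an assumption; the code in this answer targets Python 2, where StringIO and urllib2 are separate modules). `minidom.parse` takes a file name or a file-like object, while `minidom.parseString` takes the document itself. The `sample` bytes are a trimmed, hypothetical stand-in for the remote feed.

```python
import io
from xml.dom import minidom

# Trimmed stand-in for the remote feed (hypothetical sample data).
sample = b"<page><site><siteid>25550</siteid></site></page>"

# parse() wants a file name or a file-like object, such as an HTTP
# response or an in-memory buffer like BytesIO.
tree_from_file = minidom.parse(io.BytesIO(sample))

# parseString() accepts the document itself, so no wrapper is needed.
tree_from_string = minidom.parseString(sample)

# Both calls produce equivalent DOM trees.
siteid_a = tree_from_file.getElementsByTagName("siteid")[0].firstChild.data
siteid_b = tree_from_string.getElementsByTagName("siteid")[0].firstChild.data
print(siteid_a, siteid_b)
```

If the XML is already in memory as a string, `parseString` is probably the simpler choice; a response object from `urlopen` can go straight into `parse`.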

I'd advise against passing this list of DOM nodes to the Django templating system. You should parse it further.

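# Iterate over each DOM group element:
group_dictionaries = []
for group in groups:
    group_dict = {}
    # Iterate over each child node of the group
    for groupChild in group.childNodes:
        # Skip the whitespace text nodes between tags; only elements have a tagName
        if groupChild.nodeType != groupChild.ELEMENT_NODE:
            continue
        # The text between the tags lives in the element's own text/CDATA children
        text = "".join(node.data for node in groupChild.childNodes
                       if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))
        group_dict[groupChild.tagName] = text.strip()
    group_dictionaries.append(group_dict)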

Now in the template:

{% for group in group_dictionaries %}
    {{ group.groupid }}
    {{ group.groupname }}
    etc.
{% endfor %}

You could save the values in a list of dictionaries.
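For a single value such as siteid, a loop may not be needed at all. The sketch below, written for Python 3 (an assumption), looks the element up by tag name and reads the text between its tags. `element_text` is a hypothetical helper, not part of minidom, and `sample` is a trimmed copy of the feed from the question.

```python
from xml.dom import minidom

# Hypothetical, trimmed copy of the feed from the question.
sample = """<page>
    <site>
        <siteid>25550</siteid>
        <sitename>
            <![CDATA[ Some Text Here ]]>
        </sitename>
    </site>
</page>"""


def element_text(element):
    """Join the text and CDATA children of an element, trimming whitespace."""
    parts = [
        node.data
        for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    ]
    return "".join(parts).strip()


dom = minidom.parseString(sample)
site = dom.getElementsByTagName("site")[0]

# The value sits in the element's child nodes, not on the element itself.
siteid = element_text(site.getElementsByTagName("siteid")[0])
sitename = element_text(site.getElementsByTagName("sitename")[0])
print(siteid)
print(sitename)
```

Joining every text and CDATA child, rather than reading only `firstChild`, is a deliberate choice here: for an element like sitename, the first child is the whitespace before the CDATA section.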

edited Sep 14, 2011 at 19:07

answered Sep 14, 2011 at 17:39

Uku Loskit


4 Comments

bigmike7801 Over a year ago

Because I'm using Django, I did data = dom.getElementsByTagName("group"), passed the data variable to the template, and output it there with {{ data }}. That prints 1 - [<DOM Element: group at 0x967b5cc>, <DOM Element: group at 0x9539f8c>]. How can I retrieve data such as groupid or groupname from that? Thanks!

bigmike7801 Over a year ago

I updated my code above to reflect some changes I made per your suggestions. The problem, though, is that when I add {{ deal_holder }} to my template file, it outputs

[<DOM Text node " ">, <DOM Element: groupid at 0x99cfd8c>, <DOM Text node " ">, <DOM Element: groupname at 0x8b220ec>, <DOM Text node " ">, <DOM Element: groupdescription at 0x992f1cc>, <DOM Text node " ">, <DOM Text node " ">, <DOM Element: groupid at 0x9a0d34c>, etc...]

so I'm still unable to just grab the info. I'm pretty new to Python/Django, so I may be missing something obvious. Thanks for your help.

Uku Loskit Over a year ago

See my edit. That happens because your deal_holder variable contains a list of DOM nodes, and what gets printed is the representation of that list.

bigmike7801 Over a year ago

I'm now getting the error of Exception Value: Text instance has no attribute 'tagName'
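That error most likely comes from the indentation between tags: minidom keeps it as Text nodes inside childNodes, and Text nodes have no tagName. A small Python 3 sketch (an assumption; the thread uses Python 2) that shows the mix of node types and one way to filter it. The `sample` fragment is a hypothetical, shortened group element.

```python
from xml.dom import Node, minidom

# Hypothetical, shortened <group> element from the feed.
sample = """<group>
    <groupid>467246</groupid>
    <groupname><![CDATA[ Today's Deal ]]></groupname>
</group>"""

group = minidom.parseString(sample).documentElement

# Whitespace between tags is parsed as Text nodes, so elements and
# text nodes alternate in childNodes.
is_element = [child.nodeType == Node.ELEMENT_NODE for child in group.childNodes]
print(is_element)

# Keeping only element nodes makes tagName safe to read.
tags = [
    child.tagName
    for child in group.childNodes
    if child.nodeType == Node.ELEMENT_NODE
]
print(tags)
```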

Acorn · Accepted Answer · 2011-09-14 22:46:40Z

1

With lxml you could do something like this:

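import lxml.etree

# parse() accepts a file name, a file-like object, or a URL
tree = lxml.etree.parse("http://someurl.com")
sites = tree.xpath("//site")

for site in sites:
    # .text holds the character data between <siteid> and </siteid>
    siteid = site.find("siteid").text
    print(siteid)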

edited Sep 14, 2011 at 22:46

answered Sep 14, 2011 at 17:29

Acorn


3 Comments

Acorn Over a year ago

Is there anything else you're trying to do that my example doesn't cover?

bigmike7801 Over a year ago

I'm unable to get lxml.etree to import and I'm not sure if I can install it on my server. Also, would you mind removing the actual URL from your example? I must have included it by mistake. Thanks!

Acorn Over a year ago

Ah okay, deleted. If you manage to get lxml installed on your server, feel free to ask any questions you have.
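If lxml cannot be installed, the standard library's xml.etree.ElementTree offers a similar find/text style of access. This is a swapped-in alternative, not the lxml code from this answer; it is a sketch assuming Python 3, with a hypothetical, trimmed `sample` of the feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the feed from the question.
sample = """<page>
    <site>
        <siteid>25550</siteid>
    </site>
    <groups>
        <group>
            <groupid>467246</groupid>
            <groupname>
                <![CDATA[ Today's Deal ]]>
            </groupname>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname>
                <![CDATA[ Past Deals ]]>
            </groupname>
        </group>
    </groups>
</page>"""

root = ET.fromstring(sample)

# findtext() returns the text between the tags of the first match.
siteid = root.findtext("site/siteid")

# ElementTree folds CDATA into ordinary text, so strip() is enough
# to drop the surrounding whitespace.
groups = [
    {
        "groupid": group.findtext("groupid", default="").strip(),
        "groupname": group.findtext("groupname", default="").strip(),
    }
    for group in root.iter("group")
]
print(siteid)
print(groups)
```

The resulting list of plain dictionaries can be handed to a template directly, which avoids passing DOM objects around.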
