I have an external XML file that I am loading in my views.py file:

import urllib2
from xml.dom import minidom

from django.shortcuts import render_to_response

def test(request):

    url = urllib2.urlopen("http://someurl.com?xml")
    dom = minidom.parse(url)

    groups = dom.getElementsByTagName("group")

    deal_holder = []

    # Iterate over each DOM group element:
    for group in groups:
        # Iterate over each child node
        for groupChild in group.childNodes:
            deal_holder.append(groupChild)

    return render_to_response('folder/test.html', {'deal_holder':deal_holder})

This is what the loaded XML file looks like:

<page>
    <site>
        <siteid>25550</siteid>
        <sitename>
            <![CDATA[ Some Text Here ]]>
        </sitename>
        <sitelink>
            http://somelinkehere.com
        </sitelink>
        <timezone>
            <![CDATA[ Pacific Time ]]>
        </timezone>
    </site>
    <groups>
        <enablefeaturedgroup>OFF</enablefeaturedgroup>
        <group>
            <groupid>467246</groupid>
            <groupname>
                <![CDATA[ Today's Deal ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname>
                <![CDATA[ Past Deals ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
    </groups>
</page>

The problem is that all of the examples I've seen use something like my code, except that their XML tags usually look like this: <weather:forecast day="Wed" date="14 Sep 2011" low="56" high="72" text="AM Clouds/PM Sun" code="30"/>. They retrieve the information from attributes such as day="Wed", date="14 Sep 2011" and low="56". The info I want to retrieve is actually between the tags, as in <siteid>25550</siteid>.

Any advice or info would be greatly appreciated.

2 Answers

Using minidom is quite similar to working with the DOM in JavaScript.

from xml.dom import minidom
from StringIO import StringIO
a = """<page>
    <site>
        <siteid>25550</siteid>
        <sitename>
            <![CDATA[ Some Text Here ]]>
        </sitename>
        <sitelink>
            http://somelinkehere.com
        </sitelink>
        <timezone>
            <![CDATA[ Pacific Time ]]>
        </timezone>
    </site>
    <groups>
        <enablefeaturedgroup>OFF</enablefeaturedgroup>
        <group>
            <groupid>467246</groupid>
            <groupname>
                <![CDATA[ Today's Deal ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname>
                <![CDATA[ Past Deals ]]>
            </groupname>
            <groupdescription>
                <![CDATA[ ]]>
            </groupdescription>
        </group>
    </groups>
</page>
"""
tree = minidom.parse(StringIO(a))
groups = tree.getElementsByTagName("group")

Using StringIO is not required if you are using urllib2, because minidom's parse function accepts a file-like object, which is exactly what urllib2.urlopen returns.
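As a rough illustration of that point, minidom.parseString takes the XML as a string, while minidom.parse takes a filename or an open file-like object. The sketch below uses an in-memory io.BytesIO as a stand-in for the object urlopen returns (an assumption for illustration only), and the sample XML is a shortened copy of the document above.

```python
import io
from xml.dom import minidom

# Shortened sample; the real document would come from the remote URL.
xml_text = "<page><site><siteid>25550</siteid></site></page>"

# parseString: pass the XML itself as a string.
dom_from_string = minidom.parseString(xml_text)

# parse: pass a file-like object (BytesIO stands in for the urlopen response).
dom_from_file = minidom.parse(io.BytesIO(xml_text.encode("utf-8")))

# The text between the tags is a Text node below the element.
siteid_a = dom_from_string.getElementsByTagName("siteid")[0].firstChild.data
siteid_b = dom_from_file.getElementsByTagName("siteid")[0].firstChild.data

print(siteid_a)
print(siteid_b)
```

Both calls produce the same kind of Document object, so the rest of the code does not need to know which one was used.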

I'd advise against passing this list to the django templating system. You should parse it further.

# Iterate over each DOM group element:
group_dictionaries = []
for group in groups:
    group_dict = {}
    # Iterate over each child node
    for groupChild in group.childNodes:
        # Skip the whitespace Text nodes between elements; they have no tagName
        if groupChild.nodeType != groupChild.ELEMENT_NODE:
            continue
        # An element's text lives in its own Text/CDATA child nodes
        text = "".join(n.data for n in groupChild.childNodes
                       if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE))
        group_dict[groupChild.tagName] = text.strip()
    group_dictionaries.append(group_dict)

Now in the template:

{% for group in group_dictionaries %}
    {{ group.groupid }}
    {{ group.groupname }}
    etc.
{% endfor %}

You could save the values in a list of dictionaries, as in the loop above.

4 Comments

Because I'm using Django, I did data = dom.getElementsByTagName("group"), passed the data variable to the template, and put {{ data }} in the template, which outputs [<DOM Element: group at 0x967b5cc>, <DOM Element: group at 0x9539f8c>]. How can I retrieve any data from that, such as groupid or groupname? Thanks!
I updated my code above to reflect some changes I made per your suggestions. The problem, though, is that in my template file I add {{ deal_holder }} and it outputs [<DOM Text node " ">, <DOM Element: groupid at 0x99cfd8c>, <DOM Text node " ">, <DOM Element: groupname at 0x8b220ec>, <DOM Text node " ">, <DOM Element: groupdescription at 0x992f1cc>, <DOM Text node " ">, <DOM Text node " ">, <DOM Element: groupid at 0x9a0d34c>, etc...] so I'm still unable to just grab the info. I'm pretty new to Python/Django so I may be missing something obvious. Thanks for your help.
See my edit. That is happening because your deal_holder variable contains a list, and what gets printed is the representation of that list.
I'm now getting this error: Exception Value: Text instance has no attribute 'tagName'
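
The tagName error comes from the whitespace between tags: minidom stores it as Text nodes that sit alongside the elements in childNodes, and Text nodes have no tagName. A minimal sketch, using a shortened inline copy of the XML from the question, that skips those nodes and reads each element's text from its own children. The helper name node_text is hypothetical, not part of minidom.

```python
from xml.dom import minidom

# Shortened inline sample of the XML from the question.
SAMPLE = """<groups>
    <group>
        <groupid>467246</groupid>
        <groupname>
            <![CDATA[ Today's Deal ]]>
        </groupname>
    </group>
</groups>"""

def node_text(element):
    """Join the Text and CDATA children of an element and trim whitespace."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts).strip()

dom = minidom.parseString(SAMPLE)
group = dom.getElementsByTagName("group")[0]

# childNodes mixes whitespace Text nodes with the real elements.
is_element = [child.nodeType == child.ELEMENT_NODE for child in group.childNodes]

group_dict = {}
for child in group.childNodes:
    if child.nodeType != child.ELEMENT_NODE:
        continue  # whitespace-only Text node: no tagName, nothing to store
    group_dict[child.tagName] = node_text(child)

print(is_element)
print(group_dict)
```

Printing is_element shows the alternating pattern of Text and Element nodes that also appears in the {{ deal_holder }} output quoted above.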

With lxml you could do something like this:

import lxml.etree

tree = lxml.etree.parse("http://someurl.com")
sites = tree.xpath("//site")

for site in sites:
    siteid = site.find("siteid").text
    print siteid

3 Comments

Is there anything else you're trying to do that my example doesn't cover?
I'm unable to get lxml.etree to import and I'm not sure if I can install it on my server. Also, would you mind removing the actual URL from your example? I must have included it by mistake. Thanks!
Ah okay, deleted. If you manage to get lxml installed on your server, feel free to ask any questions you have.
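
If lxml cannot be installed, the standard library's xml.etree.ElementTree offers a similar find/findtext/.text interface, though it supports only a limited subset of XPath. A minimal sketch, assuming a shortened inline copy of the XML from the question instead of the real URL:

```python
import xml.etree.ElementTree as ET

# Shortened inline sample; in the view this text would come from the URL.
SAMPLE = """<page>
    <site>
        <siteid>25550</siteid>
    </site>
    <groups>
        <group>
            <groupid>467246</groupid>
            <groupname><![CDATA[ Today's Deal ]]></groupname>
        </group>
        <group>
            <groupid>467247</groupid>
            <groupname><![CDATA[ Past Deals ]]></groupname>
        </group>
    </groups>
</page>"""

root = ET.fromstring(SAMPLE)

# findtext returns the text between the tags of the first matching element.
siteid = root.findtext("site/siteid")

# Build one dict per <group>, keyed by child tag name.
groups = []
for group in root.iter("group"):
    # Iterating an element yields only its child elements, not whitespace.
    groups.append(dict((child.tag, (child.text or "").strip()) for child in group))

print(siteid)
print(groups)
```

The resulting list of plain dictionaries can be passed to a template and accessed as {{ group.groupid }} and {{ group.groupname }}.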
