I'm new to XML and REST but have some basic knowledge of Python. I'm having trouble parsing the XML file below.

I use the BeautifulSoup library to parse the file and, for some reason, I can access the fields of entries 2 and 3 but not those of entry 1, even though all three are formatted the same way. Can someone tell me what I'm doing wrong? My code and its output are below.

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">News</title>
    <id>1</id>
    <link href="" />
    <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
    <updated>2014-11-26T10:41:12.424Z</updated>
    <author />
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test PUT Entry 3</summary>
        <id>7</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T09:55:31.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test PUT Entry 8</summary>
        <id>8</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T13:47:09.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test POST</summary>
        <id>12</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-25T14:29:02.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
</feed>

Python code:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("/tmp/test.xml").read()
soup = BeautifulSoup(handler)

results = soup.findAll('entry')
for r in results:
    print r
    print r.find('title').text
    print r.find('content').text
    print r.find('georss:point')
    print r.find('id')
    print r.find('updated')

And the output is the following:

<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
</entry>
TEST REST
1
None
None
None
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>8</id>
<updated>2014-11-24T13:47:09.000Z</updated>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>12</id>
<updated>2014-11-25T14:29:02.000Z</updated>

1 Answer

I tested your file with the following code:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("./test.xml").read()

soup = BeautifulSoup(handler)
print soup.prettify()

The output starts like this:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <title type="text">
  News
 </title>
 <id>
  1
 </id>
 <link href="" />
 <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
 <updated>
  2014-11-26T10:41:12.424Z
 </updated>
 <author>
  <entry xmlns:georss="http://www.georss.org/georss">
   <title type="html">
    TEST REST
   </title>
   <content type="html">
    1
   </content>
  </entry>
 </author>
 <author>
  <name>
   User213
  </name>
 </author>

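If you look closely, you will see that BeautifulSoup treats the empty <author /> element in your XML as an opening tag that is never closed, so the first <entry> ends up nested inside it.

That first entry is then cut short when the parser reaches the entry's own <author> tag, leaving only title and content inside it. That's why find() returns None for georss:point, id and updated in entry 1: as far as BeautifulSoup is concerned, they are outside the entry tag.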
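One possible way around this is to read the feed with a real XML parser, which knows that <author /> is an empty element. Below is a minimal sketch using the standard library's xml.etree.ElementTree instead of BeautifulSoup (that swap is my suggestion, not something from the question). The SAMPLE string is a trimmed copy of the feed, and the read_entries helper and NS mapping are names I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the feed in the question.
# The prefixes "atom" and "georss" are only local shorthand for lookups.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

# Trimmed copy of the feed, keeping the empty <author /> element
# that confused BeautifulSoup.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">News</title>
    <author />
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <id>7</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T09:55:31.000Z</updated>
    </entry>
</feed>"""


def read_entries(xml_bytes):
    """Return one dict per <entry>, with the fields printed in the question."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for entry in root.findall("atom:entry", NS):
        rows.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "content": entry.findtext("atom:content", namespaces=NS),
            "point": entry.findtext("georss:point", namespaces=NS),
            "id": entry.findtext("atom:id", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
        })
    return rows


# Pass bytes so the parser honours the encoding in the XML declaration.
entries = read_entries(SAMPLE.encode("utf-8"))
for row in entries:
    print(row)
```

For the real file you would pass open("/tmp/test.xml", "rb").read() instead of SAMPLE. Note that every Atom element has to be looked up through the namespace mapping, because the feed declares a default namespace.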

Hope this helps.

1 Comment

Thanks a lot. That seems to be the issue.