I'm trying to extract the XML (http://py4e-data.dr-chuck.net/comments_42.xml), which looks like:

<note>
<comments>
<comment>
<name>Romina</name>
<count>97</count>
</comment>
...

I need to count the <comment> tags, sum the values in their <count> tags, and print both results.

I have tried to extract and parse the xml based on the sample code given but I also made some changes.

Please see my code:

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl

api_key = False

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/xml?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
address = input('Enter location: ')
if len(address) < 1: break

url = serviceurl + urllib.parse.urlencode(address)
uh = urllib.request.urlopen(url, context=ctx)

data = uh.read()
print('Retrieved', len(data), 'characters')
tree = ET.fromstring(data)

count = 0
sum = 0
lst = tree.findall('comments/comment')
for item in lst:
    value = int(item.find('count'.text))
    count = count+1
    sum = sum + value
    print('Count:',count)
    print('Sum:',sum)

I expect to get the count and sum of values, but the terminal said the "serviceurl" is invalid.

  • Can you give me a sample input for 'Enter location: '? Also, I think you meant to indent the two statements in your "while True:" Commented Jun 21, 2019 at 14:17
  • Oh yes thanks for the heads-up! I forgot to give you the link and the expected result. Now it's solved :) Commented Jun 22, 2019 at 9:21

2 Answers

I modified your code and achieved your goal of summing the values and delivering the count. I'm not sure if this is the right answer, though, because I can't tell if you're inheriting the 'enter location', or 'api_key' from sample code or if it's something you're trying to specifically accomplish.

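Also, I assume you meant to keep a running total in 'sum' rather than in 'value' in your for loop. I also changed item.find('count'.text) to item.find('count').text, because a string has no .text attribute, and moved the print out of the loop so the totals are printed once.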

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl

api_key = False

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/comments_42.xml'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

#while True:
#       address = input('Enter location: ')
#       if len(address) < 1: break

url = serviceurl #+ urllib.parse.urlencode(address)
uh = urllib.request.urlopen(url, context=ctx)

data = uh.read()
print('Retrieved', len(data), 'characters')
tree = ET.fromstring(data)

count = 0
sum = 0
lst = tree.findall('comments/comment')
for item in lst:
    sum = sum + int(item.find('count').text)
    count = count+1

print("Sum: ", sum, "Count: ", count)

I achieved the output:

Retrieved 4189 characters
Sum:  2553 Count:  50

I commented out some portions of your code to make it work -- are there other constraints that prohibit directly reading the data?
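The parsing step can also be checked without any network access. The sketch below is only an illustration: the <root> element name and the second comment entry are made up for the example (only the Romina entry appears in the question), and the 'comments/comment' path is the one used in the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the <root> name and the "Example" entry are
# assumptions for illustration; only the Romina entry is from the question.
sample = '''<root>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Example</name><count>3</count></comment>
  </comments>
</root>'''

tree = ET.fromstring(sample)

# find('count') returns the <count> element; .text is read from that element,
# not from the string 'count'.
counts = [int(item.find('count').text)
          for item in tree.findall('comments/comment')]
total = sum(counts)

print('Count:', len(counts))
print('Sum:', total)
```

Feeding the bytes returned by uh.read() to ET.fromstring in place of the sample string should give the same kind of result for the real file.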

2 Comments

Thank you so much! I solved it, and the issue was indeed with "serviceurl". I should have made an input('Enter --') for "serviceurl" and linked "url" to it. If possible, may I ask what "urllib.parse.urlencode(address)" after it does?
That's used to encode strings for use in URLs. The '%20' and '+' you see in long URLs are the result of two different ways of encoding a space, because you can't send plain-text spaces in URLs. There's additional reference material showing how urllib encodes parameters to build query strings.
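As a rough illustration of the two encodings mentioned in the comment above, here is a minimal sketch. The parameter names and values are hypothetical and are not what the py4e service expects. It also shows that urlencode wants a mapping (or a sequence of two-item tuples), so passing a bare string, as in the question, raises a TypeError.

```python
from urllib.parse import urlencode, quote, quote_plus

# Hypothetical parameters, chosen only to show the encoding.
params = {'address': 'Ann Arbor, MI', 'key': 42}

# urlencode builds a query string and encodes spaces as '+'.
query = urlencode(params)
print(query)

# quote encodes a space as '%20'; quote_plus encodes it as '+'.
percent_form = quote('Ann Arbor, MI')
plus_form = quote_plus('Ann Arbor, MI')
print(percent_form)
print(plus_form)

# A plain string is not a valid argument for urlencode.
error_raised = False
try:
    urlencode('Ann Arbor, MI')
except TypeError as err:
    error_raised = True
    print('TypeError:', err)
```

So a line like serviceurl + urllib.parse.urlencode({'address': address}) would build a query string, while urlencode(address) with a plain string fails before any request is sent.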

Try this instead. I entered the sample link http://py4e-data.dr-chuck.net/comments_42.xml at the prompt and got the desired sum of 2553.

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

address = input('Enter location: ')

print('Retrieving', address)
uh = urllib.request.urlopen(address, context=ctx)

data = uh.read()
tree = ET.fromstring(data)

results = tree.findall('comments/comment')
print('Comment count:', len(results))
x = []
for item in results:
    x.append(int(item.find('count').text))
print(x)
print(sum(x))

I removed the lines below and it worked. My guess is that the serviceurl is invalid. In my code above the address works without the serviceurl, so the serviceurl is at least unnecessary. Note also that urllib.parse.urlencode expects a dict or a sequence of key/value pairs, so passing the plain string address to it raises a TypeError.

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/xml?'

and

url = serviceurl + urllib.parse.urlencode(address)
uh = urllib.request.urlopen(url, context=ctx)

1 Comment

Hello there, please fix the indentation in the second code block.
