0

I'm using ElementTree to parse some XML retrieved from a website, but somehow I can't see to be able to use ".find" or ".findall". I tried to use ElementTree, and I tired lxml.etree and nothing is working with me. My goal is to retrieve //course from my XML file retrieved from a URL.

import requests
import xml.etree.ElementTree as ET
res = requests.get(COURSES_URL).text #Storing the XML into res
XML = ET.fromstring(res)
print(XML.findall('//COURSE'))

COURSES_URL is my own URL which I am retrieving the XML from, and yes it is working since I got the output XML that I want (sample):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by Oracle Reports version 11.1.2.1.0 -->
<SYRSPOS_REP>
  <LIST_G_PROGRAM>
    <G_PROGRAM>
      <SPRIDEN_ID>U712214</SPRIDEN_ID>
      <STUDENT_NAME>Mark Adam Johns</STUDENT_NAME>
      <SMBPOGN_PIDM>98</SMBPOGN_PIDM>
      <SMBPOGN_REQUEST_NO>46</SMBPOGN_REQUEST_NO>
      <COURSE ID=1411001>PASS</COURSE>
      <COURSE ID=1411023>PASS</COURSE>
      <COURSE ID=1411136>PASS</COURSE>
    </G_PROGRAM>
  </LIST_G_PROGRAM>
</SYRSPOS_REP>
7
  • 1
    Please post a minimal but complete sample of the XML i.e make sure it includes the target element <course> and make sure the entire XML is well-formed (single root element, closing tag, etc.) Commented Feb 18, 2018 at 3:55
  • In addition to the previous comment, and taking into account that all tags of your small sample are upper case: did you try "//COURSE"? Commented Feb 18, 2018 at 16:01
  • @har07 I have updated the XML to include the minimal XML. Commented Feb 18, 2018 at 18:29
  • 1
    Could it be that you are not paying attention to XML namespaces? Commented Feb 18, 2018 at 18:48
  • 1
    @thethiny After adding double quotes on the XML attributes and a . at the beginning of the XPath, it worked: eval.in/958573 . As tomalak suggests, probably there are XML namespaces in the actual XML ? Commented Feb 19, 2018 at 3:34

1 Answer 1

2

Solved: Apparently I had 2 issues. First of all I can't use findall in print since it returns a list, I had to do a for in loop for i in XML.findall(), then I print i.text(). Secondly, I had to add a dot after the quotation mark, as in ".//COURSES"

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.