Python reading XML files keeps getting error "list index out of range"

Question

I have an XML file with the following structure:

<Thread THREAD_SEQUENCE="Q268_R16">
<RelQuestion RELQ_ID="Q268_R16">
<RelQSubject>Best Bank.</RelQSubject>
<RelQBody>Hi ti all QL's; What bank you are using? and why? Are you using this bank just because it has an affiliate at home? Regards;</RelQBody>
</RelQuestion>
</Thread>

The XML file contains 244 RelQBody tags, and I want to get the text inside each one. I have tried something like this:

import xml.dom.minidom
dom = xml.dom.minidom.parse("test.xml")
data = dom.documentElement

question = data.getElementsByTagName("RelQBody")
i=1
for q in question:
    print("%i. %s" % (i, q.childNodes[0].data))
    i = i+1

But I keep getting this error:

Traceback (most recent call last):
File "C:\Users\Administrator\Documents\python\test.py", line 13, in <module>
  print("%i. %s" % (i, q.childNodes[0].data))
IndexError: list index out of range

However, when I tried this code:

import xml.dom.minidom
dom = xml.dom.minidom.parse("test.xml")
data = dom.documentElement

question = data.getElementsByTagName("RelQBody")
i=1
for q in question:
    print("%i" % i)
    i = i+1

I got the numbers 1 to 244, which is exactly the number of RelQBody tags in the dataset.

So why does the loop work when I print only the counter but fail when I also print the text? Can someone tell me what I did wrong? I'm new to Python, so any help is appreciated. Thanks.

Vibhutha Kumarage · Accepted Answer · 2017-07-12 05:43:14Z

1

import xml.dom.minidom
dom = xml.dom.minidom.parse("test.xml")
data = dom.documentElement

question = data.getElementsByTagName("RelQBody")
for i,q in enumerate(question):
    if len(q.childNodes) > 0:
        print("%i. %s" % (i+1, q.childNodes[0].data))


1 Comment

Lutfi Fitroh Hadi Over a year ago

Further question: when a RelQBody tag is empty, I want to use the text inside RelQSubject as the question instead. I wrote code like this:

for i in range(len(qbody)):
    if len(qbody[i].childNodes) > 0:
        question.append(qbody[i].childNodes[0].data.lower())
    else:
        question.append(qsubject[i].childNodes[0].data.lower())

Is there a better way to achieve this?
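One possible way to do the fallback described in this comment is to walk the RelQuestion elements and look up the body and subject inside each one, rather than indexing two parallel lists. The sketch below is only an illustration: the `<root>` wrapper, the sample threads, and the helper name `first_text` are assumptions, not part of the original file.

```python
import xml.dom.minidom

# Hypothetical sample data; the real file's root element is not shown in the question.
SAMPLE = """<root>
<Thread THREAD_SEQUENCE="Q1_R1">
<RelQuestion RELQ_ID="Q1_R1">
<RelQSubject>Best Bank.</RelQSubject>
<RelQBody>What bank are you using?</RelQBody>
</RelQuestion>
</Thread>
<Thread THREAD_SEQUENCE="Q1_R2">
<RelQuestion RELQ_ID="Q1_R2">
<RelQSubject>Visa renewal</RelQSubject>
<RelQBody></RelQBody>
</RelQuestion>
</Thread>
</root>"""


def first_text(parent, tag):
    """Return the text of the first <tag> under parent, or "" if it is missing or empty."""
    elements = parent.getElementsByTagName(tag)
    if not elements:
        return ""
    # Join all direct text children; an empty element simply yields "".
    return "".join(
        node.data for node in elements[0].childNodes
        if node.nodeType == node.TEXT_NODE
    )


dom = xml.dom.minidom.parseString(SAMPLE)
questions = []
for rel_question in dom.getElementsByTagName("RelQuestion"):
    # Use the body when it has text, otherwise fall back to the subject.
    text = first_text(rel_question, "RelQBody") or first_text(rel_question, "RelQSubject")
    questions.append(text.lower())

print(questions)  # ['what bank are you using?', 'visa renewal']
```

Looking both tags up under the same RelQuestion keeps each body paired with its own subject, even if some thread happens to lack one of the two tags.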

Ofer Sadan · Accepted Answer · 2017-07-12 05:39:52Z

1

I'm guessing the culprit is childNodes[0]: if one of the RelQBody elements has no children, its childNodes list is empty, and indexing it with childNodes[0] raises IndexError.

So try this:

import xml.dom.minidom
dom = xml.dom.minidom.parse("test.xml")
data = dom.documentElement

question = data.getElementsByTagName("RelQBody")
i=1
for q in question:
    if len(q.childNodes) > 0:
        print("%i. %s" % (i, q.childNodes[0].data))
    i = i+1
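To see why the length check matters, here is a small self-contained sketch showing how minidom reports child nodes for an element with text versus an empty one. The inline XML string and its `<root>` wrapper are made up for illustration.

```python
import xml.dom.minidom

# Hypothetical input: one RelQBody with text, one empty, one self-closing.
dom = xml.dom.minidom.parseString(
    "<root><RelQBody>Some text</RelQBody><RelQBody></RelQBody><RelQBody/></root>"
)

# An element with text has one Text child; empty elements have no children at all,
# so childNodes[0] on them raises IndexError.
counts = [len(q.childNodes) for q in dom.getElementsByTagName("RelQBody")]
print(counts)  # [1, 0, 0]
```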


1 Comment

Lutfi Fitroh Hadi Over a year ago

I just looked further down the XML, and yes, there are 2 threads with an empty RelQBody. I guess I have more preprocessing to do on this file, lol.
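For the preprocessing step, it may help to list which questions have an empty RelQBody instead of scrolling through the file. This is a minimal sketch under assumptions: the `<root>` wrapper and the second thread (`Q268_R17`) are invented sample data, and it relies on RelQBody being a direct child of RelQuestion, as in the structure shown in the question.

```python
import xml.dom.minidom

# Hypothetical sample data; only the first thread mirrors the question's example.
SAMPLE = """<root>
<Thread THREAD_SEQUENCE="Q268_R16">
<RelQuestion RELQ_ID="Q268_R16">
<RelQSubject>Best Bank.</RelQSubject>
<RelQBody>What bank are you using?</RelQBody>
</RelQuestion>
</Thread>
<Thread THREAD_SEQUENCE="Q268_R17">
<RelQuestion RELQ_ID="Q268_R17">
<RelQSubject>Another subject</RelQSubject>
<RelQBody></RelQBody>
</RelQuestion>
</Thread>
</root>"""

dom = xml.dom.minidom.parseString(SAMPLE)

# Collect the RELQ_ID of every RelQuestion whose RelQBody has no child nodes.
empty_ids = [
    q.parentNode.getAttribute("RELQ_ID")
    for q in dom.getElementsByTagName("RelQBody")
    if not q.childNodes
]
print(empty_ids)  # ['Q268_R17']
```

With the real file, replacing `parseString(SAMPLE)` with `parse("test.xml")` should report the IDs of the empty entries.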
