Python, XML Index error

Question

Hello, I am having trouble with an XML file I am using. On a short XML file the program works fine, but once the file reaches a certain size (I am thinking 1 MB) it gives me an "IndexError: list index out of range".

Here is the code I have written so far.

from xml.dom import minidom

import smtplib
from email.mime.text import MIMEText
from datetime import datetime

def xml_data():
    f = open('C:\opidea_2.xml', 'r')
    data = f.read()
    f.close()

    dom = minidom.parseString(data)
    ic = (dom.getElementsByTagName('logentry'))
    dom = None      
    content = ''  

    for num in ic:
        name = num.getElementsByTagName('author')[0].firstChild.nodeValue
        if name:
            content += "***Changes by:"  + str(name) + "*** " +  '\n\n     Date: '
        else:
            content += "***Changes are made Anonymously *** " +  '\n\n     Date: '
        print content

if __name__ == "__main__":
    xml_data ()

Here is part of the xml if it helps.

 <log>
 <logentry
  revision="33185">
 <author>glv</author>
 <date>2012-08-06T21:01:52.494219Z</date>
 <paths>

 <path
  kind="file"
  action="M">/branches/Patch_4_2_0_Branch/text.xml</path>   

 <path
  kind="dir"
  action="M">/branches/Patch_4_2_0_Branch</path>

</paths>
<msg>PATCH_BRANCH:N/A
 BUG_NUMBER:N/A
 FEATURE_AFFECTED:N/A
 OVERVIEW:N/A
  Adding the SVN log size requirement to the branch 
 </msg>
  </logentry>
    </log>

The actual XML file is much bigger, but this is the general format. It works if the file is this small, but once it gets bigger I get problems.

Here is the traceback:

Traceback (most recent call last):
  File "C:\python\src\SVN_Email_copy.py", line 141, in <module>
    xml_data ()
  File "C:\python\src\SVN_Email_copy.py", line 50, in xml_data
    name = num.getElementsByTagName('author')[0].firstChild.nodeValue
IndexError: list index out of range

Sadly, I don't know where the full traceback is or what it is, nor do I understand what pdb means.
– Gilbert V, Commented Aug 29, 2012 at 15:14
When you get an error, you don't just get "Index out of range", you get line numbers and call sites printed for the entire execution stack. This is called a "traceback." Include all of that. (You don't even say what line number produces the error.)
– Francis Avila, Commented Aug 29, 2012 at 15:17
Use the Python debugger to figure out where the problem occurs. The traceback tells you clearly that the 'num' node has no <author> element. So your input data is likely inconsistent somewhere, or you are expecting an author tag although it is perhaps optional. Fix the data, fix your assumptions, or fix your code and make it more flexible by dealing with such situations.
– user2665694, Commented Aug 29, 2012 at 15:21
Check the length of the getElementsByTagName() result, obviously.
– user2665694, Commented Aug 29, 2012 at 15:31
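
One way to follow the advice in these comments without a debugger is to list the entries that lack an <author> element. This is only a minimal sketch, assuming Python 3 and that every logentry carries a revision attribute as in the sample above; the helper name revisions_missing_author and the inline sample XML are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample: the second entry has no <author> element.
SAMPLE = """<log>
<logentry revision="33185"><author>glv</author><msg>first</msg></logentry>
<logentry revision="33186"><msg>second</msg></logentry>
</log>"""


def revisions_missing_author(xml_text):
    """Return the revision attribute of every <logentry> without an <author>."""
    dom = minidom.parseString(xml_text)
    missing = []
    for entry in dom.getElementsByTagName('logentry'):
        # getElementsByTagName returns an empty list when nothing matches,
        # so test the result before indexing it with [0].
        if not entry.getElementsByTagName('author'):
            missing.append(entry.getAttribute('revision'))
    return missing


print(revisions_missing_author(SAMPLE))  # prints ['33186']
```

Running something like this against the large file should show which revisions trigger the IndexError.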

g.d.d.c · Accepted Answer · 2012-08-29 15:31:15Z

1

Based on the code provided, your error is going to be in this line:

name = num.getElementsByTagName('author')[0].firstChild.nodeValue
#xml node-^
#function call -------------------------^
#list indexing ----------------------------^
#attribute access -------------------------------------^

That's the only place in the demonstrated code where you index into a list. That implies that somewhere in your larger XML sample a <logentry> is missing its <author> tag. You'll have to correct that, or add some level of error handling / data validation.

Please see the annotated line above for more explanation. You're doing a lot in a single line by chaining the return values of successive calls. num is defined, so that's fine. Then you call a method, which returns a list. You attempt to retrieve the first item from that list and it throws an exception, so you never make it to the attribute access for firstChild, which means you never get to nodeValue either.

Error checking may look something like this:

authors = num.getElementsByTagName('author')
if len(authors) > 0:
    name = authors[0].firstChild.nodeValue
else:
    name = None  # no <author> element in this logentry

Though there are many, many ways you could achieve that.
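
As one of those ways, the check can be moved into a small helper so the loop from the question stays readable. The following is a sketch rather than a definitive fix, assuming Python 3; the names author_of and build_content and the sample XML are hypothetical. It also guards against an empty <author></author>, where firstChild is None.

```python
from xml.dom import minidom

# Hypothetical sample covering three cases: author present, missing, and empty.
SAMPLE = """<log>
<logentry revision="1"><author>glv</author></logentry>
<logentry revision="2"></logentry>
<logentry revision="3"><author></author></logentry>
</log>"""


def author_of(entry):
    """Return the author text of a <logentry> node, or None if unavailable."""
    authors = entry.getElementsByTagName('author')
    if not authors:
        return None  # no <author> element at all
    text_node = authors[0].firstChild
    if text_node is None:
        return None  # <author></author> has no text child
    return text_node.nodeValue


def build_content(xml_text):
    """Build one header line per <logentry>, falling back to 'Anonymously'."""
    dom = minidom.parseString(xml_text)
    lines = []
    for entry in dom.getElementsByTagName('logentry'):
        name = author_of(entry)
        if name:
            lines.append("***Changes by: " + name + "***")
        else:
            lines.append("***Changes are made Anonymously***")
    return "\n".join(lines)


print(build_content(SAMPLE))
```

With this shape, the `if name:` test from the question finally does what was intended, because the lookup no longer raises before the test is reached.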

edited Aug 29, 2012 at 15:31

answered Aug 29, 2012 at 15:18

g.d.d.c


3 Comments

Gilbert V Over a year ago

So I would have to use some way of checking it? I thought the if statement would help, but you're saying I have to use something else?

Francis Avila Over a year ago

num.getElementsByTagName('author') == [], and [][0] raises an IndexError. So no, your if statement comes too late. Try authors = num.getElementsByTagName('author'); if authors: author = authors[0].firstChild.nodeValue. Also, consider using xml.etree instead.

Gilbert V Over a year ago

I see, thank you. So I can check the length of it, and if it is greater than 0 it will print it out, rather than just getting it outright to begin with. Thank you.
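
The xml.etree suggestion from the comment above could look roughly like this. It is a sketch under assumptions, not the commenter's own code: it assumes Python 3, and the function name authors_by_revision and the sample XML are invented for illustration. findtext returns a default instead of raising when the child element is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second entry has no <author> element.
SAMPLE = """<log>
<logentry revision="33185"><author>glv</author></logentry>
<logentry revision="33186"><msg>no author here</msg></logentry>
</log>"""


def authors_by_revision(xml_text):
    """Map each revision to its author, or to 'Anonymous' when none is given."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter('logentry'):
        # findtext gives None for a missing <author> and '' for an empty one;
        # both are falsy, so `or` supplies the fallback.
        name = entry.findtext('author', default=None)
        result[entry.get('revision')] = name or 'Anonymous'
    return result


print(authors_by_revision(SAMPLE))
```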
