Hello, I am having trouble with an XML file I am using. On a short XML file the program works fine, but once the file reaches a certain size (I am guessing around 1 MB) it gives me an "IndexError: list index out of range".

Here is the code I have written so far.

from xml.dom import minidom

import smtplib
from email.mime.text import MIMEText
from datetime import datetime

def xml_data():
    f = open(r'C:\opidea_2.xml', 'r')
    data = f.read()
    f.close()

    dom = minidom.parseString(data)
    ic = (dom.getElementsByTagName('logentry'))
    dom = None      
    content = ''  

    for num in ic:
        name = num.getElementsByTagName('author')[0].firstChild.nodeValue
        if name:
            content += "***Changes by:"  + str(name) + "*** " +  '\n\n     Date: '
        else:
            content += "***Changes are made Anonymously *** " +  '\n\n     Date: '
        print content

if __name__ == "__main__":
    xml_data ()

Here is part of the XML, if it helps.

 <log>
 <logentry
  revision="33185">
 <author>glv</author>
 <date>2012-08-06T21:01:52.494219Z</date>
 <paths>

 <path
  kind="file"
  action="M">/branches/Patch_4_2_0_Branch/text.xml</path>   

 <path
  kind="dir"
  action="M">/branches/Patch_4_2_0_Branch</path>

</paths>
<msg>PATCH_BRANCH:N/A
 BUG_NUMBER:N/A
 FEATURE_AFFECTED:N/A
 OVERVIEW:N/A
  Adding the SVN log size requirement to the branch 
 </msg>
  </logentry>
    </log>

The actual XML file is much bigger, but this is the general format. The code works when the file is this small, but once the file gets bigger I get problems.

Here is the traceback:

Traceback (most recent call last):
  File "C:\python\src\SVN_Email_copy.py", line 141, in <module>
    xml_data ()
  File "C:\python\src\SVN_Email_copy.py", line 50, in xml_data
    name = num.getElementsByTagName('author')[0].firstChild.nodeValue
IndexError: list index out of range
  • Where is the full traceback? Did you try pdb? Commented Aug 29, 2012 at 15:13
  • Sadly, I don't know where the full traceback is or what it is, nor do I understand what pdb means. Commented Aug 29, 2012 at 15:14
  • When you get an error, you don't just get "Index out of range"; you get line numbers and call sites printed for the entire execution stack. This is called a "traceback." Include all of that. (You don't even say what line number produces the error.) Commented Aug 29, 2012 at 15:17
  • Use the Python debugger to figure out where the problem occurs. The traceback tells you clearly that the 'num' node has no <author> element. So your input data is likely inconsistent somewhere, or you are expecting an author tag although it is perhaps optional. Fix the data, fix your assumptions, or fix your code and make it more flexible by handling such situations. Commented Aug 29, 2012 at 15:21
  • Check the length of the getElementsByTagName() result, obviously. Commented Aug 29, 2012 at 15:31

1 Answer

Based on the code provided your error is going to be in this line:

name = num.getElementsByTagName('author')[0].firstChild.nodeValue
#xml node-^
#function call -------------------------^
#list indexing ----------------------------^
#attribute access -------------------------------------^

That's the only place in the demonstrated code where you're indexing into a list. That would imply that in your larger XML sample at least one <logentry> is missing an <author> tag. You'll have to correct that, or add some level of error handling / data validation.

Please see the annotated line above for more explanation. You're doing a lot in a single line by chaining on the return value of each successive call. So, num is defined, that's fine. Then you call a method on it, which returns a list. You attempt to index into that list and it throws an exception, so you never reach the attribute access for firstChild, which means you never get nodeValue either.
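To make the chain of steps concrete, here is a minimal sketch that runs them one at a time. The two-entry SAMPLE string and the describe() helper are made up for illustration; they are not from your file, and the second entry deliberately has no <author>.

```python
from xml.dom import minidom

# Hypothetical sample: the second <logentry> has no <author> element.
SAMPLE = """<log>
<logentry revision="1"><author>glv</author></logentry>
<logentry revision="2"><msg>no author here</msg></logentry>
</log>"""

def describe(entry):
    """Run the chained expression step by step and report what each entry holds."""
    revision = entry.getAttribute('revision')
    authors = entry.getElementsByTagName('author')  # method call, returns a list
    if len(authors) == 0:
        # authors[0] would raise IndexError here, before firstChild is ever reached
        return 'revision %s: no <author> element' % revision
    text_node = authors[0].firstChild               # attribute access on the element
    return 'revision %s: %s' % (revision, text_node.nodeValue)

dom = minidom.parseString(SAMPLE)
results = [describe(e) for e in dom.getElementsByTagName('logentry')]
for line in results:
    print(line)
```

Reporting the revision attribute this way may also help you find which entries in the large file lack an author.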

Error checking may look something like this:

authors = num.getElementsByTagName('author')
if len(authors) > 0:
  name = authors[0].firstChild.nodeValue

Though there are many, many ways you could achieve that.
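One possible way, sketched below, is to put the check into a small helper so the loop stays readable. The names author_name and build_content and the SAMPLE string are hypothetical, chosen for this example. The helper also covers a second case I am assuming could occur: an <author></author> element that exists but is empty, where firstChild is None and .nodeValue would raise an AttributeError.

```python
from xml.dom import minidom

def author_name(entry):
    """Return the author text of a <logentry>, or None if it is missing or empty."""
    authors = entry.getElementsByTagName('author')
    if not authors:              # no <author> element at all
        return None
    first = authors[0].firstChild
    if first is None:            # <author></author> has no text node
        return None
    return first.nodeValue

def build_content(xml_text):
    """Build the summary text, one line per <logentry>."""
    content = ''
    dom = minidom.parseString(xml_text)
    for entry in dom.getElementsByTagName('logentry'):
        name = author_name(entry)
        if name:
            content += '***Changes by: ' + name + '***\n'
        else:
            content += '***Changes are made anonymously***\n'
    return content

# Hypothetical sample covering all three cases: present, missing, empty.
SAMPLE = """<log>
<logentry revision="1"><author>glv</author></logentry>
<logentry revision="2"><msg>no author</msg></logentry>
<logentry revision="3"><author></author></logentry>
</log>"""

print(build_content(SAMPLE))
```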

3 Comments

So I would have to use some way of checking it? I thought the if statement would help, but you're saying I have to use something else?
When there is no author, num.getElementsByTagName('author') == [], and [][0] raises an IndexError. So no, your if statement comes too late. Try authors = num.getElementsByTagName('author'); if authors: author = authors[0].firstChild.nodeValue. Also, consider using xml.etree instead.
I see, thank you. So I can check the length of it, and if it is greater than 0 it will print it out, rather than just getting it outright to begin with. Thank you.
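
For the xml.etree suggestion in the comment above, a rough sketch might look like the following. It assumes <author> is a direct child of <logentry>, as in the excerpt from the question; the SAMPLE string is made up for illustration. findtext() returns None when the element is absent, so no indexing is involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <logentry> has no <author> element.
SAMPLE = """<log>
<logentry revision="1"><author>glv</author></logentry>
<logentry revision="2"><msg>no author here</msg></logentry>
</log>"""

root = ET.fromstring(SAMPLE)

# findtext() looks only at direct children and returns None if nothing matches.
names = [entry.findtext('author') for entry in root.iter('logentry')]

for name in names:
    if name:
        print('***Changes by: ' + name + '***')
    else:
        print('***Changes are made anonymously***')
```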
