I need to parse an XML document in Python, but I am struggling with the Python libraries and this rather complex XML.

I have looked at the method used in "python read complex xml with ElementTree", but it does not seem to work.

I am using Python 2.7.7

The XML is taken from http://nvd.nist.gov/download.cfm#CURRENTXML. For instance, one entry that I need to parse looks like this: http://pastebin.com/qdPN98VX

My relevant code currently looks like this. I can successfully read the ID of the first entry, but everything within the element is inaccessible. I am also not sure whether ElementTree is the best option for a 50 MB file:

import xml.etree.ElementTree as ET

from vulnsdb.models import Vuln as CVE


file = 'CVE/20140630-NVDCE-2.0-2014.xml'

tree = ET.parse(file)
root = tree.getroot()

for entry in root:
    c = CVE()
    c.name = entry.attrib['id']
    for details in entry:
        if details.find("{http://scap.nist.gov/schema/vulnerability/0.4}cve-id"):
            print details.find("{http://scap.nist.gov/schema/vulnerability/0.4}cve-id").text
    break

1 Answer


You can use xml.etree.ElementTree.iterparse(), which parses the document incrementally instead of loading the whole tree first:

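import xml.etree.ElementTree as ET


# Fully qualified tag of each vulnerability entry in the feed
TAG = '{http://scap.nist.gov/schema/feed/vulnerability/2.0}entry'
ID = "CVE-2014-0001"

# iterparse() accepts a filename and reports 'end' events by default,
# i.e. each element is yielded once it has been read completely
for event, element in ET.iterparse('CVE/20140630-NVDCE-2.0-2014.xml'):
    if element.tag != TAG:
        continue
    if element.attrib.get('id') == ID:
        print(ET.tostring(element))
        break
    # Drop the children of entries that did not match to limit memory use
    element.clear()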
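
To read the fields inside each entry, as the question asks, the same incremental approach can be extended. The following is only a sketch under some assumptions: the cve-id and summary elements are taken to be direct children of entry, the two-entry SAMPLE string is a made-up stand-in for the real feed, and iter_entries is a hypothetical helper name. It uses findtext(), which returns None for a missing child, rather than truth-testing the result of find(), because an Element without children evaluates as false even when it exists.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the question and in the answer above.
FEED_NS = "http://scap.nist.gov/schema/feed/vulnerability/2.0"
VULN_NS = "http://scap.nist.gov/schema/vulnerability/0.4"
ENTRY_TAG = "{%s}entry" % FEED_NS

# Hypothetical two-entry sample standing in for the real feed file.
SAMPLE = (
    '<nvd xmlns="%s" xmlns:vuln="%s">'
    '<entry id="CVE-2014-0001">'
    '<vuln:cve-id>CVE-2014-0001</vuln:cve-id>'
    '<vuln:summary>First sample summary.</vuln:summary>'
    '</entry>'
    '<entry id="CVE-2014-0002">'
    '<vuln:cve-id>CVE-2014-0002</vuln:cve-id>'
    '</entry>'
    '</nvd>' % (FEED_NS, VULN_NS)
).encode("utf-8")


def iter_entries(source):
    """Yield (id, cve_id, summary) per entry, then clear the entry."""
    for _event, element in ET.iterparse(source):
        if element.tag != ENTRY_TAG:
            continue
        # Search the entry itself (not its children) and use findtext(),
        # which gives None when the child element is absent.
        cve_id = element.findtext("{%s}cve-id" % VULN_NS)
        summary = element.findtext("{%s}summary" % VULN_NS)
        yield element.attrib.get("id"), cve_id, summary
        # Free the entry's children once the caller asks for the next one.
        element.clear()


records = list(iter_entries(io.BytesIO(SAMPLE)))
for record in records:
    print(record)
```

For the real file, passing the path (for example 'CVE/20140630-NVDCE-2.0-2014.xml') to iter_entries in place of the BytesIO object should work the same way, since iterparse() accepts either a filename or a file object.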