9

*Note: lxml will not run on my system. I was hoping to find a solution that does not involve lxml.

I have already gone through some of the documentation around here, but I am having difficulty getting this to work the way I would like. I want to parse an XML file that looks like this:

<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dict>

In the file I am trying to manipulate, more 'dict' elements follow this one. I would like to read through the XML and output a text/dat file that looks like this:

1375, "Some String", "Another string", "Yet another string", "Strings anyone?"

...

Eof

** Originally, I tried to use lxml, but after many tries to get it working on my system, I moved on to using DOM. More recently, I tried using Etree to do this task. Please, for the love of all that is good, would somebody help me along with this? I am relatively new to Python and would like to learn how this works. I thank you in advance.

2
  • What OS and version of Python? Commented Oct 29, 2011 at 16:10
  • You have the number 1375 twice. Can this be two different numbers? If so, which do you want? Commented Oct 29, 2011 at 21:33

2 Answers

10

You can use xml.etree.ElementTree, which is included with Python. Python also ships a companion implementation written in C, xml.etree.cElementTree, which is much faster. lxml.etree offers a superset of the functionality, but it is not needed for what you want to do.

The code provided by @Acorn works identically for me (Python 2.7, Windows 7) with each of the following imports:

# Use exactly one of these imports; keep the other two commented out.
import xml.etree.ElementTree as et
# import xml.etree.cElementTree as et
# import lxml.etree as et
...
tree = et.fromstring(xmltext)
...

What OS are you using and what installation problems have you had with lxml?
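
If you want a script that runs whether or not lxml is installed, one option is a fallback import. This is only a minimal sketch, assuming the script needs nothing beyond the ElementTree-compatible API; sample_xml is a hypothetical stand-in for your real input.

```python
# Prefer lxml if it happens to be installed; otherwise fall back to the
# standard-library ElementTree, which needs no installation.
try:
    import lxml.etree as et
except ImportError:
    import xml.etree.ElementTree as et

# Hypothetical sample input, shaped like the XML in the question.
sample_xml = (
    "<dict><key>1375</key>"
    "<dict><key>Key 1</key><integer>1375</integer></dict>"
    "</dict>"
)

root = et.fromstring(sample_xml)
inner = root.find('dict')
print(root.tag, inner.find('integer').text)
```

The fallback skips xml.etree.cElementTree on purpose: on Python 3.3 and later, xml.etree.ElementTree already uses the C accelerator when it is available, and cElementTree was removed in Python 3.9.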

4 Comments

I'm using Ubuntu Maverick Meerkat Netbook installation...the latest lxml installation attempt included this message in my terminal: Unpacking python-lxml (from .../python-lxml_2.2.6-1_i386.deb) ... Setting up firmware-b43-installer (4.150.10.5-4) ... Not supported low-power chip with PCI id 14e4:4315! Aborting.
I just tried the new imports with the code and got this error: Traceback (most recent call last): File "/home/worky.py", line 5, in <module> import lxml.etree as et ImportError: No module named lxml.etree
(1) About your Ubuntu installation problem: I suggest that you try the lxml mailing list. (2) "No module named lxml.etree" ... that's because it's not installed. Have only one import active at a time; comment out the other two.
ok, John, that kind of helps, I'm messing around with the code now... I might be able to swing it with this code, although... it's not exactly what I need... if I can get it to work, it IS what I need I guess. Thanks for the tips.
7
import xml.etree.ElementTree as et
import csv

xmltext = """
<dicts>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dicts>
"""

# open the output file in a with block so it is flushed and closed on exit
with open('output.txt', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

    tree = et.fromstring(xmltext)

    # iterate over the dict elements
    for dict_el in tree.iterfind('dict'):
        data = []
        # get the text contents of each non-key element
        for el in dict_el:
            if el.tag == 'string':
                data.append(el.text)
            # if it's an integer element, convert to int so csv won't quote it
            elif el.tag == 'integer':
                data.append(int(el.text))
        writer.writerow(data)
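
The csv module only accepts a one-character delimiter, so the output above has no space after each comma. If the exact `1375, "Some String", ...` layout from the question matters, the rows can be formatted by hand. The following is a rough sketch under the assumption that the children of each inner dict strictly alternate between a key element and a value element; SAMPLE_XML, records_from_xml and format_row are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as et

# Hypothetical sample input, copied from the shape of the XML in the question.
SAMPLE_XML = """
<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
    </dict>
</dict>
"""


def records_from_xml(xmltext):
    """Yield one list of (key, value) pairs per inner <dict> element."""
    root = et.fromstring(xmltext)
    for dict_el in root.findall('dict'):
        children = list(dict_el)
        pairs = []
        # Assumption: children alternate <key>, value, <key>, value, ...
        for key_el, value_el in zip(children[0::2], children[1::2]):
            value = value_el.text
            if value_el.tag == 'integer':
                value = int(value)
            pairs.append((key_el.text, value))
        yield pairs


def format_row(values):
    """Join values with ', ', wrapping non-integers in double quotes."""
    return ', '.join(
        str(v) if isinstance(v, int) else '"%s"' % v for v in values
    )


for pairs in records_from_xml(SAMPLE_XML):
    print(format_row([value for _key, value in pairs]))
```

For the sample above this prints `1375, "Some String", "Another string"`. Note that format_row does not escape double quotes inside a string, so stick with csv.writer if the values can contain them.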

17 Comments

Thanks for posting so soon. The problem is, I cannot get lxml to run on my machine. I have Python 2.7 and have made several attempts to install that module, but they all failed. I was hoping there was another way that doesn't involve lxml.
What OS are you running?
I'm running Ubuntu Maverick Meerkat Netbook edition...
How are you trying to install it? Have you tried installing it with pip?
Ok, I am installing pip now, I will try to figure out how to use it to install it. BTW, it's snowing in New York... wth?! and thanks for the help.