I am trying to parse an XML file with Python for a school project.

To check whether the parsing works, I print the values from "lista_marfuri".

It shows the following error: xml.parsers.expat.ExpatError: XML declaration not well-formed: line 1, column 35

The XML code is:

<?xml version="1.0" encoding="UTF-8 standalone="yes"?>

<fapte>
    <lista_marfuri>
        <marfa> 
            <id> 1 </id>
            <nume> grebla </nume>
            <categorie> gradinarit </gradinarit>
            <cantitate> 100 </cantitate>
            <pret> 20 </pret>
        </marfa>
        <marfa> 
            <id> 2 </id>
            <nume> sac 1kg ingrasamant </nume>
            <categorie> gradinarit </gradinarit>
            <cantitate> 300 </cantitate>
            <pret> 30 </pret>
        </marfa>
        <marfa> 
            <id> 3 </id>
            <nume> surubelnita </nume>
            <categorie> general </gradinarit>
            <cantitate> 200 </cantitate>
            <pret> 5 </pret>
        </marfa>
    </lista_marfuri>
    
    
    <lista_categorii>
        ...
    </lista_categorii>
    
    <lista_clienti>
        ...
    </lista_clienti>
    
    <lista_comenzi>
        ...
    </lista_comenzi>
    
</fapte>

And the python code is:

import xml.dom.minidom

tree = xml.dom.minidom.parse('SBC.xml')

fapte = tree.documentElement

marfuri = fapte.getElementsByTagName('marfa')

for marfa in marfuri:
    print(f"-- Marfa {marfa.getAttribute('id')} --")

    nume = marfa.getElementByTagName('nume')[0].childNodes[0].nodeValue
    categorie = marfa.getElementByTagName('categorie')[0].childNodes[0].nodeValue
    cantitate = marfa.getElementByTagName('cantitate')[0].childNodes[0].nodeValue
    pret = marfa.getElementByTagName('pret')[0].childNodes[0].nodeValue

    print(f"Nume: {nume}")
    print(f"Categorie: {categorie}")
    print(f"Cantitate: {cantitate}")
    print(f"Pret: {pret}")

  • Instead of encoding="UTF-8, you need encoding="UTF-8". Commented Dec 24, 2022 at 12:09
  • Also, <categorie> start tags require corresponding </categorie> end tags. Commented Dec 24, 2022 at 12:13
  • The XML file has some errors as described by mzjn. Once you fix these issues, you should have better luck. Commented Dec 24, 2022 at 13:36
  • Thank you, I got past that error, but I have run into a new one; could you help me with that as well? It says: nume = marfa.getElementByTagName('nume')[0].childNodes[0].nodeValue AttributeError: 'Element' object has no attribute 'getElementByTagName' Commented Dec 24, 2022 at 14:13
  • getElementByTagName should be getElementsByTagName. Commented Dec 24, 2022 at 20:08
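
Putting the fixes from the comments together, a minimal minidom sketch could look like the one below. It assumes the corrected XML (closing quote after UTF-8, matching </categorie> tags) and uses a shortened inline sample instead of SBC.xml. The helper names child_text and read_marfuri are made up for illustration. Note that <id> is a child element in this file, not an attribute, so marfa.getAttribute('id') would return an empty string; the sketch reads it like the other fields.

```python
import xml.dom.minidom

# Shortened, corrected sample (assumption: the real SBC.xml has the same shape).
XML_TEXT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fapte>
    <lista_marfuri>
        <marfa>
            <id> 1 </id>
            <nume> grebla </nume>
            <categorie> gradinarit </categorie>
            <cantitate> 100 </cantitate>
            <pret> 20 </pret>
        </marfa>
    </lista_marfuri>
</fapte>"""

FIELDS = ("id", "nume", "categorie", "cantitate", "pret")


def child_text(element, tag):
    """Return the stripped text of the first <tag> element below `element`."""
    # Note the plural: getElementsByTagName, not getElementByTagName.
    node = element.getElementsByTagName(tag)[0]
    return node.firstChild.nodeValue.strip()


def read_marfuri(xml_text):
    """Parse the XML text and return one dict of strings per <marfa>."""
    tree = xml.dom.minidom.parseString(xml_text)
    fapte = tree.documentElement
    return [
        {tag: child_text(marfa, tag) for tag in FIELDS}
        for marfa in fapte.getElementsByTagName("marfa")
    ]


if __name__ == "__main__":
    for marfa in read_marfuri(XML_TEXT):
        print(f"-- Marfa {marfa['id']} --")
        print(f"Nume: {marfa['nume']}")
        print(f"Categorie: {marfa['categorie']}")
        print(f"Cantitate: {marfa['cantitate']}")
        print(f"Pret: {marfa['pret']}")
```

To read from the file instead, xml.dom.minidom.parse('SBC.xml') can replace the parseString call.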

2 Answers

I think working with ElementTree will make your life easier.

import xml.etree.ElementTree as ET

xml = '''<fapte>
    <lista_marfuri>
        <marfa> 
            <id> 1 </id>
            <nume> grebla </nume>
            <categorie> gradinarit </categorie>
            <cantitate> 100 </cantitate>
            <pret> 20 </pret>
        </marfa>
        <marfa> 
            <id> 2 </id>
            <nume> sac 1kg ingrasamant </nume>
            <categorie> gradinarit </categorie>
            <cantitate> 300 </cantitate>
            <pret> 30 </pret>
        </marfa>
        <marfa> 
            <id> 3 </id>
            <nume> surubelnita </nume>
            <categorie> general </categorie>
            <cantitate> 200 </cantitate>
            <pret> 5 </pret>
        </marfa>
    </lista_marfuri>
</fapte>'''

root = ET.fromstring(xml)
for marfa in root.findall('.//marfa'):
    for entry in marfa:
        print(f'{entry.tag} : {entry.text.strip()}')
    print('------------------')

Output:

id : 1
nume : grebla
categorie : gradinarit
cantitate : 100
pret : 20
------------------
id : 2
nume : sac 1kg ingrasamant
categorie : gradinarit
cantitate : 300
pret : 30
------------------
id : 3
nume : surubelnita
categorie : general
cantitate : 200
pret : 5
------------------
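
If the values are needed for further processing rather than just printing, the same ElementTree approach can collect each <marfa> into a dictionary. This is only a sketch: the function name parse_marfuri and the choice to convert id, cantitate and pret to int are assumptions, not something the question requires.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the corrected </categorie> closing tags.
SAMPLE = """<fapte>
    <lista_marfuri>
        <marfa>
            <id> 1 </id>
            <nume> grebla </nume>
            <categorie> gradinarit </categorie>
            <cantitate> 100 </cantitate>
            <pret> 20 </pret>
        </marfa>
        <marfa>
            <id> 3 </id>
            <nume> surubelnita </nume>
            <categorie> general </categorie>
            <cantitate> 200 </cantitate>
            <pret> 5 </pret>
        </marfa>
    </lista_marfuri>
</fapte>"""


def parse_marfuri(xml_text):
    """Return a list of dicts, one per <marfa>, with numeric fields as int."""
    root = ET.fromstring(xml_text)
    items = []
    for marfa in root.iter("marfa"):
        items.append({
            # findtext returns the text of the first matching child element.
            "id": int(marfa.findtext("id").strip()),
            "nume": marfa.findtext("nume").strip(),
            "categorie": marfa.findtext("categorie").strip(),
            "cantitate": int(marfa.findtext("cantitate").strip()),
            "pret": int(marfa.findtext("pret").strip()),
        })
    return items


if __name__ == "__main__":
    for item in parse_marfuri(SAMPLE):
        print(item)
```

For a file on disk, ET.parse('SBC.xml').getroot() gives the same kind of root element as ET.fromstring.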

Once the XML is valid (add the missing closing quote in the first line and fix the </categorie> closing tags, as @mzjn noted; the first of these is what your error message points to), the shortest way is to use pandas read_xml():

import pandas as pd

df = pd.read_xml('yourFileName.xml', xpath='.//marfa')
print(df)

Output:

   id                 nume   categorie  cantitate  pret
0   1               grebla  gradinarit        100    20
1   2  sac 1kg ingrasamant  gradinarit        300    30
2   3          surubelnita     general        200     5

PS: This only works if all the values you are interested in are on the same level in the tree.
