7

I use the xml library in Python3.5 for reading and writing an xml-file. I don't modify the file. Just open and write. But the library modifes the file.

  1. Why is it modified?
  2. How can I prevent this? e.g. I just want to replace specific tag or it's value in a quite complex xml-file without loosing any other informations.

This is the example file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
    <title>Der Eisbär</title>
    <ids>
        <entry>
            <key>tmdb</key>
            <value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
        </entry>
        <entry>
            <key>imdb</key>
            <value xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">tt0167132</value>
        </entry>
    </ids>
</movie>

This is the code

import xml.etree.ElementTree as ET
tree = ET.parse('x.nfo')
tree.write('y.nfo', encoding='utf-8')

And the xml-file becomes this

<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <title>Der Eisbär</title>
    <ids>
        <entry>
            <key>tmdb</key>
            <value xsi:type="xs:int">9321</value>
        </entry>
        <entry>
            <key>imdb</key>
            <value xsi:type="xs:string">tt0167132</value>
        </entry>
    </ids>
</movie>
  • Line 1 is gone.
  • The <movie>-tag in line 2 has attributes now.
  • The <value>-tag in line 7 and 11 now has less attributes.
5
  • 1
    In general, the short names (and where they are specified) for XML namespaces cannot be expected to be stable. But why aren't you using lxml anyway? Commented Sep 1, 2017 at 4:02
  • 1
    lxml manages to preserve the namespaces by default, although you still have to pass a flag to get the XML declaration up top. Commented Sep 1, 2017 at 4:06
  • @o11c You mean a python package lxml? I didn't notice it. I was just using xml as search term in the python doc and found ElementTree. Commented Sep 1, 2017 at 7:04
  • @o11c lxml doesn't help, too. It does some transformations of the code, too. Commented Sep 1, 2017 at 7:52
  • Well, nobody tries to preserve attribute order. So what lxml does is sort them. So even if they are changed on the first write, they will be consistent on all subsequent writes. Commented Sep 1, 2017 at 15:38

1 Answer 1

6

Note that "xml package" and "the xml library" are ambiguous. There are several XML-related modules in the standard library: https://docs.python.org/3/library/xml.html.

Why is it modified?

ElementTree moves namespace declarations to the root element and declarations for namespaces that aren't actually used in the document are removed.

Why does ElementTree do this? I don't know, but perhaps it is a way to make the implementation simpler.

How can I prevent this? e.g. I just want to replace specific tag or it's value in a quite complex xml-file without loosing any other informations.

I don't think there is a way to prevent this. The issue has been brought up before. Here are two very similar questions:

My suggestion is to use lxml instead of ElementTree. With lxml, the namespace declarations will remain where they occur in the original file.

Line 1 is gone.

That line is the XML declaration. It is recommended but not mandatory to have one.

If you always want an XML declaration, use xml_declaration=True in the write() method call.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.