XML Python: XML code is duplicated after saving to file

Question

I have a code that in principle is to open the file content and wrap it with an additional import tag:

with open('oferta-empik.xml', 'r+', encoding='utf-8') as f:
  xml = '<import>' + f.read() + '</import>'
  print(xml)
  f.write(xml)
  f.close()

Unfortunately, after saving half the code is unchanged, and then the xml code already wrapped in the import is inserted into the file.

In total, the file duplicates the xml code where the first original is unchanged and then the same is appended to the end of the file wrapped with the import tag

ORIGINAL CODE:

<offers>
  <offer>
    <leadtime-to-ship>1</leadtime-to-ship>
    <product-id-type>EAN</product-id-type>
    <state>11</state>
    <quantity>0</quantity>
    <price>146</price>
    <sku>B01.001.1.10</sku>
  </offer>
</offer>

AFTER CODE:

<offers>
  <offer>
    <leadtime-to-ship>1</leadtime-to-ship>
    <product-id-type>EAN</product-id-type>
    <state>11</state>
    <quantity>0</quantity>
    <price>146</price>
    <sku>B01.001.1.10</sku>
  </offer>
</offer>
<import><offers>
  <offer>
    <leadtime-to-ship>1</leadtime-to-ship>
    <product-id-type>EAN</product-id-type>
    <state>11</state>
    <quantity>0</quantity>
    <price>146</price>
    <sku>B01.001.1.10</sku>
  </offer>
</offer></import>

Roy2012 · Accepted Answer · 2020-06-22 11:28:47Z

2

the issue is that you're appending the new text (the new XML) to the end of the file. You're reading the entire file, and then write the modified XML at the end of that file.

There are two solutions:

Recommended: open the file for reading. Read the XML. Close it, and then open it for writing and write the entire thing (override the initial content).
Not Recommended: After you read, seek to the beginning of the file (with f.seek(0)) and write the new content. This solution is not recommended because if, at some point, the new content is shorter than the original content, the result will be inconsistent / messed-up.

edited Jun 22, 2020 at 11:28

answered Jun 22, 2020 at 11:20

Roy2012

12.7k3 gold badges28 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Tomalak Over a year ago

Not recommended in general: Treating XML as text files. This answer works for text files but it is not a particularly good suggestion for XML.

Roy2012 Over a year ago

True (very much). But, for some simple cases, a 'messy' solution is a reasonable approach.

Tomalak Over a year ago

It already falls apart when the source XML file contains an XML declaration (or is in a different encoding than the one used in the open() call). It's never a reasonable approach to treat XML as text.

Tomalak · Accepted Answer · 2020-06-22 15:02:09Z

I have a code that in principle is to open the file content and wrap it with an additional import tag

Your current approach is wrong. Don't open XML files as text files, don't treat XML as text. Always use a parser.

This is a lot better:

import xml.etree.ElementTree as ET

# 1: load current document and top level element
old_tree = ET.parse('oferta-empik.xml')
old_root = old_tree.getroot()

# 2: create <import> element to serve as new top level
new_root = ET.Element('import')

# 3: insert current document root ("wrap it in <import>")
new_root.insert(0, old_root)

# 4 make new ElementTree and write it to file 
new_tree = ET.ElementTree(new_root)
with open('output.xml', 'wb') as f:
    new_tree.write(f, encoding='utf8')

Compressed:

new_root = ET.Element('import')
new_root.insert(0, ET.parse('oferta-empik.xml').getroot())
with open('output.xml', 'wb') as f:
    ET.ElementTree(new_root).write(f, encoding='utf8')

Collectives™ on Stack Overflow

XML Python: XML code is duplicated after saving to file

2 Answers 2

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related