I have an XML file in the following format

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I want to change the value of bat to '2' and change the file to this:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I open this file by doing this

import xml.etree.ElementTree as ET

tree = ET.parse(filePath)
root = tree.getroot()

I then change the value of bat to '2' and save the file like this:

tree.write(filePath, "utf-8", True, None, "xml")
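For reference, the modify-and-save steps can be sketched as below. This is only an illustration of the workflow described above: the lookup path `'bar/bat'` and the in-memory buffers (used instead of `filePath`) are assumptions, not code from the original post.

```python
import io
import xml.etree.ElementTree as ET

# Same document as in the question, held in memory instead of a file.
source = b'''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

tree = ET.parse(io.BytesIO(source))
root = tree.getroot()

# Assumed lookup: <bat> is reached via a path relative to the root <foo>.
bat = root.find('bar/bat')
bat.text = '2'

# Keyword arguments make the intent of each write() option explicit.
out = io.BytesIO()
tree.write(out, encoding='utf-8', xml_declaration=True)
result = out.getvalue().decode('utf-8')
print(result)
```

Writing to a real file works the same way: pass the path as the first argument of `tree.write()`.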

The value of bat successfully changes to 2, but the XML file now looks like this.

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <ns0:b>
         <ns0:c>1</ns0:c>
      </ns0:b>
   </a>
</foo>

In order to fix the issue of having a namespace named ns0, I do the following before parsing the document

ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")

This gets rid of the ns0 namespace, but the XML file now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b>
         <c>1</c>
      </b>
   </a>
</foo>

What do I do to get the output I need?

  • What version of Python and lxml are you using? I'm not able to reproduce that behavior. tree.write(filePath, "utf-8", True, None, "xml") throws an error on Python 3.5 -- Try doing your arguments explicitly: tree.write("output.xml",xml_declaration=True,encoding="utf-8",pretty_print=True) Commented Jul 29, 2016 at 16:32
  • I'm using Python version 3.5.1. Not sure what lxml is - I started using Python yesterday. Commented Jul 29, 2016 at 16:34
  • I tried it and explicitly specifying arguments makes no difference. Commented Jul 29, 2016 at 16:37
  • Very similar to this question: stackoverflow.com/q/38438921/407651 Commented Jul 30, 2016 at 7:05
  • If you are OK with using a toolkit that is not in the standard library, then take a look at lxml. It is similar to ElementTree (an extension of the same basic API) but more powerful. lxml.de Commented Aug 1, 2016 at 13:52

2 Answers

As far as I know, there is no way to achieve this using only xml.etree.ElementTree methods. After digging into the xml.etree source code and the XML specification, I found that the library's behaviour is neither wrong nor unreasonable; it simply does not allow the output you are looking for.
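A small sketch of why this happens, assuming the standard-library parser: once parsed, ElementTree stores each namespaced tag in `{uri}name` form and does not keep the `xmlns` declaration as an attribute, so the serializer has no record of where the declaration originally sat.

```python
import xml.etree.ElementTree as ET

data = ('<foo><a>'
        '<b xmlns="urn:schemas-microsoft-com:asm.v1"><c>1</c></b>'
        '</a></foo>')

root = ET.fromstring(data)

# Tags of namespaced elements carry the URI inline ("Clark notation").
tags = [el.tag for el in root.iter()]
print(tags)

# The xmlns declaration is not stored as an attribute of <b>.
b = root.find('a')[0]
print(b.attrib)
```

Because the position of the declaration is lost at parse time, the writer is free to emit it wherever it chooses, which is what produces the `ns0` prefix or the declaration on the root element.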

To achieve your goal with that library, you have to customize the rendering behaviour. To best suit your needs, I have written the following render function.

from xml.etree import ElementTree as ET
from re import findall, sub

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration only once, at the top level.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # ET._namespaces() is a private helper; it returns (qnames, {uri: prefix}).
    _, namespaces = ET._namespaces(element) if not level else (None, namespaces)
    indent = ' ' * indent_size * level
    # Strip the '{uri}' part that ElementTree prepends to namespaced tags.
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces: continue
        # Declare each namespace once, on the first element that uses it.
        buffer += ' xmlns' + (f':{namespaces[ns_key]}' if namespaces[ns_key] != '' else '') + f'="{ns_key}"'
        del namespaces[ns_key]
    # Note: attribute values and text are written as-is, without escaping.
    for k, v in element.attrib.items():
        buffer += f' {k}="{v}"'
    buffer += '>' + element.text.strip() if element.text else '>'
    children = list(element)
    # Recurse into direct children only; looping over element.iter() here
    # would render every descendant a second time.
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if 0 != len(children) else f'</{tag}>\n'
    return buffer

Pass your XML input data to the render() function above as follows:

data = '''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

root = ET.ElementTree(ET.fromstring(data))
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
print(render(root))

It prints the output you are looking for:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>
2 Comments

The OP wants the value of the <bat> element to be 2 in the output.
@mzjn You're right. Anyway, he is able to achieve that. The answer addresses what the OP asked for help with.

Using the lxml package can help solve your problem. An example with the original and modified XML files and Python code (using the lxml package), with the namespace/XML structure unchanged, has been provided here: example with namespace/xml structure unchanged
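If installing a third-party package is not an option, a different standard-library module, `xml.dom.minidom`, may also work, since it keeps `xmlns` declarations as ordinary attributes on the element where they appeared. This is a swapped-in technique rather than the lxml approach mentioned above, and the sketch below makes one assumption not found in the original posts: that `<bat>` is located with `getElementsByTagName`.

```python
from xml.dom import minidom

data = '''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

doc = minidom.parseString(data)

# Assumed lookup: the first <bat> element in the document.
bat = doc.getElementsByTagName('bat')[0]
# The element's only child is its text node; replace its content.
bat.firstChild.data = '2'

# toxml() returns bytes when an encoding is given.
result = doc.toxml(encoding='utf-8').decode('utf-8')
print(result)
```

The original whitespace inside the root element is preserved because minidom keeps the whitespace-only text nodes; the result can be written back to the file with an ordinary `open(..., 'w', encoding='utf-8')`.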
