Pretty printing XML in Python

Question

What is the best way (or are the various ways) to pretty print XML in Python?

Benjamin Loison · Accepted Answer · 2024-01-27 16:48:10Z

482

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
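
As a minimal sketch of the same call on an in-memory string: the sample document and the two-space indent below are illustrative choices, not part of the original answer. toprettyxml() accepts an indent argument; without it, minidom indents with a tab.

```python
import xml.dom.minidom

# Illustrative input; any well-formed XML string should work here.
xml_string = "<issues><issue><id>1</id></issue></issues>"

dom = xml.dom.minidom.parseString(xml_string)
# indent sets the per-level indentation unit (the default is a tab).
pretty_xml_as_string = dom.toprettyxml(indent="  ")
print(pretty_xml_as_string)
```

The result is a str that starts with an XML declaration added by minidom.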

edited Jan 27, 2024 at 16:48

Benjamin Loison

5,7514 gold badges20 silver badges37 bronze badges

answered Jul 30, 2009 at 14:12

Ben Noland

35.1k18 gold badges54 silver badges51 bronze badges

19 Comments

Todd Hopkinson Over a year ago

This will get you pretty xml, but note that what comes out in the text node is actually different than what came in - there are new whitespaces on text nodes. This may cause you trouble if you are expecting EXACTLY what fed in to feed out.

vaab Over a year ago

@icnivad: while it is important to point that out, it seems strange to me that somebody would want to prettify their XML if whitespace were of some importance to them!

Anton I. Sipos Over a year ago

Nice! Can collapse this to a one liner: python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print(xml.dom.minidom.parseString(s).toprettyxml())'

bukzor Over a year ago

minidom is widely panned as a pretty bad XML implementation. If you allow yourself to add external dependencies, lxml is far superior.

Danny Staple Over a year ago

Not a fan of redefining xml there from being a module to the output object, but the method otherwise works. I'd love to find a nicer way to go from the core etree to pretty printing. While lxml is cool, there are times when I'd prefer to keep to the core if I can.

Benjamin Loison · Accepted Answer · 2024-01-27 16:48:38Z

198

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print(etree.tostring(x, pretty_print=True, encoding="unicode"))

Check out the lxml tutorial: https://lxml.de/tutorial.html

edited Jan 27, 2024 at 16:48

Benjamin Loison

5,7514 gold badges20 silver badges37 bronze badges

answered Apr 15, 2009 at 0:21

1729

5,0814 gold badges30 silver badges17 bronze badges

10 Comments

intuited Over a year ago

The only downside to lxml is its dependency on external libraries. I think this is not so bad: under Windows the libraries are packaged with the module, and under Linux they are an aptitude install away. Under OS X I'm not sure.

pkoch Over a year ago

On OS X you just need a functioning gcc and easy_install/pip.

vaab Over a year ago

lxml's pretty printer isn't reliable and won't pretty print your XML properly in lots of cases explained in the lxml FAQ. I quit using lxml for pretty printing after several corner cases that just don't work (e.g. this "won't fix": Bug #910018). All these problems are related to XML values containing spaces that should be preserved.

Thor Over a year ago

Since in Python 3 you usually want to work with str (=unicode string in Python 2), better use this: print(etree.tostring(x, pretty_print=True, encoding="unicode")). Writing to an output file is possible in just one line, no intermediary variable needed: etree.parse("filename").write("outputfile", encoding="utf-8")

oak Over a year ago

etree.XMLParser(remove_blank_text=True) can sometimes help to get the right output.

Jimmy.Rustle · Accepted Answer · 2016-01-07 15:50:55Z

125

Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

edited Jan 7, 2016 at 15:50

Jimmy.Rustle

237 bronze badges

answered Jan 4, 2011 at 1:57

ade

4,0841 gold badge23 silver badges26 bronze badges

7 Comments

Stefano Over a year ago

...and then just use lxml tostring!

Bouke Over a year ago

Note that you can still do tree.write([filename]) for writing to file (tree being the ElementTree instance).

shinzou Over a year ago

No you can't since elementtree.getroot() doesn't have that method, only an elementtree object has it. @bouke

e-malito Over a year ago

Here's how you can write to a file: tree = ElementTree.parse('file'); root = tree.getroot(); indent(root); tree.write('Out.xml')

Ciro Santilli OurBigBook.com Over a year ago

@AylwynLake can anyone provide a minimal example where this code goes wrong, and the eff one goes right? Then we can start testing and put the best code here.

ChaimG · Accepted Answer · 2024-04-04 13:43:23Z

111

You have a few options.

If you are using Python 3.9+, your simplest option is:

xml.etree.ElementTree.indent()

Batteries included and pretty output.

Sample code:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))
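
ET.indent() also accepts a whole ElementTree, which is convenient when the result should go to a file rather than to stdout. The sketch below assumes Python 3.9+; the sample document and the temporary output path are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<issues><issue><id>1</id></issue></issues>"))
# Modifies the tree in place; accepts an Element or an ElementTree.
ET.indent(tree)

# Hypothetical output location, chosen only for this demo.
out_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")
tree.write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```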

BeautifulSoup.prettify()

BeautifulSoup may be the simplest solution for Python < 3.9.

from bs4 import BeautifulSoup

# the 'xml' parser requires lxml to be installed
with open(xml_file) as f:
    bs = BeautifulSoup(f, 'xml')
pretty_xml = bs.prettify()
print(pretty_xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
  <id>
   1
  </id>
  <title>
   Add Visual Studio 2005 and 2008 solution files
  </title>
 </issue>
</issues>

This is my go-to answer. The default arguments work as-is, but text contents are spread out on separate lines as if they were nested elements.

lxml.etree.parse()

Prettier output, but it needs arguments.

from lxml import etree

x = etree.parse(FILE_NAME)
pretty_xml = etree.tostring(x, pretty_print=True, encoding=str)

Produces:

  <issues>
    <issue>
      <id>1</id>
      <title>Add Visual Studio 2005 and 2008 solution files</title>
      <details>We need Visual Studio 2005/2008 project files for Windows.</details>
    </issue>
  </issues>

This works for me with no issues.

xml.dom.minidom.parse()

No external dependencies, but it needs post-processing.

import os
import xml.dom.minidom as md

dom = md.parse(FILE_NAME)
# To parse string instead use: dom = md.parseString(xml_string)
pretty_xml = dom.toprettyxml()
# remove the weird newline issue:
pretty_xml = os.linesep.join([s for s in pretty_xml.splitlines()
                              if s.strip()])

The output is the same as above, but it's more code.
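
To see where the "weird newline issue" comes from, here is a small sketch with an input that is already indented (the sample string is an assumption for illustration). The existing indentation is parsed as whitespace-only text nodes, and toprettyxml() writes each of them on its own line, so filtering out blank lines restores a compact result.

```python
import xml.dom.minidom

# Input that is already indented: its whitespace becomes text nodes.
already_indented = "<a>\n  <b>1</b>\n</a>"

raw = xml.dom.minidom.parseString(already_indented).toprettyxml(indent="  ")
# Drop the whitespace-only lines that minidom emits for those text nodes.
cleaned = "\n".join(line for line in raw.splitlines() if line.strip())
print(cleaned)
```

Joining with "\n" instead of os.linesep is a choice made here to keep the string platform-independent; use whichever fits how the result is written out.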

edited Apr 4, 2024 at 13:43

answered Sep 14, 2016 at 4:54

ChaimG

7,5925 gold badges41 silver badges50 bronze badges

6 Comments

hadoop Over a year ago

Getting this error message:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?

reynoldsnlp Over a year ago

You need to run python3 -m pip install --user lxml

Milovan Tomašević Over a year ago

Good job man :) for remove the weird newline issue ! ty

555Russich Over a year ago

first solution in this answer should go above in all answers

Jean-Francois T. Over a year ago

the ET.indent is the solution that really works well on recent versions of Python

Nick Bolton · Accepted Answer · 2010-07-30 17:43:28Z

49

Here's my (hacky?) solution to get around the ugly text node problem.

import re

uglyXml = doc.toprettyxml(indent='  ')

# collapse ">\n    text\n    </" into ">text</" for elements that contain only text
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub(r'>\g<1></', uglyXml)

print(prettyXml)

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.

edited Jul 30, 2010 at 17:43

answered Jul 29, 2010 at 22:18

Nick Bolton

40k73 gold badges185 silver badges248 bronze badges

7 Comments

iano Over a year ago

Thank you! This was my one gripe with all the pretty printing methods. Works well with the few files I tried.

heltonbiker Over a year ago

I found a pretty 'almost identical' solution, but yours is more direct, using re.compile prior to sub operation (I was using re.findall() twice, zip and a for loop with str.replace()...)

Marius Gedminas Over a year ago

This is no longer necessary in Python 2.7: xml.dom.minidom's toprettyxml() now produces output like '<id>1</id>' by default, for nodes that have exactly one text child node.

Mike Finch Over a year ago

I am compelled to use Python 2.6. So, this regex reformatting trick is very useful. Worked as-is with no problems.

posfan12 Over a year ago

@Marius Gedminas I am running 2.7.2 and the "default" is definitely not as you say.

mzjn · Accepted Answer · 2021-05-19 17:38:04Z

30

As of Python 3.9, ElementTree has an indent() function for pretty-printing XML trees.

See https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent.

Sample usage:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200
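
For code that has to run on both older and newer interpreters, one possible approach is to use ET.indent() when it exists and fall back to minidom otherwise. This is only a sketch: pretty_xml is a hypothetical helper name, and the two branches are not byte-for-byte equivalent (the minidom branch adds an XML declaration).

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def pretty_xml(root, space="  "):
    """Return an indented string for an Element (hypothetical helper)."""
    if hasattr(ET, "indent"):  # available from Python 3.9
        ET.indent(root, space=space)
        return ET.tostring(root, encoding="unicode")
    # Fallback for older versions: round-trip through minidom and
    # drop the whitespace-only lines it may emit.
    raw = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml(indent=space)
    return "\n".join(line for line in raw.splitlines() if line.strip())


result = pretty_xml(ET.XML("<html><body>text</body></html>"))
print(result)
```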

edited May 19, 2021 at 17:38

mzjn

51.5k16 gold badges139 silver badges265 bronze badges

answered Aug 12, 2020 at 9:26

o15a3d4l11s2

4,0693 gold badges33 silver badges43 bronze badges

Comments

roskakori · Accepted Answer · 2012-01-06 15:36:52Z

23

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here's a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')

edited Jan 6, 2012 at 15:36

answered Apr 13, 2011 at 12:33

roskakori

3,4761 gold badge36 silver badges32 bronze badges

1 Comment

elwc Over a year ago

It's a little late now. But I think lxml fixed CDATA? CDATA is CDATA on my side.

Russell Silva · Accepted Answer · 2012-04-12 23:40:33Z

14

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses a program external to Python, which makes it sort of a hack.

import subprocess

def pretty_print_xml(xml):
    # xml must be bytes in Python 3, since the pipes are opened in binary mode
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml)
    return output

print(pretty_print_xml(data))
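
A variant of the same idea with subprocess.run, sketched under the assumption that xmllint is on the PATH and accepts "-" to read from standard input. The function name and the executable parameter are hypothetical; the function returns None instead of failing when the tool is missing.

```python
import shutil
import subprocess


def pretty_print_with_xmllint(xml_bytes, executable="xmllint"):
    """Return formatted XML as bytes, or None if the tool is not installed."""
    if shutil.which(executable) is None:
        return None
    # "-" asks xmllint to read the document from standard input.
    result = subprocess.run(
        [executable, "--format", "-"],
        input=xml_bytes,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if xmllint reports an error
    )
    return result.stdout
```

Passing the arguments as a list avoids going through a shell, so no quoting is needed.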

answered Apr 12, 2012 at 23:40

Russell Silva

2,8423 gold badges28 silver badges37 bronze badges

Comments

Benjamin Loison · Accepted Answer · 2024-01-27 16:49:16Z

I tried to edit ade's answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '

bobince · Accepted Answer · 2009-04-15 00:48:22Z

10

If you're using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want — you'd probably have to write or subclass your own serialiser.

answered Apr 15, 2009 at 0:48

bobince

538k111 gold badges675 silver badges846 bronze badges

Comments

giltay · Accepted Answer · 2009-04-15 13:46:01Z

8

I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, e.g. if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
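
The key piece of that workaround is the 'xmlcharrefreplace' error handler, which turns characters the target encoding cannot represent into numeric character references. A small sketch of just that step, with a made-up one-element document:

```python
import xml.dom.minidom

# Illustrative document containing the Greek letter beta (U+03B2).
doc = xml.dom.minidom.parseString("<greek>\u03b2</greek>")

text = doc.toprettyxml(indent="  ")  # str, no encoding applied yet
# latin-1 has no beta, so it is written as a character reference instead.
encoded = text.encode("latin-1", "xmlcharrefreplace")
print(encoded)
```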

answered Apr 15, 2009 at 13:46

giltay

2,0251 gold badge12 silver badges5 bronze badges

Comments

John Smith Optional · Accepted Answer · 2014-05-13 14:49:14Z

6

from yattag import indent

pretty_string = indent(ugly_string)

It won't add spaces or newlines inside text nodes, unless you ask for it with:

indent(mystring, indent_text = True)

You can specify what the indentation unit should be and what the newline should look like.

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

The doc is on http://www.yattag.org homepage.

answered May 13, 2014 at 14:49

John Smith Optional

25.4k14 gold badges47 silver badges67 bronze badges

Comments

nacitar sevaht · Accepted Answer · 2016-07-25 17:25:06Z

6

I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings

answered Jul 25, 2016 at 17:25

nacitar sevaht

5186 silver badges15 bronze badges

Comments

Josh Correia · Accepted Answer · 2020-02-12 17:03:19Z

Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.

Vitaly Zdanevich · Accepted Answer · 2016-09-07 17:02:38Z

4

You can use the popular external library xmltodict; with unparse and pretty=True you will get the best result:

import xmltodict

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False suppresses the <?xml version="1.0" encoding="UTF-8"?> declaration at the top.

answered Sep 7, 2016 at 17:02

Vitaly Zdanevich

15.4k11 gold badges56 silver badges89 bronze badges

Comments

Dan Lew · Accepted Answer · 2009-04-15 00:07:19Z

3

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.

answered Apr 15, 2009 at 0:07

Dan Lew

87.7k33 gold badges184 silver badges177 bronze badges

2 Comments

8bitjunkie Over a year ago

HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/xmlpp/ Think that project is in the attic nowadays, shame.

mzjn Over a year ago

Obsolete answer, dead links. PyXML was abandoned years ago.

emehex · Accepted Answer · 2020-10-01 20:01:23Z

3

I found this question while looking for "how to pretty print html"

Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:

from xml.dom.minidom import parseString as string_to_dom

def prettify(string, html=True):
    dom = string_to_dom(string)
    ugly = dom.toprettyxml(indent="  ")
    split = list(filter(lambda x: len(x.strip()), ugly.split('\n')))
    if html:
        split = split[1:]
    pretty = '\n'.join(split)
    return pretty

def pretty_print(html):
    print(prettify(html))

When used this is what it looks like:

html = """\
<div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div>
<span>Hi</span></div></div><p id='blarg'>Try for 2</p>
<div class='baz'>Oh No!</div></div>
"""

pretty_print(html)

Which returns:

<div class="foo" id="bar">
  <p>'IDK!'</p>
  <br/>
  <div class="baz">
    <div>
      <span>Hi</span>
    </div>
  </div>
  <p id="blarg">Try for 2</p>
  <div class="baz">Oh No!</div>
</div>

answered Oct 1, 2020 at 20:01

emehex

10.7k11 gold badges61 silver badges111 bronze badges

1 Comment

mherzog Over a year ago

Works in Python 3.8, which doesn't support indent used in other answers.

nmvega · Accepted Answer · 2020-01-20 20:12:52Z

2

You can try this variation...

Install BeautifulSoup and the backend lxml (parser) libraries:

user$ pip3 install lxml bs4

Process your XML document:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc:
    # parse the whole file at once; parsing line by line would treat
    # every line as a separate document
    print(BeautifulSoup(doc, 'lxml-xml').prettify())

edited Jan 20, 2020 at 20:12

answered Sep 29, 2019 at 20:49

nmvega

5,7599 gold badges63 silver badges71 bronze badges

4 Comments

user2357112 Over a year ago

'lxml' uses lxml's HTML parser - see the BS4 docs. You need 'xml' or 'lxml-xml' for the XML parser.

nmvega Over a year ago

This comment keeps getting deleted. Again, I've entered a formal complaint (in addition to 4 flags) of post tampering with Stack Overflow, and will not stop until this is forensically investigated by a security team (access logs and version histories). The above timestamp is wrong (by years) and likely the content, too.

Umar.H Over a year ago

This worked fine for me; unsure of the downvote. From the docs, lxml's XML parser is selected with BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml").

nmvega Over a year ago

@Datanovice I'm glad it helped you. :) As for the suspect downvote, someone tampered with my original answer (which correctly originally specified lxml-xml), and then they proceeded to downvote it that same day. I submitted an official complaint to S/O but they refused to investigate. Anyway, I have since "de-tampered" my answer, which is now correct again (and specifies lxml-xml as it originally did). Thank you.

Benjamin Loison · Accepted Answer · 2024-01-27 16:49:56Z

2

Take a look at the vkbeautify module.

It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.

Examples:

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file')

edited Jan 27, 2024 at 16:49

Benjamin Loison

5,7514 gold badges20 silver badges37 bronze badges

answered Jan 4, 2017 at 1:50

vadimk

1,47115 silver badges11 bronze badges

1 Comment

Cameron Lowell Palmer Over a year ago

This particular library handles the Ugly Text Node problem.

gaborous · Accepted Answer · 2017-07-25 14:38:25Z

1

If you don't want to have to reparse, an alternative is the xmlpp.py library with its get_pprint() function. It worked nicely and smoothly for my use cases, without having to reparse to an lxml ElementTree object.

answered Jul 25, 2017 at 14:38

gaborous

16.8k11 gold badges91 silver badges104 bronze badges

3 Comments

david-hoze Over a year ago

Tried minidom and lxml and didn't get a properly formatted and indented xml. This worked as expected

Endre Both Over a year ago

Fails for tag names that are prefixed by a namespace and contain a hyphen (e.g. <ns:hyphenated-tag/>); the part starting with the hyphen is simply dropped, giving e.g. <ns:hyphenated/>.

gaborous Over a year ago

@EndreBoth Nice catch, I did not test, but maybe it would be easy to fix this in the xmlpp.py code?

Wuagliono · Accepted Answer · 2022-06-07 10:23:05Z

1

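I found a fast and easy way to print an XML file. Note that ET.tostring() does not add indentation by itself, so the output below is indented only because the input file already was: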

import xml.etree.ElementTree as ET

xmlTree = ET.parse('your XML file')
xmlRoot = xmlTree.getroot()
xmlDoc =  ET.tostring(xmlRoot, encoding="unicode")

print(xmlDoc)

Output:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
  <child>
    <subchild>.....</subchild>
  </child>
  ...
  ...
  ...
  <child>
    <subchild>.....</subchild>
  </child>
</root>

edited Jun 7, 2022 at 10:23

answered May 17, 2021 at 13:20

Wuagliono

4024 silver badges15 bronze badges

Comments

Zelphir Kaltstahl · Accepted Answer · 2015-07-27 23:06:29Z

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree uses two spaces to indent by default, which I find doesn't emphasize the indentation enough and is therefore not pretty. I couldn't find any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.
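
For comparison, the standard library's xml.etree.ElementTree (not lxml, which the code above uses) has exposed the indent unit directly since Python 3.9 through the space argument of ET.indent(). A minimal sketch with a made-up document, indenting with tabs without any string replacement:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<log><entry><msg>ok</msg></entry></log>")
# One tab per nesting level instead of the default two spaces.
ET.indent(root, space="\t")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Unlike replacing every pair of spaces in the serialized string, this cannot touch spaces that occur inside text content.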

Reed_Xia · Accepted Answer · 2019-08-12 09:24:45Z

0

from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_file_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    # NOTE: this removes all whitespace, including spaces inside text
    # and between attributes
    ugly_xml = ''.join(plain_xml.split())
    good_xml = mmd.parseString(ugly_xml)
    print(good_xml.toprettyxml(indent='    '))

It works well for XML containing Chinese text!

answered Aug 12, 2019 at 9:24

Reed_Xia

1,4723 gold badges19 silver badges30 bronze badges

Comments

FriskySaga · Accepted Answer · 2020-05-14 04:16:29Z

0

If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

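import subprocess

def makePretty(filepath):
  # pass the arguments as a list to avoid shell quoting problems with the path
  prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
  # check_output returns bytes in Python 3, so write in binary mode
  with open(filepath, "wb") as outfile:
    outfile.write(prettyXML)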

As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.

answered May 14, 2020 at 4:16

FriskySaga

4381 gold badge8 silver badges22 bronze badges

2 Comments

mzjn Over a year ago

xmllint has already been suggested in another answer: stackoverflow.com/a/10133365/407651

FriskySaga Over a year ago

@mzjn I saw the answer, but I simplified mine down to check_output because you don't need to do error checking

Benjamin Loison · Accepted Answer · 2024-01-27 16:50:27Z

0

For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I'm implementing here)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

References:

Thanks to Ben Noland's answer on this page which got me most of the way there.
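
The same file-to-file conversion can be sketched with context managers so the files are closed even if parsing fails. prettify_file is a hypothetical helper name, and the temporary files below only stand in for content.xml and content_new.xml.

```python
import os
import tempfile
import xml.dom.minidom


def prettify_file(src_path, dst_path):
    """Read XML from src_path and write an indented copy to dst_path."""
    # Reading bytes lets the parser honour the encoding declared in the file.
    with open(src_path, "rb") as src:
        dom = xml.dom.minidom.parse(src)
    # With an encoding argument, toprettyxml() returns bytes.
    with open(dst_path, "wb") as dst:
        dst.write(dom.toprettyxml(indent="  ", encoding="utf-8"))


# Demo with temporary stand-ins for the real files.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "content.xml")
dst_file = os.path.join(workdir, "content_new.xml")
with open(src_file, "w", encoding="utf-8") as f:
    f.write("<doc><p>text</p></doc>")
prettify_file(src_file, dst_file)
```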

edited Jan 27, 2024 at 16:50

Benjamin Loison

5,7514 gold badges20 silver badges37 bronze badges

answered Sep 1, 2018 at 6:48

Gabriel Staples

56.1k34 gold badges299 silver badges394 bronze badges

Comments

Benjamin Loison · Accepted Answer · 2024-01-27 16:51:15Z

Use etree.indent and etree.tostring

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

output

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

Removing namespaces and prefixes

import lxml.etree as etree


def dump_xml(element):
    for item in element.iter():
        item.tag = etree.QName(item).localname

    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

output

<document>
  <name>hello world</name>
</document>

sun · Accepted Answer · 2024-06-12 14:02:44Z

0

Use the built-in xml library to pretty-print a given XML file to stdout:

$ echo '<foo><bar>baz</bar><beer /><bank><account>000123</account></bank></foo>' > filename.xml

$ python -c 'import sys, xml.dom.minidom as xml; print(xml.parse(sys.argv[1]).toprettyxml(encoding="utf-8"))' filename.xml

<?xml version="1.0" encoding="utf-8"?>
<foo>
    <bar>baz</bar>
    <beer/>
    <bank>
        <account>000123</account>
    </bank>
</foo>

Works sufficiently well with Python 2.7.18.

answered Jun 12, 2024 at 14:02

sun

1,1071 gold badge10 silver badges14 bronze badges

Comments

Yarma22 · Accepted Answer · 2025-11-11 11:46:33Z

As much as I hate to reinvent the wheel, I stumbled upon an annoying use-case where the built-in indenting solutions wouldn't work. Basically, I have XML files that contain a ton of child tags inside a text tag. This makes it pretty difficult to visually parse the file once it's indented.

<main>
    <p>A lot <span>of text</span> between <span>tags</span> and <span>at some point</span> it becomes <span>too long</span> to fit on a <span>single line</span> which makes it <span>harder</span> to read</p>
    <p>A lot <span>of text</span> between <span>tags</span> with some <span>wor</span>ds that are cut in the mid<span>dle</span> and <span>at some point</span> it becomes <span>too long</span> to fit on a <span>single</span> line</p>
</main>

Like ade, Joshua Richardson, nacitar sevaht and the likes, I resolved to write my own function. It takes an xml.etree.ElementTree.Element as an input:

def indent(node, level=0, space='    ', last_child=True):
    if len(node) > 0:
        ind = '\n' + space * (level + 1)
        node.text = node.text.strip() + ind if node.text else ind

        for i, child in enumerate(node):
            indent(child, level + 1, space, i == len(node) - 1)

    ind = '\n' + space * (level - (1 if last_child else 0))
    node.tail = node.tail.strip() + ind if node.tail else ind

Sample usage:

import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')
indent(tree.getroot())
tree.write('output.xml')

This results in the following output:

<main>
    <p>A lot
        <span>of text</span>between
        <span>tags</span>and
        <span>at some point</span>it becomes
        <span>too long</span>to fit on a
        <span>single line</span>which makes it
        <span>harder</span>to read
    </p>
</main>

What if I need to preserve whitespaces?

The above function works well to "repair" broken indentation in an XML file.

However, in my use-case, the input file is always compacted (e.g. all the content is on one line) and I actually need to preserve whitespace, both because a word can be split by a tag and because the output XML files need to be re-compacted later on. For example:

<main><p>A lot <span>of text</span> between <span>tags</span> with some <span>wor</span>ds that are cut in the mid<span>dle</span> and <span>at some point</span> it becomes <span>too long</span> to fit on a <span>single</span> line</p></main>

So I created a variant of that function without the call to strip():

def indent(node, level=0, space='    ', last_child=True):
    if len(node) > 0:
        ind = '\n' + space * (level + 1)
        # no strip(): keep the original whitespace and only append the indentation
        node.text = node.text + ind if node.text else ind

        for i, child in enumerate(node):
            indent(child, level + 1, space, i == len(node) - 1)

    ind = '\n' + space * (level - (1 if last_child else 0))
    node.tail = node.tail + ind if node.tail else ind

This results in the following output (note the spaces after some span tags and at the end of some lines):

<main>
    <p>A lot 
        <span>of text</span> between 
        <span>tags</span> with some 
        <span>wor</span>ds that are cut in the mid
        <span>dle</span> and 
        <span>at some point</span> it becomes 
        <span>too long</span> to fit on a 
        <span>single</span> line
    </p>
</main>

Benjamin Loison · Accepted Answer · 2024-01-27 16:51:40Z

I solved this with some lines of code: opening the file, going through it and adding indentation, then saving it again. I was working with small XML files and did not want to add dependencies or more libraries for the user to install. Anyway, here is what I ended up with:

f = open(file_name,'r')
xml = f.read()
f.close()

# Removing old indentations
raw_xml = ''        
for line in xml:
    raw_xml += line

xml = raw_xml

new_xml = ''
indent = '    '
deepness = 0

for i in range((len(xml))):

    new_xml += xml[i]   
    if(i<len(xml)-3):

        simpleSplit = xml[i:(i+2)] == '><'
        advancSplit = xml[i:(i+3)] == '></'        
        end = xml[i:(i+2)] == '/>'    
        start = xml[i] == '<'

        if(advancSplit):
            deepness += -1
            new_xml += '\n' + indent*deepness
            simpleSplit = False
            deepness += -1
        if(simpleSplit):
            new_xml += '\n' + indent*deepness
        if(start):
            deepness += 1
        if(end):
            deepness += -1

f = open(file_name,'w')
f.write(new_xml)
f.close()

It works for me, perhaps someone will have some use of it :)
