Converting XML to JSON using Python?

Question

I've seen a fair share of ungainly XML->JSON code on the web, and having interacted with Stack's users for a bit, I'm convinced that this crowd can help more than the first few pages of Google results can.

So, we're parsing a weather feed, and we need to populate weather widgets on a multitude of web sites. We're looking now into Python-based solutions.

This public weather.com RSS feed is a good example of what we'd be parsing (our actual weather.com feed contains additional information because of a partnership w/them).

In a nutshell, how should we convert XML to JSON using Python?

See also stackoverflow.com/questions/418497/…

Andreas Haferburg
– Andreas Haferburg

2021-10-21 15:18:52 +00:00
Commented Oct 21, 2021 at 15:18 — Andreas Haferburg
– Andreas Haferburg, Commented Oct 21, 2021 at 15:18

Benjamin Loison · Accepted Answer · 2023-05-01 19:28:27Z

399

xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON:

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'

edited May 1, 2023 at 19:28

Benjamin Loison

5,7514 gold badges20 silver badges37 bronze badges

answered Apr 18, 2012 at 1:06

Martin Blech

13.6k6 gold badges35 silver badges35 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

sayth Over a year ago

@Martin Blech If I create a json file from my django models file. How can I map my xml file to convert the xml to json for the required fields?

Martin Blech Over a year ago

@sayth I think you should post this as a separate SO question.

sayth Over a year ago

@Martin Blech. I added a question, but it's rather hard to fit in SO, I am a beginner so have provided as much info as I can, but I expect you may require more clarity stackoverflow.com/q/23676973/461887

sancelot Over a year ago

After so long time, I am a bit surprised xmltodict is not a "standard" library in some linux distributions. Altough it seems doing the job straight from what we can read, I will unfortunately use another solution like xslt conversion

Trect Over a year ago

Thanks a ton for writing this fantastic library. Though bs4 can do the job of xml to dict it is extremely easy to use the library

|

Dan Lenski · Accepted Answer · 2014-09-07 21:36:10Z

72

There is no "one-to-one" mapping between XML and JSON, so converting one to the other necessarily requires some understanding of what you want to do with the results.

That being said, Python's standard library has several modules for parsing XML (including DOM, SAX, and ElementTree). As of Python 2.6, support for converting Python data structures to and from JSON is included in the json module.

So the infrastructure is there.

edited Sep 7, 2014 at 21:36

answered Oct 10, 2008 at 14:34

Dan Lenski

80.4k13 gold badges86 silver badges129 bronze badges

2 Comments

nitinr708 Over a year ago

xmljson IMHO is the quickest to use with support for various conventions out of the box. pypi.org/project/xmljson

Dan Lenski Over a year ago

It's already been mentioned in newer answers. It still only covers a small subset of valid XML constructs, but probably the majority of what people use in practice.

S Anand · Accepted Answer · 2015-09-20 07:37:54Z

38

You can use the xmljson library to convert using different XML JSON conventions.

For example, this XML:

<p id="1">text</p>

translates via the BadgerFish convention into this:

{
  'p': {
    '@id': 1,
    '$': 'text'
  }
}

and via the GData convention into this (attributes are not supported):

{
  'p': {
    '$t': 'text'
  }
}

... and via the Parker convention into this (attributes are not supported):

{
  'p': 'text'
}

It's possible to convert from XML to JSON and from JSON to XML using the same conventions:

>>> import json, xmljson
>>> from lxml.etree import fromstring, tostring
>>> xml = fromstring('<p id="1">text</p>')
>>> json.dumps(xmljson.badgerfish.data(xml))
'{"p": {"@id": 1, "$": "text"}}'
>>> xmljson.parker.etree({'ul': {'li': [1, 2]}})
# Creates [<ul><li>1</li><li>2</li></ul>]

Disclosure: I wrote this library. Hope it helps future searchers.

answered Sep 20, 2015 at 7:37

S Anand

12.1k3 gold badges32 silver badges25 bronze badges

5 Comments

Martijn Pieters Over a year ago

That's a pretty cool library, but please do read How to offer personal open-source libraries? before you post more answers showing it off.

S Anand Over a year ago

Thanks @MartijnPieters -- I've just gone through this and will make sure I stick to this.

mbbeme Over a year ago

Thanks Anand for the solution – it seems to work well, doesn't have external dependencies, and provides a lot of flexibility in how the attributes are handled using the different conventions. Exactly what I needed and was the most flexible and simplest solution I found.

Patrik Beck Over a year ago

Thanks Anand - unfortunately, the I can't get it to parse XML with utf8 encoding. Going thru sources, it seems the encoding set thru XMLParser(..) is ignored

S Anand Over a year ago

@PatrikBeck could you please share an small example of XML with utf8 encoding that breaks?

Community · Accepted Answer · 2024-12-16 22:36:11Z

21

If some time you get only response code instead of all data then error like json parse will be there so u need to convert it as text

import xmltodict

data = requests.get(url)
xpars = xmltodict.parse(data.text)
js = json.dumps(xpars)
print(js)

edited Dec 16, 2024 at 22:36

CommunityBot

11 silver badge

answered May 12, 2018 at 8:17

Akshay Kumbhar

3193 silver badges6 bronze badges

1 Comment

elbe Over a year ago

The use of xmltodict is nice. As a remark, you shouldn't set json = json.dumps(xpars) which overrides json package name.

jnhustin · Accepted Answer · 2017-11-06 18:06:24Z

17

To anyone that may still need this. Here's a newer, simple code to do this conversion.

from xml.etree import ElementTree as ET

xml    = ET.parse('FILE_NAME.xml')
parsed = parseXmlToJson(xml)


def parseXmlToJson(xml):
  response = {}

  for child in list(xml):
    if len(list(child)) > 0:
      response[child.tag] = parseXmlToJson(child)
    else:
      response[child.tag] = child.text or ''

    # one-liner equivalent
    # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

  return response

edited Nov 6, 2017 at 18:06

answered Nov 2, 2017 at 17:25

jnhustin

1871 silver badge4 bronze badges

3 Comments

hrbdg Over a year ago

It functions in Python 3.7 at least, though unfortunately it adds some unexpected data to the key names if certain values are in your xml, for example an xmlns tag on a root level node shows up in every node key like this: {'{maven.apache.org/POM/4.0.0}artifactId': 'test-service', which came from xml like this: <project xmlns="maven.apache.org/POM/4.0.0" xsi:schemaLocation="maven.apache.org/POM/4.0.0 maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="w3.org/2001/XMLSchema-instance"> <modelVersion>4.0.0</modelVersion>

user128511 Over a year ago

TypeError: 'ElementTree' object is not iterable

Diego Magalhães Over a year ago

Change from list(xml) to xml.iter() to avoid TypeError: 'ElementTree' object is not iterable

Paulo Vj · Accepted Answer · 2012-04-19 15:35:21Z

Here's the code I built for that. There's no parsing of the contents, just plain conversion.

from xml.dom import minidom
import simplejson as json
def parse_element(element):
    dict_data = dict()
    if element.nodeType == element.TEXT_NODE:
        dict_data['data'] = element.data
    if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_NODE, 
                                element.DOCUMENT_TYPE_NODE]:
        for item in element.attributes.items():
            dict_data[item[0]] = item[1]
    if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_TYPE_NODE]:
        for child in element.childNodes:
            child_name, child_dict = parse_element(child)
            if child_name in dict_data:
                try:
                    dict_data[child_name].append(child_dict)
                except AttributeError:
                    dict_data[child_name] = [dict_data[child_name], child_dict]
            else:
                dict_data[child_name] = child_dict 
    return element.nodeName, dict_data

if __name__ == '__main__':
    dom = minidom.parse('data.xml')
    f = open('data.json', 'w')
    f.write(json.dumps(parse_element(dom), sort_keys=True, indent=4))
    f.close()

Michael Anderson · Accepted Answer · 2016-06-01 00:28:19Z

7

I'd suggest not going for a direct conversion. Convert XML to an object, then from the object to JSON.

In my opinion, this gives a cleaner definition of how the XML and JSON correspond.

It takes time to get right and you may even write tools to help you with generating some of it, but it would look roughly like this:

class Channel:
  def __init__(self)
    self.items = []
    self.title = ""

  def from_xml( self, xml_node ):
    self.title = xml_node.xpath("title/text()")[0]
    for x in xml_node.xpath("item"):
      item = Item()
      item.from_xml( x )
      self.items.append( item )

  def to_json( self ):
    retval = {}
    retval['title'] = title
    retval['items'] = []
    for x in items:
      retval.append( x.to_json() )
    return retval

class Item:
  def __init__(self):
    ...

  def from_xml( self, xml_node ):
    ...

  def to_json( self ):
    ...

edited Jun 1, 2016 at 0:28

answered Apr 18, 2012 at 1:24

Michael Anderson

74.3k9 gold badges147 silver badges196 bronze badges

1 Comment

Quuxplusone Over a year ago

Without some hint as to where the variable xml_node comes from (what type is it? what does its .xpath method do?), this answer is not very useful.

themihai · Accepted Answer · 2016-09-02 09:18:02Z

6

There is a method to transport XML-based markup as JSON which allows it to be losslessly converted back to its original form. See http://jsonml.org/.

It's a kind of XSLT of JSON. I hope you find it helpful

edited Sep 2, 2016 at 9:18

answered Jun 10, 2011 at 2:48

themihai

8,70111 gold badges44 silver badges65 bronze badges

Comments

dguaraglia · Accepted Answer · 2008-10-10 14:30:59Z

5

Well, probably the simplest way is just parse the XML into dictionaries and then serialize that with simplejson.

answered Oct 10, 2008 at 14:30

dguaraglia

6,0481 gold badge29 silver badges23 bronze badges

Comments

the Tin Man · Accepted Answer · 2016-05-31 20:28:15Z

5

You may want to have a look at http://designtheory.org/library/extrep/designdb-1.0.pdf. This project starts off with an XML to JSON conversion of a large library of XML files. There was much research done in the conversion, and the most simple intuitive XML -> JSON mapping was produced (it is described early in the document). In summary, convert everything to a JSON object, and put repeating blocks as a list of objects.

objects meaning key/value pairs (dictionary in Python, hashmap in Java, object in JavaScript)

There is no mapping back to XML to get an identical document, the reason is, it is unknown whether a key/value pair was an attribute or an <key>value</key>, therefore that information is lost.

If you ask me, attributes are a hack to start; then again they worked well for HTML.

edited May 31, 2016 at 20:28

the Tin Man

161k44 gold badges222 silver badges308 bronze badges

answered Oct 7, 2010 at 18:50

pykler

1912 silver badges2 bronze badges

Comments

shrewmouse · Accepted Answer · 2017-05-10 13:31:29Z

When I do anything with XML in python I almost always use the lxml package. I suspect that most people use lxml. You could use xmltodict but you will have to pay the penalty of parsing the XML again.

To convert XML to json with lxml you:

Parse XML document with lxml
Convert lxml to a dict
Convert list to json

I use the following class in my projects. Use the toJson method.

from lxml import etree 
import json


class Element:
    '''
    Wrapper on the etree.Element class.  Extends functionality to output element
    as a dictionary.
    '''

    def __init__(self, element):
        '''
        :param: element a normal etree.Element instance
        '''
        self.element = element

    def toDict(self):
        '''
        Returns the element as a dictionary.  This includes all child elements.
        '''
        rval = {
            self.element.tag: {
                'attributes': dict(self.element.items()),
            },
        }
        for child in self.element:
            rval[self.element.tag].update(Element(child).toDict())
        return rval


class XmlDocument:
    '''
    Wraps lxml to provide:
        - cleaner access to some common lxml.etree functions
        - converter from XML to dict
        - converter from XML to json
    '''
    def __init__(self, xml = '<empty/>', filename=None):
        '''
        There are two ways to initialize the XmlDocument contents:
            - String
            - File

        You don't have to initialize the XmlDocument during instantiation
        though.  You can do it later with the 'set' method.  If you choose to
        initialize later XmlDocument will be initialized with "<empty/>".

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        self.set(xml, filename) 

    def set(self, xml=None, filename=None):
        '''
        Use this to set or reset the contents of the XmlDocument.

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        if filename is not None:
            self.tree = etree.parse(filename)
            self.root = self.tree.getroot()
        else:
            self.root = etree.fromstring(xml)
            self.tree = etree.ElementTree(self.root)


    def dump(self):
        etree.dump(self.root)

    def getXml(self):
        '''
        return document as a string
        '''
        return etree.tostring(self.root)

    def xpath(self, xpath):
        '''
        Return elements that match the given xpath.

        :param: xpath
        '''
        return self.tree.xpath(xpath);

    def nodes(self):
        '''
        Return all elements
        '''
        return self.root.iter('*')

    def toDict(self):
        '''
        Convert to a python dictionary
        '''
        return Element(self.root).toDict()

    def toJson(self, indent=None):
        '''
        Convert to JSON
        '''
        return json.dumps(self.toDict(), indent=indent)


if __name__ == "__main__":
    xml='''<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>
'''
    doc = XmlDocument(xml)
    print doc.toJson(indent=4)

The output from the built in main is:

{
    "system": {
        "attributes": {}, 
        "product": {
            "attributes": {}, 
            "demod": {
                "attributes": {}, 
                "frequency": {
                    "attributes": {
                        "units": "MHz", 
                        "value": "2.215"
                    }, 
                    "blah": {
                        "attributes": {
                            "value": "1"
                        }
                    }
                }
            }
        }
    }
}

Which is a transformation of this xml:

<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>

Does this handle text within a tag (e.g. <text>Some text</text>)?
It's been a while. No but easy enough to do. Add 'text: self.element.textto toDict

Andrew_1510 · Accepted Answer · 2012-04-05 15:59:42Z

2

I found for simple XML snips, use regular expression would save troubles. For example:

# <user><name>Happy Man</name>...</user>
import re
names = re.findall(r'<name>(\w+)<\/name>', xml_string)
# do some thing to names

To do it by XML parsing, as @Dan said, there is not one-for-all solution because the data is different. My suggestion is to use lxml. Although not finished to json, lxml.objectify give quiet good results:

>>> from lxml import objectify
>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print(str(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'

edited Apr 5, 2012 at 15:59

answered Apr 4, 2012 at 13:06

Andrew_1510

13.8k10 gold badges57 silver badges52 bronze badges

1 Comment

Pooya Over a year ago

but it removes duplicate nodes

the Tin Man · Accepted Answer · 2016-05-31 20:25:18Z

2

While the built-in libs for XML parsing are quite good I am partial to lxml.

But for parsing RSS feeds, I'd recommend Universal Feed Parser, which can also parse Atom. Its main advantage is that it can digest even most malformed feeds.

Python 2.6 already includes a JSON parser, but a newer version with improved speed is available as simplejson.

With these tools building your app shouldn't be that difficult.

edited May 31, 2016 at 20:25

the Tin Man

161k44 gold badges222 silver badges308 bronze badges

answered Oct 10, 2008 at 18:51

Luka Marinko

1,7012 gold badges11 silver badges14 bronze badges

Comments

shx2 · Accepted Answer · 2016-10-20 10:38:21Z

My answer addresses the specific (and somewhat common) case where you don't really need to convert the entire xml to json, but what you need is to traverse/access specific parts of the xml, and you need it to be fast, and simple (using json/dict-like operations).

Approach

For this, it is important to note that parsing an xml to etree using lxml is super fast. The slow part in most of the other answers is the second pass: traversing the etree structure (usually in python-land), converting it to json.

Which leads me to the approach I found best for this case: parsing the xml using lxml, and then wrapping the etree nodes (lazily), providing them with a dict-like interface.

Code

Here's the code:

from collections import Mapping
import lxml.etree

class ETreeDictWrapper(Mapping):

    def __init__(self, elem, attr_prefix = '@', list_tags = ()):
        self.elem = elem
        self.attr_prefix = attr_prefix
        self.list_tags = list_tags

    def _wrap(self, e):
        if isinstance(e, basestring):
            return e
        if len(e) == 0 and len(e.attrib) == 0:
            return e.text
        return type(self)(
            e,
            attr_prefix = self.attr_prefix,
            list_tags = self.list_tags,
        )

    def __getitem__(self, key):
        if key.startswith(self.attr_prefix):
            return self.elem.attrib[key[len(self.attr_prefix):]]
        else:
            subelems = [ e for e in self.elem.iterchildren() if e.tag == key ]
            if len(subelems) > 1 or key in self.list_tags:
                return [ self._wrap(x) for x in subelems ]
            elif len(subelems) == 1:
                return self._wrap(subelems[0])
            else:
                raise KeyError(key)

    def __iter__(self):
        return iter(set( k.tag for k in self.elem) |
                    set( self.attr_prefix + k for k in self.elem.attrib ))

    def __len__(self):
        return len(self.elem) + len(self.elem.attrib)

    # defining __contains__ is not necessary, but improves speed
    def __contains__(self, key):
        if key.startswith(self.attr_prefix):
            return key[len(self.attr_prefix):] in self.elem.attrib
        else:
            return any( e.tag == key for e in self.elem.iterchildren() )


def xml_to_dictlike(xmlstr, attr_prefix = '@', list_tags = ()):
    t = lxml.etree.fromstring(xmlstr)
    return ETreeDictWrapper(
        t,
        attr_prefix = '@',
        list_tags = set(list_tags),
    )

This implementation is not complete, e.g., it doesn't cleanly support cases where an element has both text and attributes, or both text and children (only because I didn't need it when I wrote it...) It should be easy to improve it, though.

Speed

In my specific use case, where I needed to only process specific elements of the xml, this approach gave a suprising and striking speedup by a factor of 70 (!) compared to using @Martin Blech's xmltodict and then traversing the dict directly.

Bonus

As a bonus, since our structure is already dict-like, we get another alternative implementation of xml2json for free. We just need to pass our dict-like structure to json.dumps. Something like:

def xml_to_json(xmlstr, **kwargs):
    x = xml_to_dictlike(xmlstr, **kwargs)
    return json.dumps(x)

If your xml includes attributes, you'd need to use some alphanumeric attr_prefix (e.g. "ATTR_"), to ensure the keys are valid json keys.

I haven't benchmarked this part.

If I try doing json.dumps(tree) it says Object of type 'ETreeDictWrapper' is not JSON serializable

Robert Parelius · Accepted Answer · 2018-09-12 01:21:51Z

2

check out lxml2json (disclosure: I wrote it)

https://github.com/rparelius/lxml2json

it's very fast, lightweight (only requires lxml), and one advantage is that you have control over whether certain elements are converted to lists or dicts

answered Sep 12, 2018 at 1:21

Robert Parelius

211 bronze badge

Comments

srth12 · Accepted Answer · 2018-11-30 04:52:07Z

1

You can use declxml. It has advanced features like multi attributes and complex nested support. You just need to write a simple processor for it. Also with the same code, you can convert back to JSON as well. It is fairly straightforward and the documentation is awesome.

Link: https://declxml.readthedocs.io/en/latest/index.html

answered Nov 30, 2018 at 4:52

srth12

1,03113 silver badges16 bronze badges

Comments

Liju · Accepted Answer · 2020-08-26 08:19:50Z

If you don't want to use any external libraries and 3rd party tools, Try below code.

Code

import re
import json

def getdict(content):
    res=re.findall("<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))",content)
    if len(res)>=1:
        attreg="(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"
        if len(res)>1:
            return [{i[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,i[1].strip())]},{"$values":getdict(i[2])}]} for i in res]
        else:
            return {res[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,res[1].strip())]},{"$values":getdict(res[2])}]}
    else:
        return content

with open("test.xml","r") as f:
    print(json.dumps(getdict(f.read().replace('\n',''))))

Sample Input

<details class="4b" count=1 boy>
    <name type="firstname">John</name>
    <age>13</age>
    <hobby>Coin collection</hobby>
    <hobby>Stamp collection</hobby>
    <address>
        <country>USA</country>
        <state>CA</state>
    </address>
</details>
<details empty="True"/>
<details/>
<details class="4a" count=2 girl>
    <name type="firstname">Samantha</name>
    <age>13</age>
    <hobby>Fishing</hobby>
    <hobby>Chess</hobby>
    <address current="no">
        <country>Australia</country>
        <state>NSW</state>
    </address>
</details>

Output

[
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4b"
          },
          {
            "count": "1"
          },
          {
            "boy": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "John"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Coin collection"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Stamp collection"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": []
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "USA"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "CA"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "empty": "True"
          }
        ]
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": []
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4a"
          },
          {
            "count": "2"
          },
          {
            "girl": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "Samantha"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Fishing"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Chess"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": [
                  {
                    "current": "no"
                  }
                ]
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "Australia"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "NSW"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]

daparic · Accepted Answer · 2017-02-09 16:43:09Z

0

This stuff here is actively maintained and so far is my favorite: xml2json in python

answered Feb 9, 2017 at 16:43

daparic

4,6222 gold badges47 silver badges46 bronze badges

3 Comments

Coin Graham Over a year ago

This is deprecated.

daparic Over a year ago

Indeed. Can we know the exact date of a github repo moved to read-only/archived? All I see is last commit was 2019. Thanks.

Coin Graham Over a year ago

"This module is deprecated and will not be updated anymore (May 2019)"

David Lee · Accepted Answer · 2021-02-23 20:45:22Z

I published one on github a while back..

https://github.com/davlee1972/xml_to_json

This converter is written in Python and will convert one or more XML files into JSON / JSONL files

It requires a XSD schema file to figure out nested json structures (dictionaries vs lists) and json equivalent data types.

python xml_to_json.py -x PurchaseOrder.xsd PurchaseOrder.xml

INFO - 2018-03-20 11:10:24 - Parsing XML Files..
INFO - 2018-03-20 11:10:24 - Processing 1 files
INFO - 2018-03-20 11:10:24 - Parsing files in the following order:
INFO - 2018-03-20 11:10:24 - ['PurchaseOrder.xml']
DEBUG - 2018-03-20 11:10:24 - Generating schema from PurchaseOrder.xsd
DEBUG - 2018-03-20 11:10:24 - Parsing PurchaseOrder.xml
DEBUG - 2018-03-20 11:10:24 - Writing to file PurchaseOrder.json
DEBUG - 2018-03-20 11:10:24 - Completed PurchaseOrder.xml

I also have a follow up xml to parquet converter that works in a similar fashion

https://github.com/blackrock/xml_to_parquet

Collectives™ on Stack Overflow

Converting XML to JSON using Python?

19 Answers 19

10 Comments

2 Comments

5 Comments

1 Comment

3 Comments

Comments

1 Comment

Comments

Comments

Comments

2 Comments

1 Comment

Comments

Approach

Code

Speed

Bonus

1 Comment

Comments

Comments

Comments

3 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

19 Answers 19

10 Comments

2 Comments

5 Comments

1 Comment

3 Comments

Comments

1 Comment

Comments

Comments

Comments

2 Comments

1 Comment

Comments

Approach

Code

Speed

Bonus

1 Comment

Comments

Comments

Comments

3 Comments

Comments

Linked

Related