How to validate xml using python without third-party libs?

Question

I have some xml pieces like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to check whether these XML pieces are well-formed? Is there any way to validate the XML against a DTD or XML Schema?

For various reasons I can't use any third-party packages.

For example, the XML above is not correct since it has a bare '&' in it. Note that the DOCTYPE declaration refers to a DTD.

I consider it risky to violate XML at the token level (level 0) and hope to find a tool that checks for level-1 compliance. The probability of finding one is no higher among first-party tools. If I read the backtrace correctly, jsbueno's answer fails because of that. Why is replacing it with "&amp;" not an option?
– guidot, Commented Dec 6, 2012 at 13:15
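
The escaping that the comment asks about can be done with the standard library. The following is a minimal sketch, assuming the raw text value is available before the XML is assembled; the variable names are hypothetical and not from the question.

```python
from xml.sax.saxutils import escape
from xml.etree import ElementTree as ET

# Raw text as it would appear before being placed into the XML (assumed input).
raw_position = "F&W"

# escape() replaces '&', '<' and '>' with their entity references.
escaped = escape(raw_position)
print(escaped)  # F&amp;W

# The escaped fragment is now well-formed and parses without error.
fragment = "<player_position>" + escaped + "</player_position>"
elem = ET.fromstring(fragment)
print(elem.text)  # F&W
```

The parser turns the entity back into a plain '&', so the application still sees the original value.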

jsbueno · Accepted Answer · 2012-12-06 11:27:34Z

41

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring) - it will raise an error if the XML is not well formed.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
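
On Python 3 the same check can be wrapped in a small helper. This is a minimal sketch, not part of the original answer; the function name `is_well_formed` is hypothetical. It only tests well-formedness: the standard-library parsers are non-validating, so the DTD named in a DOCTYPE is not checked.

```python
from xml.etree import ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # str(err) carries the expat message plus line and column.
        return False, str(err)
    return True, None


good = "<record><player_position>F&amp;W</player_position></record>"
bad = "<record><player_position>F&W</player_position></record>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Returning the message alongside the flag keeps the line and column information that the traceback above shows.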

2 Comments

Kiryl Bielašeŭski Over a year ago

How can I avoid this warning?

FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.   if ElementTree.fromstring(output_payload):

z33k Over a year ago

This is due to the nature of xml.etree, which makes the truthiness of Element objects depend on what is inside them (elements that have no subelements evaluate to False). That is why the warning asks for a specific test: use if elem is not None rather than if elem. They decided to change this behavior. You can suppress the warning output using: with open(os.devnull, "w") as devnull: and then with contextlib.redirect_stderr(devnull):
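
Rather than suppressing the warning, the explicit tests it recommends can be used directly. A minimal sketch, assuming a leaf element like the one that triggers the warning; the sample XML is made up for illustration.

```python
from xml.etree import ElementTree as ET

# A leaf element: it exists, but it has no child elements.
elem = ET.fromstring("<record/>")

# Explicit tests that do not rely on the truth value of the Element.
exists = elem is not None
has_children = len(elem) > 0

print(exists)        # True
print(has_children)  # False

# find() returns None when nothing matches, so compare against None as well.
missing = elem.find("player_name")
print(missing is None)  # True
```

Note that ET.fromstring either returns an Element or raises ParseError, so for a pure well-formedness check a try/except is enough and no truth test is needed.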

Thomas Orozco · Accepted Answer · 2012-12-06 11:27:15Z

9

You can use Python's xml.dom.minidom XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml).

Just do:

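import xml.dom.minidom
# This string is itself not well-formed (<My> and <XML> are never closed),
# so the call below raises xml.parsers.expat.ExpatError.
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')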

You will get an xml.parsers.expat.ExpatError if the XML is not well-formed.
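
Catching that exception turns the call into a yes/no check. This is a minimal sketch under the same assumptions as the answer; the helper name `parses_with_minidom` is hypothetical.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parses_with_minidom(xml_text):
    """Return True if minidom can parse xml_text, False on ExpatError."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


# Properly closed elements parse; unclosed <My> and <XML> do not.
print(parses_with_minidom("<My><XML><String/></XML></My>"))
print(parses_with_minidom("<My><XML><String/><XML/><My/>"))
```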

3 Comments

jsbueno Over a year ago

Minidom is no longer the preferred way of parsing MXL in standard Python (although it won't matter in this specific case, unless performance matters)

guidot Over a year ago

You may want to correct the XML spelling; by the way: what is the preferred way now?

Thomas Orozco Over a year ago

@guidot jsbueno suggested the use of ElementTree in his own answer which is actually more powerful than minidom and should indeed be used! If you have access to non-standard libraries, lxml probably is the best out there!
