I'm trying to use lxml to parse some XML files and print their contents. However, the files contain some special characters. I'd rather not replace them, because escaping and unescaping them is too complicated, and I can't force the people who produce the files to emit well-formed XML.
Is there any way to handle non-well-formed XML with lxml in Python?
I can read it in properly:
import sys
from lxml import etree
parser = etree.XMLParser(recover=True)  # keep parsing past well-formedness errors
root = etree.parse(sys.argv[1], parser=parser)
But when I print the element text, only the content up to the special character is printed:
for element in root.iter("content"):
    print("%s - %s attr - %s" % (element.tag, element.text, element.get("name")))
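To narrow this down, here is a minimal self-contained sketch that runs the same recovering parse on an in-memory document and also lists what the parser complained about. The helper name `dump_content` and the sample input with a bare `&` are made up for illustration; my real files may contain different offending characters, so the recovered text may differ from what this sample gives.

```python
from lxml import etree


def dump_content(xml_bytes):
    """Parse xml_bytes leniently and return (rows, errors).

    rows   -- (tag, text, name attribute) for every <content> element
    errors -- messages the parser recorded while recovering
    """
    # recover=True asks the parser to keep going after well-formedness errors
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(xml_bytes, parser=parser)
    # error_log holds the problems seen during the most recent parse
    errors = [entry.message for entry in parser.error_log]
    if root is None:
        # nothing could be recovered from the input
        return [], errors
    rows = [(el.tag, el.text, el.get("name")) for el in root.iter("content")]
    return rows, errors


if __name__ == "__main__":
    # Hypothetical sample: the bare "&" makes this document not well-formed.
    sample = b'<root><content name="a">fish & chips</content></root>'
    rows, errors = dump_content(sample)
    for tag, text, name in rows:
        print("%s - %s attr - %s" % (tag, text, name))
    for message in errors:
        print("parser error:", message)
```

Comparing the printed text with the parser errors should at least show which character the parser stops on, and whether the rest of the text is dropped or ends up somewhere other than `element.text`.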