
This question is a follow up to this answer: https://stackoverflow.com/a/51972010/3480297

I'm trying to remove the namespace from an XML file. The linked answer works fine when there are no comments in the XML. However, if there is a comment, an error is thrown.

This is an example of my code:

from lxml import etree

input_xml = '''
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>
'''
root = etree.fromstring(input_xml)

# Remove namespace prefixes
for elem in root.getiterator():
    elem.tag = etree.QName(elem).localname
# Remove unused namespace declarations
etree.cleanup_namespaces(root)

print(etree.tostring(root).decode())

This throws the following error:

ValueError: Invalid input tag of type <class 'cython_function_or_method'>
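A small sketch that reproduces the cause, assuming lxml and a trimmed-down version of the input above. The loop over the tree also yields the comment node. Its tag is the etree.Comment factory function rather than a "{uri}localname" string, and that non-string tag is what etree.QName rejects. The variable names here are illustrative only.

```python
from lxml import etree

input_xml = (
    '<package xmlns="http://apple.com/itunes/importer">'
    '<provider>some data <!-- example comment--> </provider>'
    '</package>'
)
root = etree.fromstring(input_xml)

# iter() yields package, provider and the comment; element tags are
# strings in Clark notation, the comment's tag is the Comment factory
tags = [node.tag for node in root.iter()]
print(tags[:2])
print(tags[2] is etree.Comment)

# Handing the comment to QName reproduces the ValueError from the question
comment = root[0][0]
caught = None
try:
    etree.QName(comment)
except ValueError as exc:
    caught = exc
print(caught)
```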

EDIT:

With the following input_xml, the code in the answer below does not remove all of the namespaces:

<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>

The result of the code is still:

<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>
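A diagnostic sketch (assuming lxml, reusing the second input) of why xmlns:xsi survives. The element tags here carry no namespace at all. The only namespaced node is the xsi:schemaLocation attribute, which lxml stores under its "{uri}name" key. etree.cleanup_namespaces() only drops declarations that nothing uses, so it keeps xsi.

```python
from lxml import etree

input_xml = '''<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>'''
root = etree.fromstring(input_xml)

# The namespaced attribute is keyed by its Clark-notation name
print(list(root.attrib))

# cleanup_namespaces() keeps xsi because the attribute still uses it
etree.cleanup_namespaces(root)
print(root.nsmap)
```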
  • "I'm trying to remove the namespace from an XML file." That's always suspicious and rarely a good idea (or necessary). Why are you trying to do that? Commented Mar 2, 2020 at 10:56
  • I only need to display parts of the XML (without extracting any specific information from it at that point), and I would like the output without the namespaces. Commented Mar 2, 2020 at 11:01
  • Not sure if I get that...? Simple outputs without extracting information? Commented Mar 2, 2020 at 12:20
  • I meant that modifying the XML directly won't cause me any issues, since I'm only displaying certain parts of it and not parsing or extracting information from it. Commented Mar 2, 2020 at 13:31

1 Answer


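The iterator also yields comment nodes, and a comment's tag is the etree.Comment factory function rather than a string, which is why etree.QName raises the ValueError. Make sure that the node is not a comment before changing its tag. The code below also removes any attributes that are in a namespace.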

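for elem in root.iter():
    # A comment's tag is the etree.Comment function, not a string,
    # so only rename the tags of real elements
    if not isinstance(elem, etree._Comment):
        elem.tag = etree.QName(elem).localname

    # Remove attributes that are in a namespace ("{uri}name" keys);
    # iterate over a copy of the keys because entries are removed
    for attr in list(elem.attrib):
        if attr.startswith("{"):
            elem.attrib.pop(attr)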
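An end-to-end sketch, assuming lxml, that runs on the second input from the question. strip_namespaces is a hypothetical helper name. Instead of checking for comments by type, it passes the etree.Element factory to iter(). This yields only elements, so comments and processing instructions are skipped. It then calls etree.cleanup_namespaces(), as the question's code does. Once the xsi:schemaLocation attribute is gone, nothing uses the xsi declaration, so cleanup_namespaces() drops it too.

```python
from lxml import etree

XSI_XML = '''<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>'''


def strip_namespaces(root):
    """Hypothetical helper: drop namespaces from tags and attributes in place."""
    # The Element factory as a tag filter restricts iter() to elements
    for elem in root.iter(etree.Element):
        elem.tag = etree.QName(elem).localname
        # Copy the keys first because attributes are deleted in the loop
        for attr in list(elem.attrib):
            if attr.startswith("{"):
                del elem.attrib[attr]
    # No node refers to the namespaces any more, so their declarations go
    etree.cleanup_namespaces(root)
    return root


root = strip_namespaces(etree.fromstring(XSI_XML))
print(etree.tostring(root).decode())
```

Removing namespaced attributes is lossy: xsi:schemaLocation is dropped entirely, not renamed. That matches the goal stated in the comments (display only), but it would break schema validation of the output.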

3 Comments

Thank you! This works for the original code. But I have an issue when there are additional namespaces and they're not all being removed. Could you have a look at my edited question please?
In the second example, you have an attribute bound to a namespace (xsi:schemaLocation). You need to remove this attribute if you don't want any namespace declarations in the document.
Is there a way to do that with the code rather than modifying the XML manually?
