Remove a specific xml tag with ElementTree in python

Question

I am looking for a way to remove a specific `<e>` tag whose value is mmm (i.e. `<e>mmm</e>`) from an XML file. I am using this thread as a starting guide: "How to remove elements from XML using Python", but without the lxml library, using ElementTree with Python v2.6.6 instead. I have tried to connect the dots between that thread and the ElementTree API docs, but I haven't been successful.

I appreciate your advice and thoughts on this.

<?xml version='1.0' encoding='UTF-8'?>
<parent>
   <first>
     <a>123</a>                              
     <c>987</c>
       <d>
         <e>mmm</e>
         <e>yyy</e>           
       </d>         
   </first>
   <second>
     <a>456</a>                      
     <c>345</c>
       <d>
         <e>mmm</e>
         <e>hhh</e>            
       </d>
   </second>
 </parent>

Nimantha · Accepted Answer · 2021-10-29 04:05:42Z

It took a while for me to realise all `<e>` tags are subnodes of `<d>`.

If we can assume this holds for all your target nodes (`<e>` nodes with the value mmm), you can use the script below. (I added some extra nodes to check that it works.)

import xml.etree.ElementTree as ET

xml_string = """<?xml version='1.0' encoding='UTF-8'?>
<parent>
   <first>
     <a>123</a>                              
     <c>987</c>
       <d>
         <e>mmm</e>
         <e>aaa</e>
         <e>mmm</e>
         <e>yyy</e>           
       </d>         
   </first>
   <second>
     <a>456</a>                      
     <c>345</c>
       <d>
         <e>mmm</e>
         <e>hhh</e>            
       </d>
   </second>
 </parent>"""

# I build the root from a string; if you load the XML another way, adapt the last lines of this script
root = ET.fromstring(xml_string)

target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'

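# find all <d> nodes
for node in root.iter(target_node_first_parent):
    # findall() returns a list of the direct <e> children of <d>, so removing
    # from <d> inside the loop is safe (a lazy iterator can skip siblings)
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)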

# output the result         
tree = ET.ElementTree(root)
tree.write('output.xml')

I first tried to remove the nodes found by root.iter(yourtag) directly from the root, but that does not work: remove() only removes direct children of the element it is called on, so each `<e>` has to be removed from its parent `<d>`.
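If the target elements are not all children of one known tag, a common workaround is to build a child-to-parent map first, because ElementTree elements do not keep a reference to their parent. The following is only a sketch under that idea, assuming Python 2.7 or later (for `Element.iter()` and dict comprehensions); `remove_by_text` and `parent_of` are names made up for this illustration and are not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def remove_by_text(root, tag, text):
    """Remove every <tag> element whose text equals `text`; return how many were removed.

    Hypothetical helper for illustration, not an ElementTree API.
    """
    # Map each child element to its parent (elements are hashable by identity).
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the targets first so the tree is not modified while iterating it.
    # The root has no parent and cannot be removed, so it is filtered out here.
    targets = [el for el in root.iter(tag) if el.text == text and el in parent_of]
    for el in targets:
        parent_of[el].remove(el)
    return len(targets)


if __name__ == "__main__":
    demo = ET.fromstring("<parent><d><e>mmm</e><e>yyy</e></d></parent>")
    remove_by_text(demo, "e", "mmm")
    print(ET.tostring(demo))
```

With this approach the parent tag does not need to be known in advance, at the cost of one extra pass over the tree to build the map.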

Nana Owusu · Accepted Answer · 2020-05-27 22:27:02Z

The answer by @Queuebee is exactly correct, but in case you want to read from a file, the code below provides a way to do that.

import xml.etree.ElementTree as ET

file_loc = " "  # placeholder: set this to the path of your XML file
xml_tree_obj = ET.parse(file_loc)

xml_roots = xml_tree_obj.getroot()

target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'

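# find all <d> nodes
for node in xml_roots.iter(target_node_first_parent):
    # findall() returns a list of the direct <e> children of <d>, so removing
    # from <d> inside the loop is safe (a lazy iterator can skip siblings)
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)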

out_tree = ET.ElementTree(xml_roots)
out_tree.write('output.xml')
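The file-based version can also be wrapped in a small function so the input and output paths are parameters rather than hard-coded. This is a rough sketch, assuming Python 2.7 or later; `strip_elements` and its parameter names are invented for this example. It passes `xml_declaration=True` to `ElementTree.write()` so the `<?xml ...?>` header is written to the output file.

```python
import xml.etree.ElementTree as ET


def strip_elements(in_path, out_path, parent_tag="d", tag="e", text="mmm"):
    """Parse in_path, drop matching <tag> children of <parent_tag>, write out_path.

    Hypothetical helper for illustration; the defaults mirror the question's XML.
    """
    tree = ET.parse(in_path)
    for parent in tree.getroot().iter(parent_tag):
        # findall() returns a list of direct children, so removing
        # from the parent while looping over that list is safe.
        for child in parent.findall(tag):
            if child.text == text:
                parent.remove(child)
    # Write the modified tree, including the XML declaration header.
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)
```

A call would then look like `strip_elements("input.xml", "output.xml")`, where both file names are placeholders.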
