I have an XML file with some data.

<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state> 
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

The location information in the XML file is wrong, so I have to replace it.

I have constructed the location information with the corrected values in Python.

variable = '''
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state> 
    </Location>'''

So the Location tag should be replaced with the new information. Is there a simple way to do this in Python?

I want the final result to look like this:

<Emp>
<Name>Raja</Name>
<Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

Any thoughts?

Thanks.

  • I think the regex tag should be removed; this is not a task for regex. It looks like a classic case of node replacement with an XML parser. Commented Oct 13, 2015 at 9:50
  • Possible duplicate of Find and Replace Values in XML using Python Commented Oct 13, 2015 at 9:54
  • @SaketMittal, but here I want to replace the Location tag itself, and maybe one more tag will be added to the variable template. I just want to remove the <Location> tag and put the variable's content in its place. Commented Oct 13, 2015 at 9:58

1 Answer

UPDATE - XML PARSER IMPLEMENTATION: since replacing a specific <Location> tag would require modifying the regex, I'm providing a more general and safer alternative based on the ElementTree parser (as suggested above by @stribizhev and @Saket Mittal).

I had to add a root element <Emps> (a valid XML document requires a single root element). I've also chosen to select the location to edit by its <city> tag (but any field would work):

#!/usr/bin/python
# Alternative Implementation with ElementTree XML Parser

xml = '''\
<Emps>
    <Emp>
        <Name>Raja</Name>
        <Location>
            <city>ABC</city>
            <geocode>123</geocode>
            <state>XYZ</state>
        </Location>
        <sal>100</sal>
        <type>temp</type>
    </Emp>
    <Emp>
        <Name>GsusRecovery</Name>
        <Location>
            <city>Torino</city>
            <geocode>456</geocode>
            <state>UVW</state>
        </Location>
        <sal>120</sal>
        <type>perm</type>
    </Emp>
</Emps>
'''

from xml.etree import ElementTree as ET
# tree = ET.parse('input.xml')  # uncomment to parse the XML from a file
tree = ET.ElementTree(ET.fromstring(xml))
root = tree.getroot()

for location in root.iter('Location'):
    if location.find('city').text == 'Torino':
        location.set("isupdated", "1")
        location.find('city').text = 'MyCity'
        location.find('geocode').text = '10.12'
        location.find('state').text = 'MyState'

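# tostring() with an explicit encoding returns bytes on Python 3, so decode before printing
print(ET.tostring(root, encoding='utf8', method='xml').decode('utf8'))
# tree.write('output.xml')  # uncomment if you want to write to a file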

Online runnable version of the code here
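
The code above edits the existing <Location> in place. If you would rather swap the whole element for the one you already built as a string, as the question asks, something like the following sketch may work. It assumes Python 3, and it assumes the attribute is written as isupdated="1": an unquoted attribute value is not well-formed XML, so the parser would reject the string from the question as written. The names replace_location and new_location_xml are made up for this illustration.

```python
from xml.etree import ElementTree as ET

xml = '''<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state>
</Location>
<sal>100</sal>
<type>temp</type>
</Emp>'''

# Assumption: the attribute value is quoted so that the fragment parses.
new_location_xml = '''<Location isupdated="1">
     <city>MyCity</city>
     <geocode>10.12</geocode>
     <state>MyState</state>
</Location>'''


def replace_location(emp, new_xml):
    """Replace the first <Location> child of emp with the element parsed from new_xml."""
    old = emp.find('Location')
    if old is None:
        return False
    new = ET.fromstring(new_xml)
    new.tail = old.tail              # keep the whitespace that followed the old element
    index = list(emp).index(old)     # position of the old element among its siblings
    emp[index] = new                 # swap in place so sibling order is preserved
    return True


root = ET.fromstring(xml)
replaced = replace_location(root, new_location_xml)
result = ET.tostring(root, encoding='unicode')
print(result)
```

Because the new element takes the old one's position, <sal> and <type> stay after it. With several <Emp> elements you would call the helper on each one you want to change.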

PREVIOUS REGEX IMPLEMENTATION

This is a possible implementation using the lazy quantifier .*? and the dot-all flag (?s):

#!/usr/bin/python

import re

xml = '''\
<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state>
</Location>
</Emp>'''

locUpdate = '''\
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
    </Location>'''

output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml)

print(output)

You can test the code online here

Caveat: if there is more than one <Location> tag in the XML input, the regex replaces them all with locUpdate. To replace only the first one, use:

# (note the ``count=1`` argument at the end, which limits the substitution to the first occurrence)
output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml, count=1)
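
To see the difference between the two calls, here is a small sketch (Python 3 assumed) that runs both on an input with two <Location> tags. The shortened input and the names loc_update and pattern are only for illustration. It passes the replacement as a function, so any backslashes in the replacement text would be inserted literally rather than treated as escapes.

```python
import re

# Shortened, hypothetical input with two <Location> elements
xml = '''<Emps>
<Emp><Location><city>ABC</city></Location></Emp>
<Emp><Location><city>Torino</city></Location></Emp>
</Emps>'''

loc_update = '<Location isupdated="1"><city>MyCity</city></Location>'

# re.DOTALL is the same flag as the inline (?s)
pattern = re.compile(r"<Location>.*?</Location>", re.DOTALL)

# subn() returns the new string together with the number of substitutions made
all_replaced, n_all = pattern.subn(lambda m: loc_update, xml)
first_only, n_first = pattern.subn(lambda m: loc_update, xml, count=1)

print(n_all, n_first)
print(first_only)
```

Note that the regex can only pick the first match or all of them; to choose a location by its content, the parser-based version is the better fit.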

5 Comments

I have some tags after the </Location> tag; how do I preserve them in the output as well?
@MagendranV If you test the online implementation, the </Emp> tag after </Location> is preserved. Anyway, if you provide your XML input, I'll check.
I have slightly modified the input. Please take a look.
@MagendranV: nothing changes for me; everything after '</Location>' is preserved (you can check the stdout of the modified code online). I believe there is something different in the way the XML is provided to the regex, or something similar.
@MagendranV: I've provided an alternative implementation using ElementTree as the XML parser, with finer control over the substitution. This implementation (parsing from a string or from a file) can solve the other problems you mentioned above as well.
