Search and replace multiple lines in xml/text files using python

Question

---Update 3: The script now updates the required data in the xml files, but the following header lines are being dropped from the written file. Why is this, and how can I put them back?

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>

Current working code (except for issue mentioned above).

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

path=os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone="_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_BaseMetadata.xml"

    check_meta=os.listdir(path)
    if FileNm+'.xml' in check_meta:
        shutil.copy2(FileNm+'.xml', newMetaFile)
    else:
        # Raw string so the backslashes in the Windows path are never read as escapes
        shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    print "Processing: "+str(File)

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = str(os.getcwd()+"\\"+File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
    for node in tree.findall('.//theme'):
        node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
    print node.text
    projection_info=[]
    Zone=FileDesc_obj.spatialReference.name

    if "GCS" in str(FileDesc_obj.spatialReference.name):
        projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
        print "Geographic Coordinate system"
    else:
        projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print "Projected Coordinate system"
    x=0
    for node in tree.findall('.//spdom'):
        for node2 in node.findall('.//keyword'):
            print node2.text
            node2.text = str(projection_info[x])
            print node2.text
            x=x+1


    tree.write(newMetaFile)
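
A likely explanation for the dropped header, offered as an assumption rather than a confirmed diagnosis: ElementTree keeps only the root element and its contents, so the XML declaration and the xml-stylesheet processing instruction that sit before the root never make it into the parsed tree and are not written back. One workaround is to write those header lines yourself and then append the serialized root. The sketch below assumes Python 3 (for `encoding="unicode"`), uses a hypothetical helper named `write_with_header`, and uses a tiny in-memory document in place of the real metadata file.

```python
import io
from xml.etree import ElementTree as et

# Header lines copied from the question; adjust them to match your own files.
HEADER = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n"
)

def write_with_header(tree, out):
    """Write HEADER to a text stream, then the serialized root element."""
    out.write(HEADER)
    out.write(et.tostring(tree.getroot(), encoding="unicode"))

# Stand-in for a real metadata file (hypothetical content).
source = (HEADER + "<metadata><northbc>8097970</northbc></metadata>").encode("utf-8")
tree = et.ElementTree(et.fromstring(source))

# The stylesheet instruction is not part of the parsed root element.
root_only = et.tostring(tree.getroot(), encoding="unicode")

buf = io.StringIO()
write_with_header(tree, buf)
result = buf.getvalue()
```

With a real file, `out` would be a file opened in text mode with `encoding="utf-8"` so the bytes on disk match the declaration.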

---Update 1&2: Thanks to Aleyna I have the following basic code that works

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    # Raw string so the backslashes in the Windows path are never read as escapes
    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)

The issue is with dealing with xml code like

<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>

As <keyword thesaurus=""> is not unique within <spdom>, can we update these keywords in order, using the values coming from

FileDesc_obj.spatialReference.name

u'GCS_GDA_1994'

---ORIGINAL POST---

I am building a program to generate xml metadata files from the spatial files in our library. I have already created the scripts that extract the required spatial and attribute data from the files and create a shp and text file based index of them. Now I want to write this info to a base metadata xml file that follows the ANZLIC standards, by replacing the values held by common/static elements...

So for example I want to replace the following xml code

<northbc>8097970</northbc>
<southbc>8078568</southbc>

with

<northbc> GeneratedValue_[desc.extent.YMax] </northbc>
<southbc> GeneratedValue_[desc.extent.YMin] </southbc>

The issue is that, obviously, the number/value between <northbc> and </northbc> will not be the same from file to file.

The same applies to xml tags like <title>, <nondig><formname> etc. In the latter example both tags must be searched for together, as formname appears multiple times (it is not unique).

I am using the Python Regular Expression manual as a reference.

Thanks... I am not trying to write an xml file from scratch. I just want to replace chunks of text within given attributes based on input from the arcpy module.
– GeorgeC, Commented Jan 30, 2012 at 3:21
So when it produces output that looks like <northbc>8097970</northbc>, your regex will handle it?
– Borealid, Commented Jan 30, 2012 at 3:22
Why would it? It is just getting desc.extent.XMax where desc=arcpy.Describe(shp_file), for example.
– GeorgeC, Commented Jan 30, 2012 at 3:25
Look, is it really so hard to use a library designed for what you're trying to do instead of one designed for parsing unstructured text? I'm really trying to save you a headache, here.
– Borealid, Commented Jan 30, 2012 at 3:26

Aleyna · Accepted Answer · 2012-01-30 04:29:08Z

2

Using the given tag(s) above:

import os
import xml
from xml.etree import ElementTree as et 
path = r"/your/path/to/xml.file" 
tree = et.parse(path)
for node in tree.findall('.//northbc'):
    node.text = "New Value"
tree.write(path)

Here, XPATH .//northbc returns all the 'northbc' nodes in the XML doc. You can tailor the code for your need easily.
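
As a self-contained illustration of the same idea, the sketch below runs the replacement on an in-memory document so it can be tried without a file on disk. The `<bounding>` wrapper element and the helper name `set_text` are assumptions made for the example, not part of the original metadata layout.

```python
from xml.etree import ElementTree as et

# Hypothetical sample; only <northbc> and <southbc> come from the question.
SAMPLE = """<metadata>
  <bounding>
    <northbc>8097970</northbc>
    <southbc>8078568</southbc>
  </bounding>
</metadata>"""

def set_text(root, path, value):
    """Set the text of every element matching path; return how many matched."""
    nodes = root.findall(path)
    for node in nodes:
        node.text = str(value)
    return len(nodes)

root = et.fromstring(SAMPLE)
changed = set_text(root, ".//northbc", 8100000.5)  # matches at any depth
missing = set_text(root, ".//nosuchtag", "x")      # no match, nothing changes
```

Returning the match count makes it easy to notice a misspelled tag, since `findall` simply returns an empty list when nothing matches.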

answered Jan 30, 2012 at 4:29

Aleyna


4 Comments

GeorgeC Over a year ago

Thanks but I get the following...
>> path=os.getcwd()
>> tree=et.parse(path)
Traceback (most recent call last):
  File "C:\Program Files (x86)\Wing IDE 101 4.0\src\debug\tserver_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 579, in parse
    source = open(source, "rb")
IOError: [Errno 13] Permission denied: 'L:\\Data_Admin\\QA\\Metadata_python_toolset\\training'

GeorgeC Over a year ago

Please DISREGARD my previous comment. It works fine when path is an actual xml file. What would you do with repeating tags like the 3rd example, '<nondig><formname>', where formname is repeated but nondig is unique?

Aleyna Over a year ago

If I am getting it right, you have multiple <formname>s that are direct children of unique <nondig> nodes? Then you can use an xpath such as .//nondig/formname to get the <formname>s. You can either walk up the tree and check the parent <nondig> before replacing the value or, even better, rewrite your xpath using a unique attribute of the parent (perhaps an id?) so that the <formname>s are grouped by <nondig>.
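
A minimal sketch of the nested-path approach described in this comment. The element names are taken from the question's code, but how they nest in the sample document is an assumption made for illustration.

```python
from xml.etree import ElementTree as et

# Hypothetical layout: <formname> repeats, but its chain of parents is unique.
SAMPLE = """<metadata>
  <native>
    <nondig><formname>old A</formname></nondig>
    <digform><formname>old B</formname></digform>
  </native>
  <avlform>
    <nondig><formname>old C</formname></nondig>
  </avlform>
</metadata>"""

root = et.fromstring(SAMPLE)

# Only the <formname> directly under <native>/<nondig> is selected.
for node in root.findall(".//native/nondig/formname"):
    node.text = "nondigital"

# Collect every <formname> in document order to see what changed.
texts = [n.text for n in root.iter("formname")]
```

Spelling out the parents in the path does the disambiguation, so there is no need to walk back up the tree after the match.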

Aleyna Over a year ago

Not sure if .//spdom/keyword will return the <keyword>s in the order they appear in the doc. However, you can just select all <spdom>s and walk through the child <keyword>s in a loop, replacing the values in the order they come from the doc. (And of course, the order in the doc must match the order in your new data source.)
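
The loop suggested in this comment might look like the sketch below. The helper name `fill_keywords` and the replacement values are hypothetical; in the real script the values would come from the arcpy spatial reference. Pairing keywords and values with `zip` stops at the shorter of the two, so a short value list leaves the remaining keywords untouched instead of raising an IndexError.

```python
from xml.etree import ElementTree as et

SAMPLE = """<metadata>
  <spdom>
    <keyword thesaurus="">GDA94</keyword>
    <keyword thesaurus="">GRS80</keyword>
    <keyword thesaurus="">Transverse Mercator</keyword>
    <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
  </spdom>
</metadata>"""

def fill_keywords(root, values):
    """Overwrite each <spdom>'s <keyword> texts in document order.

    Every <spdom> is filled from the start of values; returns the number
    of keywords written.
    """
    written = 0
    for spdom in root.findall(".//spdom"):
        for keyword, value in zip(spdom.findall("keyword"), values):
            keyword.text = str(value)
            written += 1
    return written

root = et.fromstring(SAMPLE)
# Hypothetical values standing in for the spatial reference properties.
new_values = ["GCS_GDA_1994", "Degree", "D_GDA_1994", "GRS_1980"]
count = fill_keywords(root, new_values)
result = [k.text for k in root.findall(".//spdom/keyword")]
```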

gfortune · Accepted Answer · 2012-01-30 03:27:04Z

1

If you're dealing with valid XML, use XPath to find the nodes of interest and the ElementTree api to manipulate the node.

For instance, your xpath might be something like '//northbc' and you would just replace the text node inside it.

See http://docs.python.org/library/xml.etree.elementtree.html as well as http://pypi.python.org/pypi/lxml/2.2.8 for two different libraries that will help you get this done. Search google for XPath and see the w3c tutorial for a decent intro to XPath (I apparently can't post more than two links in a post or I'd link it too)

answered Jan 30, 2012 at 3:27

gfortune


1 Comment

GeorgeC Over a year ago

Thanks. This seems to be on the right track and I am just going through w3schools.com/xpath

inspectorG4dget · Accepted Answer · 2012-01-30 03:21:39Z

0

I might be stating the obvious here, but did you consider using a DOM tree to parse and manipulate your XML?
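
For comparison, a minimal sketch of the DOM route using the standard library's `xml.dom.minidom`, assuming Python 3 and a small in-memory document in place of the real metadata file. One relevant difference from ElementTree is that minidom keeps processing instructions that appear before the root element as children of the document node, so the xml-stylesheet line is still there when the document is serialized again.

```python
from xml.dom import minidom

# Stand-in for a real metadata file; header lines copied from the question.
SOURCE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>"
    b"<metadata><northbc>8097970</northbc></metadata>"
)

doc = minidom.parseString(SOURCE)

# Each matching element holds its value in a child text node.
for node in doc.getElementsByTagName("northbc"):
    node.firstChild.data = "8100000"

# With an encoding given, toxml returns bytes that include the declaration.
output = doc.toxml(encoding="utf-8")
```

Note that `node.firstChild` is `None` for an empty element, so real metadata with blank tags would need a check before assigning.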

answered Jan 30, 2012 at 3:21

inspectorG4dget

