---Update 3: I have completed the script that updates the required data in the xml files, but the following header is being dropped from the written file. Why is this, and how can I put it back?

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>

Current working code (except for issue mentioned above).

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

path=os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone="_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_BaseMetadata.xml"

    check_meta=os.listdir(path)
    if FileNm+'.xml' in check_meta:
        shutil.copy2(FileNm+'.xml', newMetaFile)
    else:
        shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    print "Processing: "+str(File)

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = str(os.getcwd()+"\\"+File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
    for node in tree.findall('.//theme'):
        node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
        print node.text
    projection_info=[]
    Zone=FileDesc_obj.spatialReference.name

    if "GCS" in str(FileDesc_obj.spatialReference.name):
        projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
        print "Geographic Coordinate system"
    else:
        projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print "Projected Coordinate system"
    x=0
    for node in tree.findall('.//spdom'):
        for node2 in node.findall('.//keyword'):
            print node2.text
            node2.text = str(projection_info[x])
            print node2.text
            x=x+1


    tree.write(newMetaFile)
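
A possible explanation and workaround for the dropped header, offered as a sketch rather than a confirmed fix: as far as I know, ElementTree only models the root element and its contents, so the XML declaration and the xml-stylesheet processing instruction that sit before the root are not carried through parse() and write(). One option is to write those two header items by hand and then let ElementTree serialize the tree after them. The helper name write_with_header and the sample document are made up for illustration, and the code is written for Python 3 (the write() signature in the Python 2.6 install shown in this thread differs, so it may need adapting).

```python
import io
from xml.etree import ElementTree as et

# Header copied from the question; ElementTree does not keep it on its own.
HEADER = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n"
)


def write_with_header(tree, target, header=HEADER):
    """Write `header` first, then the serialized tree, to a binary file object.

    Hypothetical helper: `target` must be opened in binary mode.
    """
    target.write(header)
    # xml_declaration=False avoids a second <?xml ...?> line.
    tree.write(target, encoding="utf-8", xml_declaration=False)


# Demonstration on an in-memory document (assumed structure, not real metadata).
source = io.BytesIO(HEADER + b"<metadata><title>old</title></metadata>")
tree = et.parse(source)
for node in tree.findall(".//title"):
    node.text = "new"

buffer = io.BytesIO()
write_with_header(tree, buffer)
result = buffer.getvalue()
```

For a file on disk, the same helper could be called with a file opened in binary mode, for example open(newMetaFile, "wb").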

---Update 1&2: Thanks to Aleyna I have the following basic code that works

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)

The remaining issue is dealing with xml code like

<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>

As the <keyword> tag is not unique within <spdom>, can we update these in order, using the values coming from properties such as

FileDesc_obj.spatialReference.name

u'GCS_GDA_1994'

---ORIGINAL POST---

I am building a program to generate xml metadata files from spatial files in our library. I have already created the scripts that extract the required spatial and attribute data from the files and create a shp and text file based index of them. Now I want to write this information to a base metadata xml file, written to ANZLIC standards, by replacing the values held by common/static elements.

So for example I want to replace the following xml code

<northbc>8097970</northbc>
<southbc>8078568</southbc>

with

<northbc> GeneratedValue_[desc.extent.YMax] </northbc>
<southbc> GeneratedValue_[desc.extent.YMin] </southbc>

The issue is that the number/value between the opening and closing tags will obviously not be the same from one file to the next.

Similarly for xml tags like <title>, <nondig><formname> etc. In the latter example both tags must be searched for together, as formname appears multiple times (it is not unique).

I am using the Python Regular Expression manual [here][1],

  • See stackoverflow.com/a/1732454/383402 Commented Jan 30, 2012 at 3:02
  • thanks...I am not trying to write an xml file from scratch. I just want to replace chunks of text within given attributes based on input from the arcpy module. Commented Jan 30, 2012 at 3:21
  • So when it produces output that looks like <northbc><!-- Comment -->8097970</northbc>, your regex will handle it? Commented Jan 30, 2012 at 3:22
  • why would it? it is just getting desc.extent.XMax where desc=arcpy.Describe(shp_file) for example. Commented Jan 30, 2012 at 3:25
  • Look, is it really so hard to use a library designed for what you're trying to do instead of one designed for parsing unstructured text? I'm really trying to save you a headache, here. Commented Jan 30, 2012 at 3:26

3 Answers


Using the given tag(s) above:

import os
import xml
from xml.etree import ElementTree as et 
path = r"/your/path/to/xml.file" 
tree = et.parse(path)
for node in tree.findall('.//northbc'):
    node.text = "New Value"
tree.write(path)

Here, the XPath expression .//northbc returns all the 'northbc' nodes in the XML doc. You can easily tailor the code to your needs.
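
As a hedged illustration of tailoring the path: findall also accepts a longer path, so a tag that is not unique can be narrowed by naming its parents. The sample document below is an assumed layout built from the paths mentioned in the question (native/nondig/formname and so on), not the real ANZLIC file, and the replacement value is a placeholder.

```python
from xml.etree import ElementTree as et

# Assumed, simplified layout: <formname> appears under several parents.
SAMPLE = """
<metadata>
  <native>
    <nondig><formname>old native nondig</formname></nondig>
    <digform><formname>old native digform</formname></digform>
  </native>
  <avlform>
    <nondig><formname>old avlform nondig</formname></nondig>
  </avlform>
</metadata>
"""

root = et.fromstring(SAMPLE)

# './/formname' matches every <formname>, wherever it sits.
all_formnames = root.findall(".//formname")

# Naming the parents restricts the match to one branch of the tree.
native_nondig = root.findall(".//native/nondig/formname")
for node in native_nondig:
    node.text = "nondigital"  # placeholder value

texts = [node.text for node in all_formnames]
```

Only the <formname> under native/nondig is changed; the other two keep their original text.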

4 Comments

Thanks but I get the following... >> path=os.getcwd() >> tree=et.parse(path) Traceback (most recent call last): File "C:\Program Files (x86)\Wing IDE 101 4.0\src\debug\tserver_sandbox.py", line 1, in <module> # Used internally for debug sandbox under external interpreter File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 862, in parse tree.parse(source, parser) File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 579, in parse source = open(source, "rb") IOError: [Errno 13] Permission denied: 'L:\\Data_Admin\\QA\\Metadata_python_toolset\\training'
Please DISREGARD my previous comment. It works fine when path is an actual xml file. What would you do with repeating tags like the 3rd example - '<nondig><formname>' where formname is repeated but nondig is unique.
If I am getting it right, you have multiple <formname>s that are direct children of unique <nondig> nodes? Then you can use an xpath such as .//nondig/formname to get the <formname>s. You can either walk up the tree and check the parent <nondig> before replacing the value or, even better, rewrite your xpath using the parent's unique attribute (perhaps an id?) so that the <formname>s will be grouped by <nondig>s.
Not sure if .//spdom/keyword will return the <keyword>s in the order they appear in the doc. However, you can just select all <spdom>s and walk through the child <keyword>s in a loop, replacing the values in the order they come from the doc. (And of course, the order in the doc must match the order in your new data source.)
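
A minimal sketch of the loop described in the last comment, assuming the <spdom> block from the question sits under some root element (called <metadata> here purely for illustration). The Python documentation states that findall returns matches in document order, so pairing the keywords with a list of new values by position should be safe. The helper name update_keywords and every new value except 'GCS_GDA_1994' are placeholders, not real arcpy output.

```python
from xml.etree import ElementTree as et

SAMPLE = """
<metadata>
  <spdom>
    <keyword thesaurus="">GDA94</keyword>
    <keyword thesaurus="">GRS80</keyword>
    <keyword thesaurus="">Transverse Mercator</keyword>
    <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
  </spdom>
</metadata>
"""


def update_keywords(root, new_values):
    """Overwrite <keyword> text under each <spdom>, position by position.

    zip() stops at the shorter sequence, so extra keywords keep their
    old text instead of raising an IndexError.
    """
    for spdom in root.findall(".//spdom"):
        keywords = spdom.findall("keyword")  # direct children, document order
        for keyword, value in zip(keywords, new_values):
            keyword.text = str(value)


root = et.fromstring(SAMPLE)
# Placeholder values standing in for the spatialReference properties.
update_keywords(root, ["GCS_GDA_1994", "Degree", "D_GDA_1994"])
texts = [k.text for k in root.findall(".//spdom/keyword")]
```

With three new values and four keywords, the fourth keyword is left unchanged.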

If you're dealing with valid XML, use XPath to find the nodes of interest and the ElementTree api to manipulate the node.

For instance, your xpath might be something like '//northbc' and you would just replace the text node inside it.

See http://docs.python.org/library/xml.etree.elementtree.html as well as http://pypi.python.org/pypi/lxml/2.2.8 for two different libraries that will help you get this done. Search google for XPath and see the w3c tutorial for a decent intro to XPath (I apparently can't post more than two links in a post or I'd link it too)

1 Comment

Thanks. This seems to be on the right track and I am just going through w3schools.com/xpath

I might be stating the obvious here, but did you consider using a DOM tree to parse and manipulate your XML?
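
For what it is worth, a DOM-based version might look like the sketch below, using the standard library's xml.dom.minidom. The sample document and the set_text helper are assumptions for illustration, and the new values are made up. One side effect that may be relevant to this thread: in my understanding minidom keeps document-level processing instructions such as xml-stylesheet, and toxml(encoding=...) writes the XML declaration.

```python
from xml.dom import minidom

# Assumed sample: the header from the question plus a tiny body.
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>"
    b"<metadata><northbc>8097970</northbc><southbc>8078568</southbc></metadata>"
)


def set_text(doc, tag, value):
    """Replace the content of every <tag> element with a single text node."""
    for element in doc.getElementsByTagName(tag):
        while element.firstChild is not None:
            element.removeChild(element.firstChild)
        element.appendChild(doc.createTextNode(str(value)))


doc = minidom.parseString(SAMPLE)
set_text(doc, "northbc", 8100000.5)   # made-up extent values
set_text(doc, "southbc", 8070000.25)

# Serialize to bytes; the declaration and stylesheet instruction are included.
output = doc.toxml(encoding="utf-8")
```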
