I asked a similar question before, but this one is slightly different. I want to find and replace XML tags using Python. I am using the XML files as metadata for some GIS shapefiles. In the metadata editor, I can choose how the dates of data collection are recorded. The options are 'single date', 'multiple dates' and 'range of dates'. The first XML below contains tags for a range of dates: a 'rngdates' tag with the subelements 'begdate', 'begtime', 'enddate' and 'endtime'. I want to replace these tags so that the result looks like the second XML, which contains multiple single dates. The new tags are 'mdattim', 'sngdate' and 'caldate'. I hope this is clear enough, but please ask for more info if needed. XML is a weird beast, and I still don't fully understand it.

Thanks, Mike

First XML:

<idinfo>
  <citation>
    <citeinfo>
       <origin>My Company Name</origin>
       <pubdate>05/04/2009</pubdate>
       <title>Feature Class Name</title>
       <edition>0</edition>
       <geoform>vector digital data</geoform>
       <onlink>.</onlink>
     </citeinfo>
   </citation>
<descript>
  <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
  <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
 </descript>
<timeperd>
 <timeinfo>
   <rngdates>
     <begdate>7/13/2010</begdate>
     <begtime>unknown</begtime>
     <enddate>7/15/2010</enddate>
     <endtime>unknown</endtime>
    </rngdates>
 </timeinfo>
 <current>ground condition</current>
</timeperd>

Second XML:

<idinfo>
  <citation>
    <citeinfo>
      <origin>My Company Name</origin>
      <pubdate>03/07/2011</pubdate>
      <title>Feature Class Name</title>
      <edition>0</edition>
      <geoform>vector digital data</geoform>
      <onlink>.</onlink>
    </citeinfo>
   </citation>
 <descript>
   <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
   <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
 </descript>
<timeperd>
 <timeinfo>
  <mdattim>
    <sngdate>
      <caldate>08-24-2009</caldate>
      <time>unknown</time>
     </sngdate>
    <sngdate>
      <caldate>08-26-2009</caldate>
    </sngdate>
   <sngdate>
      <caldate>08-26-2009</caldate>
    </sngdate>
   <sngdate>
      <caldate>07-07-2010</caldate>
    </sngdate>
  </mdattim>
</timeinfo>

This is my Python code so far:

import glob
import os
from xml.etree.ElementTree import ElementTree

# Raw string so the backslashes in the Windows path are kept literally
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

for filename in glob.glob(os.path.join(folderPath, "*.xml")):

    fullpath = os.path.join(folderPath, filename)

    if os.path.isfile(fullpath):
        basename, filename2 = os.path.split(fullpath)

        root = ElementTree(file=r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\Run_Metadata_2009\\" + filename2)

        # Iterate over every element in the document
        for element in root.iter():
            print(element.tag)

            # Attempt to rename the tag
            if element.tag == "begdate":
                element.tag.replace("begdate", "sngdate")

  • Also, show us the rules for converting one to the other, i.e. show the input and the expected output generated from that input. Commented Aug 2, 2011 at 22:07
  • The first XML is the input. I have a number of template XML files that have keywords embedded between certain tags. The second is the output, which I edited manually. I want to edit the first XML so that everything between its timeinfo tags is replaced by everything between those same tags in the second XML. I am using Python because this is an ArcGIS function and Python is the preferred language. I am using this script in conjunction with their Python tools. My script is going to be used to batch process XML files to be used as metadata in a large number of GIS shapefiles.... Commented Aug 3, 2011 at 16:51
  • Is this impossible? I've posted this on a couple of sites and it doesn't seem like anyone viewing my question can offer a decent answer... Commented Aug 3, 2011 at 17:57

1 Answer

I believe I got the code working. It clears everything under the 'timeinfo' tag of an existing XML file and builds the new tags in its place. I needed this for a batch processing script that creates metadata for GIS shapefiles, where the date tags differ depending on whether the data has a single date, multiple dates or a range of dates.
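
For the simpler case of only renaming a tag, note that element.tag.replace(...) returns a new string and leaves the element unchanged; assigning to element.tag is what changes it. Here is a minimal sketch of that using the standard library's xml.etree.ElementTree. The helper name rename_tags and the inline sample document are made up for illustration:

```python
import xml.etree.ElementTree as ET


def rename_tags(root, old, new):
    """Rename every element tagged `old` at or below `root`; return how many changed.

    Hypothetical helper for illustration, not part of ElementTree itself.
    """
    count = 0
    # Materialise the matches first so the tree is not walked while being edited
    for element in list(root.iter(old)):
        element.tag = new  # assignment changes the tag; str.replace would not
        count += 1
    return count


# Made-up sample document
sample = "<timeinfo><rngdates><begdate>7/13/2010</begdate></rngdates></timeinfo>"
root = ET.fromstring(sample)
renamed = rename_tags(root, "begdate", "caldate")
print(renamed, ET.tostring(root, encoding="unicode"))
```

Renaming alone does not turn a date range into the multiple-dates layout, because the nesting is different, which is why the code further down rebuilds the tags under 'timeinfo' instead.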

This webpage helped a lot: http://lxml.de/tutorial.html

I have some more work to do, but this was the answer I was looking for from my original question :) I'm sure this can be used in many other applications.

import glob
import os
import xml.etree.ElementTree as ET

# Set workspace location for XML files (raw string keeps the backslashes literal)
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

# Loop through each file with an .xml extension; glob already returns full paths
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if not os.path.isfile(fullpath):
        continue

    # Parse the XML file
    tree = ET.parse(fullpath)

    # Find the "timeinfo" tag; skip files that do not have one
    timeinfo = tree.find(".//timeinfo")
    if timeinfo is None:
        continue

    # Clear all tags below the "timeinfo" tag
    timeinfo.clear()
    # Add the new parent tag, then nest the date tags inside it
    mdattim = ET.SubElement(timeinfo, "mdattim")
    sngdate = ET.SubElement(mdattim, "sngdate")
    caldate = ET.SubElement(sngdate, "caldate")
    time_tag = ET.SubElement(sngdate, "time")
    # Set text values for tags
    caldate.text = "08-24-2009"
    time_tag.text = "unknown"

    # Save the edited tree back to the same file
    tree.write(fullpath)
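
The code above hard-codes a single date. As a rough sketch of how the same idea could handle several dates, the function below takes a list of date strings and builds one 'sngdate' per entry. The function name set_multiple_dates, its time_text parameter and the inline sample are assumptions for illustration, and it works on an in-memory tree so it can be tried without touching any files:

```python
import xml.etree.ElementTree as ET


def set_multiple_dates(root, dates, time_text=None):
    """Replace the children of <timeinfo> with <mdattim>/<sngdate>/<caldate> entries.

    Hypothetical helper: `root` must be an ancestor of <timeinfo>.
    Returns False if no <timeinfo> element is found.
    """
    timeinfo = root.find(".//timeinfo")
    if timeinfo is None:
        return False
    # Remove the existing children (for example <rngdates>) one at a time
    for child in list(timeinfo):
        timeinfo.remove(child)
    mdattim = ET.SubElement(timeinfo, "mdattim")
    for date in dates:
        sngdate = ET.SubElement(mdattim, "sngdate")
        ET.SubElement(sngdate, "caldate").text = date
        # Assumption: when given, the same <time> text is added to every date
        if time_text is not None:
            ET.SubElement(sngdate, "time").text = time_text
    return True


# Made-up sample modelled on the first XML in the question
sample = (
    "<timeperd><timeinfo><rngdates>"
    "<begdate>7/13/2010</begdate><enddate>7/15/2010</enddate>"
    "</rngdates></timeinfo><current>ground condition</current></timeperd>"
)
root = ET.fromstring(sample)
set_multiple_dates(root, ["08-24-2009", "08-26-2009", "07-07-2010"])
print(ET.tostring(root, encoding="unicode"))
```

In a batch script, the same function could be called on ET.parse(fullpath).getroot() inside the file loop, followed by tree.write(fullpath) to save the result.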