Parsing an XML file using Element Tree

Question

I have a large number of .xml files (about 70) and I need to extract some co-ordinates from them. Apparently the best way to do this is to parse the XML files using ElementTree. I am new to Python (very, very new!) and am having a difficult time understanding the documentation that comes with ElementTree. I was wondering if anyone has code that uses ElementTree, or could explain to me how to go about it. Thank you!

This is a sample from my XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<lev:Leveringsinformatie xmlns:lev="http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo">
  <lev:Version>1.5</lev:Version> 
  <lev:Klicnummer>10G179900</lev:Klicnummer> 
  <lev:Ordernummer>0065491624</lev:Ordernummer> 
  <lev:RelatienummerGrondroerder>0000305605</lev:RelatienummerGrondroerder> 
  <lev:Leveringsvolgnummer>1</lev:Leveringsvolgnummer> 
  <lev:Meldingsoort>Graafmelding</lev:Meldingsoort> 
  <lev:DatumTijdAanvraag>2010-08-10T11:43:02.779+02:00</lev:DatumTijdAanvraag> 
  <lev:KlantReferentie>1207-0132-030 - 6</lev:KlantReferentie> 
  <lev:Locatie axisLabels="x y" srsDimension="2" srsName="epsg:28992" uomLabels="m m">
  <gml:exterior xmlns:gml="http://www.opengis.net/gml">
  <gml:LinearRing>
  <gml:posList>137800.0 484217.0 137796.0 484222.0 137832.0 483757.0 138178.0 483752.0 138174.0 484222.0 137800.0 484217.0</gml:posList> 
  </gml:LinearRing>
  </gml:exterior>
  </lev:Locatie>
  <lev:Pngformaat>
  <lev:OmsluitendeRechthoek xmlns:ns4="http://www.kadaster.nl/schemas/klic/20080722/madt" xmlns:bis="http://www.kadaster.nl/schemas/klic/20080722/klicnetbeheerdersinformatieservicetypes" xmlns:ns0="http://www.kadaster.nl/schemas/klic/20080722/gias" xmlns:ns7="http://www.kadaster.nl/schemas/klic/20080722/klicnetbeheerdersinformatieservicetypes" xmlns:madt="http://www.kadaster.nl/schemas/klic/20080722/madt" xmlns:gia="http://www.kadaster.nl/schemas/klic/20080722/gias" xmlns:klic="http://www.kadaster.nl/schemas/20080722/klic" xmlns:b="http://www.kadaster.nl/schemas/klic/20080722/bundelingtypes" xmlns:ns9="http://www.kadaster.nl/schemas/klic/20081010/bmkltypes" xmlns:gml="http://www.opengis.net/gml" xmlns:ns1="http://www.kadaster.nl/schemas/20080722/klic" xmlns:a="http://www.kadaster.nl/schemas/klic/20080722/bundelingservicetypes" xmlns:bmkl="http://www.kadaster.nl/schemas/klic/20081010/bmkltypes" xmlns:ns3="http://www.opengis.net/gml" xmlns:ns8="http://www.kadaster.nl/schemas/klic/20080722/knts">
  <gml:Envelope srsDimension="2" srsName="epsg:28992">
  <gml:lowerCorner>137796 483752</gml:lowerCorner> 
  <gml:upperCorner>138178 484222</gml:upperCorner> 
  </gml:Envelope>
  </lev:OmsluitendeRechthoek>
  <lev:PixelsBreed>5348</lev:PixelsBreed> 
  <lev:PixelsHoog>6580</lev:PixelsHoog> 
  </lev:Pngformaat>
  <lev:NetbeheerderLeveringen>
  <lev:NetbeheerderLevering>
  <lev:RelatienummerNetbeheerder>0000578695</lev:RelatienummerNetbeheerder> 
  <lev:Bedrijfsnaam>Gemeente Almere</lev:Bedrijfsnaam> 
  <lev:BedrijfsnaamAfkorting>Gemeente Almere</lev:BedrijfsnaamAfkorting>

I need to extract the lower and upper corner co-ordinates (lowerCorner/upperCorner).

Update: Here is my full script:

from xml.etree import ElementTree as ET
import sys, string, os, arcgisscripting
gp = arcgisscripting.create(9.3)

workspace = "D:/J040083"
gp.workspace = workspace

for root, dirs, filenames in os.walk(workspace): # returns root, dirs, and files
    for filename in filenames:
        filename_split = os.path.splitext(filename) # filename and extensionname (extension in [1])
        filename_zero = filename_split[0]
        extension = str.upper(filename_split[1])

        try:
            first_2_letters = str.upper(filename_zero[0] + filename_zero[1])
        except:
            first_2_letters = "XX"

        if first_2_letters == "LI" and extension == ".XML":
            tree = ET.parse(workspace)
            print tree.find('//{http://www.opengis.net/gml}lowerCorner').text
            print tree.find('//{http://www.opengis.net/gml}upperCorner').text

I am now getting the error:

Message File Name Line Position
Traceback
D:\J040083\TXT_EXTRACTION.py 32
parse C:\Python25\Lib\xml\etree\ElementTree.py 862
parse C:\Python25\Lib\xml\etree\ElementTree.py 579
IOError: [Errno 13] Permission denied: 'D:/J040083'

And now I am REALLY confused, because I am able to access these files with a different script which is almost exactly the same as this one!

Just so we're all on the same page, have you read the ElementTree documentation? That's a reference document but there are examples sprinkled throughout the page. For an intro, the ElementTree Overview page might be helpful too.
– Greg Hewgill, Commented Jan 18, 2011 at 10:12
Embarrassingly, yes, I have read it! I just don't really understand it.
– Alice Duff, Commented Jan 18, 2011 at 10:34
@Alice: I suggest you post a small realistic snippet from an XML file you want to parse and specify the data you want to reach. You can do it by editing your own post.
– Eli Bendersky, Commented Jan 18, 2011 at 10:52
I did try that but it just shows up in my question not in the correct format, so instead of having the comments it just had the numbers!
– Alice Duff, Commented Jan 18, 2011 at 10:57
@Alice Duff - if you're going to be doing a lot of work with GML then I'd recommend reading up on XML. GML can get fairly complex and you'll be pleased you got the XML fundamentals sorted out. I can't recommend any tutorials as it's been a while since I've looked at them, but avoid W3Schools (NOT linked with W3, who actually write the spec!) as they're frequently inaccurate. This is the first result that isn't W3Schools: learn-xml-tutorial.com
– James Walford, Commented Jan 18, 2011 at 11:42

Mark Tolonen · Accepted Answer · 2011-01-18 16:20:50Z

ElementTree can be tricky when namespaces are involved. The elements you are looking for are named <gml:lowerCorner> and <gml:upperCorner>. Searching higher in the XML data, gml is defined as an XML namespace: xmlns:gml="http://www.opengis.net/gml". The way to find a subelement of the XML tree is as follows:

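from xml.etree import ElementTree as ET
tree = ET.parse('file.xml')
# './/' searches all levels below the root; a bare leading '//' triggers a warning in newer ElementTree versions
print(tree.find('.//{http://www.opengis.net/gml}lowerCorner').text)
print(tree.find('.//{http://www.opengis.net/gml}upperCorner').text)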

Output

137796 483752
138178 484222

Explanation

Using ElementTree's XPath support, // selects all subelements on all levels of the tree. ElementTree uses {url}tag notation for a tag in a specific namespace. gml's URL is http://www.opengis.net/gml. .text retrieves the data in the element.
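As a minimal sketch of the {url}tag notation described above, the snippet below searches a trimmed-down copy of the question's XML held in a string. The SAMPLE string and the variable names are assumptions made for illustration, and the code targets Python 3. It also shows the optional prefix-to-URI mapping that find accepts in newer Python versions, which avoids spelling out the URL in every path.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's file (assumption: same element
# names and namespaces as the real files, everything else left out).
SAMPLE = """
<lev:Leveringsinformatie xmlns:lev="http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo">
  <lev:Pngformaat>
    <lev:OmsluitendeRechthoek xmlns:gml="http://www.opengis.net/gml">
      <gml:Envelope srsDimension="2" srsName="epsg:28992">
        <gml:lowerCorner>137796 483752</gml:lowerCorner>
        <gml:upperCorner>138178 484222</gml:upperCorner>
      </gml:Envelope>
    </lev:OmsluitendeRechthoek>
  </lev:Pngformaat>
</lev:Leveringsinformatie>
"""

GML = "http://www.opengis.net/gml"
root = ET.fromstring(SAMPLE)

# Form 1: the namespace URL written out in {url}tag notation.
lower = root.find(".//{%s}lowerCorner" % GML)

# Form 2: a prefix map; "gml" is only a local alias for the URL here.
NS = {"gml": GML}
upper = root.find(".//gml:upperCorner", NS)

# .text is a single string, so split it to get numeric x and y values.
lower_xy = [float(value) for value in lower.text.split()]
upper_xy = [float(value) for value in upper.text.split()]
print(lower_xy, upper_xy)
```

Both forms locate elements by namespace URL, so the prefix used in the mapping does not have to match the prefix used inside the file.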

Note that // is a shortcut to finding a nested node. The full path of upperCorner in ElementTree's syntax is actually:

{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}Pngformaat/{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}OmsluitendeRechthoek/{http://www.opengis.net/gml}Envelope/{http://www.opengis.net/gml}upperCorner
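To illustrate that the full path and the descendant shortcut reach the same node, here is a small sketch for Python 3. The compact SAMPLE string and the names LEV, GML and steps are assumptions for illustration; the path is built relative to the root element, which is why Leveringsinformatie itself does not appear in it.

```python
import xml.etree.ElementTree as ET

LEV = "http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo"
GML = "http://www.opengis.net/gml"

# Compact stand-in for the question's file (assumption: same nesting).
SAMPLE = (
    '<lev:Leveringsinformatie xmlns:lev="%s" xmlns:gml="%s">'
    '<lev:Pngformaat><lev:OmsluitendeRechthoek><gml:Envelope>'
    '<gml:lowerCorner>137796 483752</gml:lowerCorner>'
    '<gml:upperCorner>138178 484222</gml:upperCorner>'
    '</gml:Envelope></lev:OmsluitendeRechthoek></lev:Pngformaat>'
    '</lev:Leveringsinformatie>' % (LEV, GML)
)
root = ET.fromstring(SAMPLE)

# Build the full path step by step, each step in {url}tag notation.
steps = [
    (LEV, "Pngformaat"),
    (LEV, "OmsluitendeRechthoek"),
    (GML, "Envelope"),
    (GML, "upperCorner"),
]
full_path = "/".join("{%s}%s" % (uri, tag) for uri, tag in steps)

by_full_path = root.find(full_path)
by_shortcut = root.find(".//{%s}upperCorner" % GML)

# Both searches should return the very same element object.
same = by_full_path is by_shortcut
print(same, by_full_path.text)
```

The full path is stricter: it only matches an upperCorner at exactly that position, while the shortcut returns the first upperCorner found anywhere below the root.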

CharlesB · Accepted Answer · 2011-01-18 16:12:54Z

Using ElementTree is very simple, basically you create an object parsed from a file, find elements by name or path, and get their text or attribute.
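Those three steps (parse, find, read text or attribute) might look like this on a namespace-free document. The XML string below is a hypothetical simplification of the question's data with the prefixes stripped, invented only for illustration; the code targets Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace-free variant of the question's data, parsed from a
# string here; ET.parse('some_file.xml').getroot() would do the same for a file.
root = ET.fromstring(
    '<Pngformaat>'
    '<Envelope srsName="epsg:28992">'
    '<lowerCorner>137796 483752</lowerCorner>'
    '<upperCorner>138178 484222</upperCorner>'
    '</Envelope>'
    '<PixelsBreed>5348</PixelsBreed>'
    '</Pngformaat>'
)

# Find a direct child by name and read its text.
pixels_wide = root.find("PixelsBreed").text

# Find a nested element by path and read its text.
lower = root.find("Envelope/lowerCorner").text

# Read an attribute with get(); it returns None if the attribute is absent.
srs_name = root.find("Envelope").get("srsName")

print(pixels_wide, lower, srs_name)
```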

In your case it's a bit more complicated because you have namespaces in your file, so we have to transform the path from the form ns:tag to the form {uri}tag. This is the aim of the transform_path function.

import xml.etree.ElementTree as ET

NS_MAP = {
    'http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo': 'lev',
    'http://www.opengis.net/gml': 'gml',
}
INV_NS_MAP = {v: k for k, v in NS_MAP.items()}  # inverse of NS_MAP: prefix -> URI
# for Python 2: INV_NS_MAP = dict((v, k) for k, v in NS_MAP.iteritems())

# ElementTree expects tags in the form {uri}tag, but it would be a pain to write the complete URI for each tag
def transform_path(path):
    steps = []
    for step in path.split('/'):
        ns, tag = step.split(':')
        steps.append("{" + INV_NS_MAP[ns] + "}" + tag)
    # Join with '/' and leave no trailing slash: current ElementTree versions read a trailing '/' as '/*'
    return '/'.join(steps)

tree = ET.parse('test.xml')
doc = tree.getroot()

lowerCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:lowerCorner"))
upperCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:upperCorner"))
print(lowerCorner.text)  # Print lower-corner coordinates
print(upperCorner.text)  # Print upper-corner coordinates

# for Python 2: print elem.text

Running the script with your file will give the following output:

137796 483752
138178 484222
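To run the same lookup over every matching file in a folder, as the question's updated script attempts, one possible sketch for Python 3 follows. The function names read_corners and corners_by_file are hypothetical, and the "LI" prefix and ".XML" extension filter are taken from the question's script. Note that the question's script passes the workspace folder itself to ET.parse, which is consistent with the "Permission denied" error it reports; here each file's full path is built with os.path.join instead.

```python
import os
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"


def read_corners(xml_path):
    """Return (lowerCorner, upperCorner) text from one file, or None if either is missing."""
    root = ET.parse(xml_path).getroot()
    lower = root.find(".//%slowerCorner" % GML)
    upper = root.find(".//%supperCorner" % GML)
    if lower is None or upper is None:
        return None
    return lower.text, upper.text


def corners_by_file(workspace):
    """Walk workspace and collect corners for files named LI*.xml (case-insensitive)."""
    results = {}
    for folder, _dirs, filenames in os.walk(workspace):
        for filename in filenames:
            stem, extension = os.path.splitext(filename)
            if stem.upper().startswith("LI") and extension.upper() == ".XML":
                # Parse the file itself, not the folder that contains it.
                full_path = os.path.join(folder, filename)
                corners = read_corners(full_path)
                if corners is not None:
                    results[full_path] = corners
    return results
```

Calling corners_by_file("D:/J040083") would then return a dictionary keyed by file path, with a (lowerCorner, upperCorner) pair of strings for each file that contains both elements.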

edited Jan 18, 2011 at 16:12

answered Jan 18, 2011 at 10:31


4 Comments

Alice Duff Over a year ago

Thanks Charles, I am trying to run your code but it keeps giving me the error "Invalid Syntax" for the final line!

Alice Duff Over a year ago

I'm having some trouble making this script work. Now I get an "Invalid Syntax" error for the second from last line?

Alice Duff Over a year ago

I think it should work, I just don't understand how to make it work with my data. I will try doing some research and hopefully I will understand!

CharlesB Over a year ago

I made a small script that reads the coordinates of your file
