
I have an XML file, and part of it looks like this:

        <?xml version="1.0" encoding="UTF-8" ?>,
         <Settings>,
             <System>,
                 <Format>Percent</Format>,
                 <Time>12 Hour Format</Time>,
                 <Set>Standard</Set>,
             </System>,
             <System>,
                 <Format>Percent</Format>,
                 <Time>12 Hour Format</Time>,
                 <Set>Standard</Set>,
                 <Alarm>ON</Alarm>,
                 <Haptic>ON</Haptic>'
             </System>
          </Settings>

What I would like to do is use XPath to select //Settings/System and get the tags and values of every child element under each System element, so that I can populate a dataframe with the following output:

| Format | Time | Set | Alarm | Haptic |
|:-------|:-----|:----|:------|:-------|
| Percent | 12 Hour Format | Standard | NaN | NaN |
| Percent | 12 Hour Format | Standard | ON | ON |

So far I have seen methods as follows:

import xml.etree.ElementTree as ET
root = ET.parse(filename)
result = ''

for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes even in 2.6:
    if elem.attrib.get('name') == 'foo':
        result = elem.text

These methods filter explicitly with elem.attrib.get('name'), which I cannot use in my case because the child elements inside each System tag are inconsistent. So my question is: is there a way to use XPath (or anything else) to specify /System and get all of its child elements and their values?

  • What do you mean by "inconsistent elements within my /System tag."? Commented Jul 8, 2021 at 21:53
  • @JackFleeting In the example I have three elements (Format, Time, Set), but in other xml files/strings, there may be 5 to 10 different elements Commented Jul 8, 2021 at 22:02
  • I see; and in those situations where you have, say, 5 elements, you are still interested only in these specific three? Commented Jul 8, 2021 at 22:04
  • @JackFleeting No - I would want to be able to get all of them and then in a later step of code, concat them Commented Jul 8, 2021 at 22:08
  • To make sure I understand you, please edit your question with a well-formed xml containing two /System elements with different numbers of child elements (say, 3 in the first and 5 in the second) together with the expected output dataframe. Commented Jul 8, 2021 at 22:11

1 Answer


Your xml is still not well formed, but assuming it's fixed and looks like the version before, the following should work:

#fixed xml
<?xml version="1.0" encoding="UTF-8" ?>
     <Settings>
         <System>
             <Format>Percent</Format>
             <Time>12 Hour Format</Time>
             <Set>Standard</Set>
         </System>
         <System>
             <Format>Percent</Format>
             <Time>12 Hour Format</Time>
             <Set>Standard</Set>
             <Alarm>ON</Alarm>
             <Haptic>ON</Haptic>
         </System>
     </Settings>

Now for the code itself:

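import xml.etree.ElementTree as ET
import pandas as pd

# filename: path to a file holding the fixed xml above (same name as in the question)
root = ET.parse(filename)

rows, tags = [], []
# get all unique element names, in order of first appearance
for elem in root.findall('System//*'):
    if elem.tag not in tags:
        tags.append(elem.tag)
# now collect the required info; a tag missing from a System becomes None
for elem in root.findall('System'):
    rows.append([elem.find(tag).text if elem.find(tag) is not None else None for tag in tags])
df = pd.DataFrame(rows, columns=tags)
print(df)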

Output:

    Format            Time       Set Alarm Haptic
0  Percent  12 Hour Format  Standard  None   None
1  Percent  12 Hour Format  Standard    ON     ON
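
As a possible variation on the same idea, each System element can be turned into a dict of its child tags and text, which avoids collecting the tag names in a separate pass. The sketch below is only an illustration, not part of the original answer: the helper name system_records and the XML_TEXT constant are hypothetical, and it assumes the fixed xml is available as a string and that no tag repeats inside a single System.

```python
import xml.etree.ElementTree as ET

# Sample data: the fixed xml from above, held in a string (assumption).
XML_TEXT = """<?xml version="1.0" encoding="UTF-8" ?>
<Settings>
    <System>
        <Format>Percent</Format>
        <Time>12 Hour Format</Time>
        <Set>Standard</Set>
    </System>
    <System>
        <Format>Percent</Format>
        <Time>12 Hour Format</Time>
        <Set>Standard</Set>
        <Alarm>ON</Alarm>
        <Haptic>ON</Haptic>
    </System>
</Settings>"""


def system_records(xml_text, path="System"):
    """Return one {tag: text} dict per element matched by `path`.

    `path` is resolved relative to the root element (Settings here).
    Hypothetical helper: if a tag occurred twice in one System,
    the later value would overwrite the earlier one.
    """
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in system}
        for system in root.findall(path)
    ]


records = system_records(XML_TEXT)
for record in records:
    print(record)
```

Passing records to pd.DataFrame(records) should then give one column per distinct tag, with pandas filling the cells for tags a System lacks with NaN, which is the form shown in the question's expected table.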