
I'm trying to extract some data from the following XML file.

<?xml version="1.0" encoding="utf-8"?>
<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
    <HOMEV1FileHeader>
        <FileCreationTimestamp>2020-02-15T08:29:22+01:00</FileCreationTimestamp>
        <FileType>AB716</FileType>
        <SGO>YIFG</SGO>
    </HOMEV1FileHeader>
    <OI>
        <ON>YIFG4</ON>
        <CI>HYU</CI>
        <NL>
            <NT>
                <GOCode>HYU34</GOCode>
                <NTName>HYUFFT - 11</NTName>
                <NTData>
                    <RIS>
                        <RI>
                            <EDC>2020-01-18</EDC>
                            <E4NS>
                                <GNS>
                                    <RD>
                                        <NR>
                                            <CC>9012</CC>
                                            <NDC>411</NDC>
                                            <SRng>
                                                <SRngStart>000</SRngStart>
                                                <SRngStop>999</SRngStop>
                                            </SRng>
                                        </NR>
                                    </RD>
                                    <RD>
                                        <NR>
                                            <CC>834</CC>
                                            <NDC>101</NDC>
                                            <SRng>
                                                <SRngStart>150</SRngStart>
                                                <SRngStop>295</SRngStop>
                                            </SRng>
                                        </NR>
                                    </RD>
                                </GNS>
                            </E4NS>
                            <E2NS>
                                <MCC>111</MCC>
                                <MNC>222</MNC>
                            </E2NS>
                            <E2G>
                                <MGT_CC>9012</MGT_CC>
                                <MGT_NC>4113</MGT_NC>
                            </E2G>
                        </RI>
                    </RIS>
                </NTData>
            </NT>
        </NL>
    </OI>
</go-home-1:GOHOMEV1>

My expected output is shown below, with SGO as the first field.

[Image: expected output]

My attempt is below (based on ideas from the question "Getting all children of a node using xml.etree.ElementTree"), but sgo = root.find(...) raises an error and A = root.findall(...) returns an empty list, and I'm stuck. Thanks for any help.

import xml.etree.ElementTree as ET
import glob, os

filename = "file.xml"
namespaces = {
    "go-home-1": "https://sample.com/GO-HOME-V1"
}

root = ET.parse(filename).getroot()

# This line raises: AttributeError: 'NoneType' object has no attribute 'text'
sgo = root.find("go-home-1:HOMEV1FileHeader/"
    "go-home-1:SGO", namespaces).text

# This returns an empty list (A == []) and I don't know why.
A = root.findall(
    "go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
    "go-home-1:GNS/"
    "go-home-1:RD/"
    "go-home-1:NR", namespaces)

for item1 in A:
    Result = [sgo]
    cc = item1.find("go-home-1:CC", namespaces).text
    ndc = item1.find("go-home-1:NDC", namespaces).text
    Result.append(cc)
    Result.append(ndc)
    
    B = item1.findall(
        "go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
        "go-home-1:GNS/"
        "go-home-1:RD/"
        "go-home-1:NR/"
        "go-home-1:SRng", namespaces)
    
    for item2 in B:
        RngStart = item2.find("go-home-1:SRngStart", namespaces).text
        RngStop = item2.find("go-home-1:SRngStop", namespaces).text
        Result.append(RngStart)
        Result.append(RngStop)

    print(Result)
The xmlns:go-home-1="https://sample.com/GO-HOME-V1" declaration only binds the go-home-1 prefix, and only the root element uses that prefix. The other elements in the XML document have no prefix and there is no default namespace, so they are not in any namespace. Commented Jun 28, 2021 at 11:43
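A minimal sketch of the question's own approach with two problems fixed, assuming the sample XML above (trimmed here to the elements that matter). First, child elements are looked up without the go-home-1: prefix, since only the root element is in that namespace. Second, the inner SRng lookup is made relative to each NR element instead of repeating the full path from root, which would find nothing below NR.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (unused elements dropped).
XML = """<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
  <HOMEV1FileHeader><SGO>YIFG</SGO></HOMEV1FileHeader>
  <OI><NL><NT><NTData><RIS><RI><E4NS><GNS>
    <RD><NR><CC>9012</CC><NDC>411</NDC>
      <SRng><SRngStart>000</SRngStart><SRngStop>999</SRngStop></SRng></NR></RD>
    <RD><NR><CC>834</CC><NDC>101</NDC>
      <SRng><SRngStart>150</SRngStart><SRngStop>295</SRngStop></SRng></NR></RD>
  </GNS></E4NS></RI></RIS></NTData></NT></NL></OI>
</go-home-1:GOHOMEV1>"""

root = ET.fromstring(XML)

# Only the root tag carries the namespace URI; its children are unqualified.
print(root.tag)     # {https://sample.com/GO-HOME-V1}GOHOMEV1
print(root[0].tag)  # HOMEV1FileHeader

# Paths are relative to root and use plain, unprefixed names.
sgo = root.find("HOMEV1FileHeader/SGO").text

results = []
for nr in root.findall("OI/NL/NT[1]/NTData/RIS/RI/E4NS/GNS/RD/NR"):
    result = [sgo, nr.find("CC").text, nr.find("NDC").text]
    # SRng is a direct child of NR, so this path is relative to nr.
    for srng in nr.findall("SRng"):
        result.append(srng.find("SRngStart").text)
        result.append(srng.find("SRngStop").text)
    results.append(result)

print(results)
# [['YIFG', '9012', '411', '000', '999'], ['YIFG', '834', '101', '150', '295']]
```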

1 Answer


For this particular XML and the expected output, namespaces aren't really necessary. Additionally, I think the best way to present the output is as a DataFrame.

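import xml.etree.ElementTree as ET
import pandas as pd

root = ET.parse("file.xml").getroot()  # same file as in the question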

columns = ['SGO', 'MCC','MNC','MGT_CC','MGT_NC','CC','NDC','SRngStart','SRngStop']

sgo = root.find('.//SGO').text
mcc = root.find('.//MCC').text
mnc = root.find('.//MNC').text
mgt_cc = root.find('.//MGT_CC').text
mgt_nc = root.find('.//MGT_NC').text

rows = []
for entry in root.findall('.//RD'):
    row = []
    cc = entry.find('.//CC').text
    ndc = entry.find('.//NDC').text
    srngstart = entry.find('.//SRngStart').text
    srngstop = entry.find('.//SRngStop').text
    row.extend([sgo,mcc,mnc,mgt_cc,mgt_nc,cc,ndc,srngstart,srngstop])
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)
print(df)

Output:

    SGO  MCC  MNC  MGT_CC  MGT_NC    CC  NDC  SRngStart  SRngStop
0  YIFG  111  222    9012    4113  9012  411        000       999
1  YIFG  111  222    9012    4113   834  101        150       295
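The root.find(...).text pattern raises the same AttributeError seen in the question whenever an element is missing, because find() returns None when nothing matches. A small sketch, on a hypothetical record with NDC and SRng absent, of Element.findtext(), which returns the matched element's text or a default when there is no match.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in which NDC and SRng are missing.
root = ET.fromstring("<RD><NR><CC>834</CC></NR></RD>")

# find() returns None for a missing element, so .text would raise AttributeError.
print(root.find(".//NDC"))  # None

# findtext() returns the text of the first match, or the default if none matches.
cc = root.findtext(".//CC", default="")
ndc = root.findtext(".//NDC", default="")
start = root.findtext(".//SRngStart", default="")
print([cc, ndc, start])  # ['834', '', '']
```

Note that findtext() returns an empty string, not the default, when the element exists but has no text.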

2 Comments

Excellent, it seems to work fine for this sample. One question: CC and NDC can appear under different parents (here the parent is E4NS, but it could also be RHJ, with its own CC and NDC children). To get those, should I use longer paths like .//E4NS/GNS/RD/NR/CC and .//RHJ/GNS/RD/NR/CC? I've tried that and it doesn't work. Thanks
@Suspeg It's possible, but I would need to see a representative sample of the larger XML to be sure. You should post it as a new question, BTW; I'll be happy to take a look.
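One way to handle several parents, sketched on a hypothetical document in which an RHJ element (taken from the comment) is assumed to have the same GNS/RD/NR structure as E4NS; the CC/NDC values are borrowed from the question's sample. The parent-specific paths are evaluated from root. One plausible reason such paths return nothing is evaluating them from an element below E4NS, such as the RD loop variable in the answer: ElementTree only searches descendants of the element you call find()/findall() on.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: RHJ is assumed to mirror E4NS's GNS/RD/NR structure.
XML = """<root>
  <RI>
    <E4NS><GNS><RD><NR><CC>9012</CC><NDC>411</NDC></NR></RD></GNS></E4NS>
    <RHJ><GNS><RD><NR><CC>834</CC><NDC>101</NDC></NR></RD></GNS></RHJ>
  </RI>
</root>"""

root = ET.fromstring(XML)

rows = []
for parent in ("E4NS", "RHJ"):
    # Evaluate the parent-specific path from root, an ancestor of E4NS/RHJ.
    for nr in root.findall(f".//{parent}/GNS/RD/NR"):
        rows.append([parent, nr.findtext("CC"), nr.findtext("NDC")])

print(rows)  # [['E4NS', '9012', '411'], ['RHJ', '834', '101']]

# Evaluated from an RD element, the same path finds nothing, because
# E4NS is an ancestor of RD, not a descendant.
rd = root.find(".//E4NS/GNS/RD")
print(rd.findall(".//E4NS/GNS/RD/NR"))  # []
```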
