1

Need some help with this. We have a large XML files from an API that I need to convert into a SQL Server table. Using pyodbc to load the table into SQL is no problem but I'm struggling to parse the 16mb XML file. The file is well formed - it is wrapped inside a single root. I've been looking for a solution for weeks and haven't been able to trial-and-error my way through it. Any assistance is appreciated.

In short, the goal is to output all values contained in the XML file in tabular format where each record is listed in its' own row. The Address and Space sections of the XML require proper handling for the sub elements. I've got code that works but I don't think this is the "proper" way to parse the values:

import xml.etree.ElementTree as ET
tree = ET.parse('Z:\\FileLoc\\DealData.xml')
root = tree.getroot()
    for prop in deals[0]:
    for addr1 in prop[4].iter(tag="Street"):
        for addr2 in prop[4].iter(tag="City"):
            for addr3 in prop[4].iter(tag="State"):
                for addr4 in prop[4].iter(tag="PostalCode"):
                    for addr5 in prop[4].iter(tag="Country"):
                        for spaces in prop[5]:
                                print(deals.attrib.get("DealID"), prop.attrib.get("ID"),
                                   prop[0].text[0:5], prop[0].text[6:],
                                   addr1.text, addr2.text, addr3.text, addr4.text, addr5.text,
                                   spaces.attrib.get("ID"))

Here's one example of the XML I'm working with:

    <?xml version="1.0" encoding="UTF-8"?>
<DealExport>
    <jobid>8558e207709c0730</jobid>
    <Deals>
        <Deal DealID="706880">
            <Properties>
                <Property ID="69320">
                    <Name>41381_BuildingName - Bldg 1</Name>
                    <CreatedAt>2015-10-28T04:09:07Z</CreatedAt>
                    <UpdatedAt>2017-12-20T01:09:51Z</UpdatedAt>
                    <Symbols>
                        <Type>SourceID</Type>
                        <Value>41381~41381</Value>
                    </Symbols>
                    <Address>
                        <City>Calgary</City>
                        <State>AB</State>
                        <Street>1234 Five Street</Street>
                        <PostalCode>H0H 0H0</PostalCode>
                        <Country>Canada</Country>
                    </Address>
                    <Spaces>
                        <Space ID="848294">
                            <Suite>800</Suite>
                            <SpaceAvailable>11225</SpaceAvailable>
                            <FloorName>8</FloorName>
                            <FloorPosition>7</FloorPosition>
                            <FloorRentableArea/>
                            <CreatedAt>2016-04-20T13:47:13Z</CreatedAt>
                            <UpdatedAt>2017-10-02T09:17:20Z</UpdatedAt>
                            <Symbols>
                                <Type>SourceID</Type>
                                <Value>800</Value>
                            </Symbols>
                        </Space>
                    </Spaces>
                </Property>
            </Properties>
            <DealType>new</DealType>
            <LastModified>2017-12-20T22:36:21Z</LastModified>
            <CreatedDate>2017-06-06T15:09:35Z</CreatedDate>
            <Stage>dead_deal</Stage>
            <Probability>0</Probability>
            <CompetitiveSet/>
            <MoveInDate>2017-12-01</MoveInDate>
            <Tenant>
                <Name>Current Occupant Ltd.</Name>
                <Industry>financial services</Industry>
                <Affiliation/>
                <TenantID>1013325</TenantID>
                <TenantWebsite/>
                <TenantCurrentLocation/>
                <TenantRSFFrom>6000</TenantRSFFrom>
                <TenantRSFTo>7000</TenantRSFTo>
            </Tenant>
            <Brokers></Brokers>
            <TenantContacts></TenantContacts>
            <DealStages>
                <DealStage ID="1533265">
                    <Stage>proposal</Stage>
                    <CreatedDate>2017-06-06T15:09:35Z</CreatedDate>
                    <LastModified>2017-06-06T15:09:22Z</LastModified>
                    <StartDate>2017-06-06T15:09:22Z</StartDate>
                    <DurationInDays>197</DurationInDays>
                    <DeadDealReasons></DeadDealReasons>
                    <User>
                        <UserId>17699</UserId>
                        <FirstName>Person</FirstName>
                        <LastName>Last-Name</LastName>
                        <Email>[email protected]</Email>
                        <Phone></Phone>
                        <PhoneExtension></PhoneExtension>
                        <Title></Title>
                        <BrokerLicenseNumber></BrokerLicenseNumber>
                    </User>
                </DealStage>
                <DealStage ID="2174388">
                    <Stage>dead_deal</Stage>
                    <CreatedDate>2017-12-20T22:36:01Z</CreatedDate>
                    <LastModified>2017-12-20T22:36:01Z</LastModified>
                    <StartDate>2017-12-20T22:36:01Z</StartDate>
                    <DurationInDays>0</DurationInDays>
                    <DeadDealReasons></DeadDealReasons>
                    <User>
                        <UserId>24934</UserId>
                        <FirstName>Agent</FirstName>
                        <LastName>Lastname</LastName>
                        <Email>[email protected]</Email>
                        <Phone></Phone>
                        <PhoneExtension></PhoneExtension>
                        <Title></Title>
                        <BrokerLicenseNumber></BrokerLicenseNumber>
                    </User>
                </DealStage>
                <DealStage ID="2174390">
                    <Stage>dead_deal</Stage>
                    <CreatedDate>2017-12-20T22:36:21Z</CreatedDate>
                    <LastModified>2017-12-20T22:36:21Z</LastModified>
                    <StartDate>2017-12-20T22:36:21Z</StartDate>
                    <DurationInDays>99</DurationInDays>
                    <DeadDealReasons>
                        <DeadDealReason>price</DeadDealReason>
                    </DeadDealReasons>
                    <User>
                        <UserId>24934</UserId>
                        <FirstName>Agent</FirstName>
                        <LastName>Lastname</LastName>
                        <Email>[email protected]</Email>
                        <Phone></Phone>
                        <PhoneExtension></PhoneExtension>
                        <Title></Title>
                        <BrokerLicenseNumber></BrokerLicenseNumber>
                    </User>
                </DealStage>
            </DealStages>
            <DealComments></DealComments>
            <DealTerms>
                <DealTerm ID="4278580">
                    <ProposalType>landlord</ProposalType>
                    <DiscountRate>0.08</DiscountRate>
                    <RentableArea>6200</RentableArea>
                    <LeaseType>triple net</LeaseType>
                    <LeaseTerm>60</LeaseTerm>
                    <CommencementDate>2018-01-01</CommencementDate>
                    <TenantImprovements></TenantImprovements>
                    <BuildingImprovements></BuildingImprovements>
                    <FreeRents></FreeRents>
                    <CreatedDate>2017-06-06T15:16:23Z</CreatedDate>
                    <SecurityDeposit/>
                    <NER>20714.1144670763</NER>
                    <NEROverride/>
                    <DateEntered>2017-06-06</DateEntered>
                    <OpEx>
                        <BaseOpEx/>
                        <BaseYear/>
                        <YearType/>
                        <LeaseType/>
                        <Description/>
                    </OpEx>
                    <RealEstateTaxes>
                        <BaseRealEstateTaxes>7.45</BaseRealEstateTaxes>
                        <BaseYear/>
                        <YearType/>
                        <LeaseType/>
                        <Description/>
                    </RealEstateTaxes>
                    <NPVperSqFt>103367.15936951</NPVperSqFt>
                    <TotalNPV>640876388.09096</TotalNPV>
                    <MiscDescription/>
                    <NetCashFlow>642137720.0</NetCashFlow>
                    <TotalIncome>502200.0</TotalIncome>
                    <Concessions>64480.0</Concessions>
                    <Payback>1</Payback>
                    <IRR/>
                    <Latest>true</Latest>
                    <BaseRents>
                        <BaseRent ID="45937180">
                            <BeginIn>1</BeginIn>
                            <Rent>15.0</Rent>
                            <Period>rsf/year</Period>
                            <Duration>36</Duration>
                        </BaseRent>
                        <BaseRent ID="45937181">
                            <BeginIn>37</BeginIn>
                            <Rent>18.0</Rent>
                            <Period>rsf/year</Period>
                            <Duration>24</Duration>
                        </BaseRent>
                    </BaseRents>
                    <RentEscalations></RentEscalations>
                    <OtherCredits></OtherCredits>
                    <Commissions>
                        <Commission ID="260593">
                            <RepType>landlord</RepType>
                            <Firm>LandLord</Firm>
                            <Amount>2.9</Amount>
                            <Unit>rsf</Unit>
                        </Commission>
                        <Commission ID="260594">
                            <RepType>landlord</RepType>
                            <Firm>Payee</Firm>
                            <Amount>7.5</Amount>
                            <Unit>rsf</Unit>
                        </Commission>
                    </Commissions>
                    <Rights></Rights>
                    <ProposalID>186607</ProposalID>
                    <ProposalEnteredDate>2017-06-06</ProposalEnteredDate>
                    <RecoveryIncomes>
                        <RecoveryIncome>
                            <RecoveryType>ReimbursableExpenseCam</RecoveryType>
                            <ExpenseAmount>12.87</ExpenseAmount>
                            <RecoveryMethod>net</RecoveryMethod>
                            <RecoveryAmount>12.87</RecoveryAmount>
                        </RecoveryIncome>
                        <RecoveryIncome>
                            <RecoveryType>RealEstateTax</RecoveryType>
                            <ExpenseAmount>7.45</ExpenseAmount>
                            <RecoveryMethod>net</RecoveryMethod>
                            <RecoveryAmount>7.45</RecoveryAmount>
                        </RecoveryIncome>
                    </RecoveryIncomes>
                </DealTerm>
            </DealTerms>
            <Budgets></Budgets>
            <Appraisals></Appraisals>
            <Tours></Tours>
        </Deal>
3
  • 1
    What do you need parsed? What is the structure of table? What did you try? Commented Apr 1, 2018 at 12:55
  • 1
    16Mb is not really that large. there's no point telling us you are struggling: you need to tell us what you are doing and what the failure symptoms are. Commented Apr 1, 2018 at 16:34
  • parsing a 16MB XML file shouldn't take that long. printing a 16MB XML file may take a while. change that print to pass and you will get a better idea of the time for the parsing alone. using integer offsets (prop[0]) is less accurate than looking by tag (prop.findall('Address')), but otherwise it seems fine. Commented Apr 2, 2018 at 15:45

1 Answer 1

4

For the best of my knowledge, there is no problem working with large files in XML. You can use xml.etree.ElementTree for your data. I tested the following code and it works (parsing and printing):

import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()

def deepParsing(node, pre=None):
    if pre is None:
        pre = '-'
    print(' ')
    print(pre + '> Tag:',node.tag)
    print(pre + '> Attributes:', node.attrib)
    print(pre + '> Text:', node.text)

    pre += '-'
    for key in node:
        deepParsing(key, pre)
deepParsing(root)

Your data was printed on console according to the given tree.

Your example is missing closing tags:

    </Deals>
</DealExport>

at the end of the xml.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.