I have an XML file with many elements. I would like to build a list/array of the values of every element with a specific name, in my case "pair:ApplicationNumber".

I've gone through a lot of the other questions, but I haven't been able to find an answer. I know I could do this by loading the file as text and going over it with pandas, but I'm sure there's a much better way.

I was unsuccessful with ElementTree as well as with xml.dom using minidom.

My code currently looks as follows:

import os
from xml.dom import minidom
WindowsUser = os.getenv('username')
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml')
xmldoc = minidom.parse(XMLPath)
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber')
for s in itemlist:
    print(s.attributes['pair:ApplicationNumber'].value)

An example XML file looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair">
    <pair:FileHeader>
            <pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp>
    </pair:FileHeader>
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62383607</pair:ApplicationNumber>
        <pair:ApplicationStatusCode>20</pair:ApplicationStatusCode>
        <pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText>
        <pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate>
        <pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber>
        <pair:FilingDate>2016-09-06</pair:FilingDate>
        <pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp>
        <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
            <pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate>
            <pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
        <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData>
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62292372</pair:ApplicationNumber>
        <pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
        <pair:ApplicationStatusText>Abandoned  --  Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
        <pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate>
        <pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber>
        <pair:FilingDate>2016-02-08</pair:FilingDate>
        <pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp>
        <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
            <pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate>
            <pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
        <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData>
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62289245</pair:ApplicationNumber>
        <pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
        <pair:ApplicationStatusText>Abandoned  --  Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
        <pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate>
        <pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber>
        <pair:FilingDate>2016-01-31</pair:FilingDate>
        <pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp>
        <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
            <pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate>
            <pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
        <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData>
</pair:PatentApplicationList>

1 Answer

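ElementTree expands the "pair:" prefix of each tag into the namespace URI it is bound to (xmlns:pair="urn:us:gov:uspto:pair" on the root element), so the tag is stored as '{urn:us:gov:uspto:pair}ApplicationNumber' and a search for 'pair:ApplicationNumber' doesn't match, even though it looks like it should. Note also that the application number is the text of the element, not an attribute, which is why s.attributes['pair:ApplicationNumber'] fails in your minidom code.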
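To make the expansion visible, here is a minimal sketch that prints the tag names as ElementTree stores them. It parses a trimmed inline copy of the sample XML (the SAMPLE string is an assumption for illustration, not the file from the question) so that it runs on its own:

```python
from xml.etree import ElementTree

# Trimmed inline copy of the question's sample so the snippet runs without the file.
SAMPLE = """<pair:PatentApplicationList xmlns:pair="urn:us:gov:uspto:pair">
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62383607</pair:ApplicationNumber>
    </pair:ApplicationStatusData>
</pair:PatentApplicationList>"""

root = ElementTree.fromstring(SAMPLE)

# Each tag comes back as '{namespace URI}local name', not 'pair:local name'.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Every printed tag starts with '{urn:us:gov:uspto:pair}', which is the form a search string has to use (or be mapped to) in order to match.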

I've used ElementTree to extract the application numbers as follows. (My examples read a local XML file rather than the full path in your code.)

Example 1:

from xml.etree import ElementTree

tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()

# Walk the top-level children and match on the local part of each tag.
for item in root:
    if 'ApplicationStatusData' in item.tag:
        for child in item:
            if 'ApplicationNumber' in child.tag:
                print(child.text)

Example 2:

from xml.etree import ElementTree

tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()

# Match on the fully qualified tag name: {namespace URI}local name.
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
    for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
        print(child.text)
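Since the question asks for a list rather than printed output, one possible variation is to pass a prefix-to-URI mapping to findall, which lets the search path keep the familiar "pair:" prefix. This is only a sketch: the SAMPLE string, the NAMESPACES name and the application_numbers helper are made up for illustration, and the inline XML is a trimmed copy of the sample from the question.

```python
from xml.etree import ElementTree

# Trimmed inline copy of the question's sample so the snippet runs without the file.
SAMPLE = """<pair:PatentApplicationList xmlns:pair="urn:us:gov:uspto:pair">
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62383607</pair:ApplicationNumber>
    </pair:ApplicationStatusData>
    <pair:ApplicationStatusData>
        <pair:ApplicationNumber>62292372</pair:ApplicationNumber>
    </pair:ApplicationStatusData>
</pair:PatentApplicationList>"""

# Maps the prefix used in the search path to the namespace URI from the XML.
NAMESPACES = {'pair': 'urn:us:gov:uspto:pair'}


def application_numbers(root):
    """Return the text of every pair:ApplicationNumber element under root."""
    # './/' searches all descendants; the prefix is resolved through NAMESPACES.
    return [element.text
            for element in root.findall('.//pair:ApplicationNumber', NAMESPACES)]


root = ElementTree.fromstring(SAMPLE)
numbers = application_numbers(root)
print(numbers)
```

To run this against the real file, the root would come from ElementTree.parse(XMLPath).getroot() instead of fromstring.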

Hope this may be useful.
