I am able to use lxml to accomplish most of what I want to do, although it was a struggle to get through the confusing examples and tutorials. In short, I can read an external XML file and parse it with lxml into the proper tree structure.
To demonstrate this, if I run:
print(etree.tostring(myXmlTree, pretty_print=True, method="xml", encoding="unicode"))  # encoding="unicode" returns a str instead of bytes
I get the following output:
<net xmlns="http://www.arin.net/whoisrws/core/v1" xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1" xmlns:ns3="http://www.arin.net/whoisrws/netref/v2" termsOfUse="https://www.arin.net/whois_tou.html">
  <registrationDate>2006-08-29T00:00:00-04:00</registrationDate>
  <ref>http://whois.arin.net/rest/net/NET-79-0-0-0-1</ref>
  <endAddress>79.255.255.255</endAddress>
  <handle>NET-79-0-0-0-1</handle>
  <name>79-RIPE</name>
  <netBlocks>
    <netBlock>
      <cidrLength>8</cidrLength>
      <endAddress>79.255.255.255</endAddress>
      <description>Allocated to RIPE NCC</description>
      <type>RN</type>
      <startAddress>79.0.0.0</startAddress>
    </netBlock>
  </netBlocks>
  <orgRef name="RIPE Network Coordination Centre" handle="RIPE">http://whois.arin.net/rest/org/RIPE</orgRef>
  <comment>
    <line number="0">These addresses have been further assigned to users in</line>
    <line number="1">the RIPE NCC region. Contact information can be found in</line>
    <line number="2">the RIPE database at http://www.ripe.net/whois</line>
  </comment>
  <startAddress>79.0.0.0</startAddress>
  <updateDate>2009-05-18T07:34:02-04:00</updateDate>
  <version>4</version>
</net>
OK, that's great for human consumption, but not useful for machines. If I want particular elements, say the start and end IP addresses in the XML, I can type:
ns = myXmlTree.nsmap[None]  # the default namespace is stored under the key None; nsmap.values() has no guaranteed order
myXmlTree.find("{" + ns + "}startAddress").text
myXmlTree.find("{" + ns + "}endAddress").text
and I would receive:
'79.0.0.0'
'79.255.255.255'
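For reference, here is a minimal, self-contained sketch of the same lookup that passes a prefix mapping to lxml instead of building "{...}" strings by hand. It assumes the real file uses the same default namespace as the output above; the prefix "w" is an arbitrary name chosen for illustration, and the inline SAMPLE stands in for the parsed file (with a file on disk you would use etree.parse(path).getroot() instead).

```python
from lxml import etree

# Trimmed copy of the whois response shown above (stand-in for the real file).
SAMPLE = b"""<net xmlns="http://www.arin.net/whoisrws/core/v1"
     xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1">
  <endAddress>79.255.255.255</endAddress>
  <startAddress>79.0.0.0</startAddress>
</net>"""

root = etree.fromstring(SAMPLE)

# The default namespace URI lives under the key None in nsmap.
# "w" is an arbitrary prefix used only inside our search paths.
NS = {"w": root.nsmap[None]}

# findtext() returns the text of the first matching child, or None if absent.
start = root.findtext("w:startAddress", namespaces=NS)
end = root.findtext("w:endAddress", namespaces=NS)
print(start, end)
```

Using findtext() also avoids the IndexError that findall(...)[0] raises when an element is missing.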
But I still need to LOOK at the XML file as a human to know which elements exist. Instead, I would like to retrieve the names of ALL the elements at a particular level and then traverse that level automatically. For instance, I'd like to do something like:
myElements = myXmlTree.findallelements("{" + ns + "}")
and it would give me a return value something like:
['registrationDate', 'ref', 'endAddress', 'handle', 'name', 'netBlocks', 'orgRef', 'comment', 'startAddress', 'updateDate', 'version']
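lxml has no single findallelements() call, but iterating over an element yields its direct children, and etree.QName(child).localname strips the "{namespace}" part of the tag. The sketch below assumes the inline SAMPLE mirrors the document above; child_names is a hypothetical helper name, and the isinstance check skips comments and processing instructions, whose .tag is not a string.

```python
from lxml import etree

SAMPLE = b"""<net xmlns="http://www.arin.net/whoisrws/core/v1"
     xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1"
     xmlns:ns3="http://www.arin.net/whoisrws/netref/v2">
  <registrationDate>2006-08-29T00:00:00-04:00</registrationDate>
  <ref>http://whois.arin.net/rest/net/NET-79-0-0-0-1</ref>
  <endAddress>79.255.255.255</endAddress>
  <handle>NET-79-0-0-0-1</handle>
  <name>79-RIPE</name>
  <netBlocks><netBlock><cidrLength>8</cidrLength></netBlock></netBlocks>
  <orgRef name="RIPE Network Coordination Centre" handle="RIPE">http://whois.arin.net/rest/org/RIPE</orgRef>
  <comment><line number="0">These addresses have been further assigned to users in</line></comment>
  <startAddress>79.0.0.0</startAddress>
  <updateDate>2009-05-18T07:34:02-04:00</updateDate>
  <version>4</version>
</net>"""

root = etree.fromstring(SAMPLE)


def child_names(element):
    """Return the local names of element's direct child elements, in document order."""
    return [
        etree.QName(child).localname
        for child in element               # iterating an element yields its children
        if isinstance(child.tag, str)      # skip comments / processing instructions
    ]


print(child_names(root))
```

The printed list should match the one above; calling child_names() on any child (for example the netBlocks element) lists the next level down.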
It would be especially great if it could also show me the entire structure of elements, including the nested ones.
I'm SURE there's a way, since it wouldn't make sense for lxml not to offer one.
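A short recursive walk is enough to get the full nested structure. This is a sketch, not an lxml API: outline is a hypothetical helper that yields (depth, local name) pairs in document order, and the inline SAMPLE is a trimmed stand-in for the real file.

```python
from lxml import etree

SAMPLE = b"""<net xmlns="http://www.arin.net/whoisrws/core/v1">
  <startAddress>79.0.0.0</startAddress>
  <netBlocks>
    <netBlock>
      <cidrLength>8</cidrLength>
      <startAddress>79.0.0.0</startAddress>
    </netBlock>
  </netBlocks>
  <comment>
    <line number="0">first line</line>
    <line number="1">second line</line>
  </comment>
</net>"""


def outline(element, depth=0):
    """Yield (depth, local name) for element and every descendant element."""
    yield depth, etree.QName(element).localname
    for child in element:
        if isinstance(child.tag, str):  # skip comments / processing instructions
            yield from outline(child, depth + 1)


root = etree.fromstring(SAMPLE)
for depth, name in outline(root):
    print("  " * depth + name)  # indent two spaces per nesting level
```

Collecting the pairs into a list (or a nested dict) instead of printing them gives a machine-usable description of the structure.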
Thanks in advance!!
P.S. I know that I can iterate over the tree and collect the element names myself. I was hoping lxml already had a method that exposes this information. If iteration is the only way, I guess that's OK... it just seems clunky to me.
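The iteration itself does not have to be written by hand: lxml's element.iter() already walks the element and all of its descendants, and passing etree.Element as the tag filter restricts it to real elements (skipping comments). The sketch below, using a trimmed stand-in SAMPLE, collects every distinct element name in the document in one expression.

```python
from lxml import etree

SAMPLE = b"""<net xmlns="http://www.arin.net/whoisrws/core/v1">
  <!-- comments are skipped by the etree.Element filter -->
  <startAddress>79.0.0.0</startAddress>
  <netBlocks>
    <netBlock><cidrLength>8</cidrLength></netBlock>
  </netBlocks>
  <comment><line number="0">first line</line></comment>
</net>"""

root = etree.fromstring(SAMPLE)

# iter() is depth-first over root and all descendants; the set removes repeats.
distinct = sorted({etree.QName(el).localname for el in root.iter(etree.Element)})
print(distinct)
```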