Parsing XML into a dictionary of lists Python/Django

Question

I'm having a little issue parsing XML with Python. I'm trying to end up with a list of dictionaries that looks like the following:

listDict = [{'name':'Sales','id':'1','position':'1','order_by_type':'True','order_by_asc':'True'}, {'name':'Information','id':'2','position':'1','order_by_type':'True','order_by_asc':'True'}]

I'm thinking my loop after pulling data from the xml string is wrong.

    xml_data = ElementTree.fromstring(self.data)
    # Let's grab all the base cats info and add them to a dict containing a list
    base_cats = xml_data.findall('./BaseCategory/Name')
    base_cats_id = xml_data.findall('./BaseCategory/base_id')
    base_postion = xml_data.findall('./BaseCategory/position')
    base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
    base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')

    # store all information into lists
    base_cat = [t.text for t in base_cats]
    base_id = [t.text for t in base_cats_id]
    base_p = [t.text for t in base_postion]
    base_obt = [t.text for t in base_order_by_type]
    base_asc = [t.text for t in base_order_by_asc]

    base_dict = defaultdict(list)
    # let's put everything in the lists into a dictionary
    for base in range(len(base_cat)):  # for each base in base_cat loop
        base_dict[base].append(base_cat[base])
        base_dict[base].append(base_id[base])
        base_dict[base].append(base_p[base])
        base_dict[base].append(base_obt[base])
        base_dict[base].append(base_asc[base])

This produces the following.

instance = {0: ['Sales 2', '1', '10', 'True', 'True'], 1: ['Information 2', '2', '20', 'True', 'True'], 2: ['Listing 2', '3', '30', 'True', 'True'], 3: ['Information', '4', '40', 'True', 'True'], 4: ['Land', '5', '50', 'True', 'True'], 5: ['&', '6', '60', 'True', 'True'], 6: ['Tax', '7', '70', 'True', 'True'], 7: ['Construction', '9', '90', 'True', 'True'], 8: ['Interior/Utilites', '10', '100', 'True', 'True'], 9: ['HOA/Community', '11', '110', 'True', 'True'], 10: ['Remarks', '12', '120', 'True', 'True'], 11: ['Exterior', '8', '80', 'True', 'True']}

My end goal is to be able to do the following on my django template

{% for item in instance %}
{{ item.name }}
{% endfor %}

Any help on how I may have something wrong would help a lot. Thanks in advance for the help.

EDIT: As requested, here is the XML I have.

<?xml version="1.0" ?>
<FormInstance>
    <BaseCategory>
        <Name>Sales</Name>
        <base_id>1</base_id>
        <position>10</position>
        <order_by_type>True</order_by_type>
        <order_by_asc>True</order_by_asc>
    </BaseCategory>
    <BaseCategory>
        <Name>Information</Name>
        <base_id>2</base_id>
        <position>20</position>
        <order_by_type>True</order_by_type>
        <order_by_asc>True</order_by_asc>
        <MainCategory>
            <main_id>1</main_id>
            <Name>Address 3</Name>
            <is_visible>True</is_visible>
            <position>10</position>
            <order_by_type>True</order_by_type>
            <order_by_asc>True</order_by_asc>
            <SubCategory>
                <sub_id>1</sub_id>
                <Name>Street Number 2</Name>
                <sub_library_id>StreetNumber</sub_library_id>
                <field_display_type>[u'input']</field_display_type>
                <field_type>[u'varchar']</field_type>
                <is_active>True</is_active>
                <is_required>True</is_required>
                <help_text>Street Number</help_text>
                <main_category>1</main_category>
                <is_visible>True</is_visible>
                <position>10</position>
                <order_by_type>True</order_by_type>
                <order_by_asc>True</order_by_asc>
                <show_seller>True</show_seller>
                <Enumerations>
                    <enum_id>4</enum_id>
                    <Name>Test Enum</Name>
                    <library_id>test enum</library_id>
                    <is_active>True</is_active>
                    <sub_category>1</sub_category>
                    <is_visible>True</is_visible>
                    <position>10</position>
                    <order_by_type>True</order_by_type>
                    <order_by_asc>True</order_by_asc>
                </Enumerations>
            </SubCategory>
        </MainCategory>
    </BaseCategory>
</FormInstance>

What does the XML look like? From what I gathered, it looks like you'd be better off iterating over the BaseCategory items and getting their subnodes.
– Savir, Commented Nov 3, 2016 at 17:48
@BorrajaX I've added the XML. Sorry for not including it in the original post.
– jefpadfi, Commented Nov 3, 2016 at 18:02

Savir · Accepted Answer · 2016-11-03 18:51:26Z

So, from what I gather from the expected results, it looks like you just want the information about nodes that are strictly BaseCategory, right? In the XML that was provided in the edit, you have two of those.

You should see the XML as a tree of nodes. In the example, you have something like:

                     FormInstance  # this is the root
                      /         \
                     /           \
             BaseCategory       BaseCategory
             (name:Sales)    (name:Information)
                                    \
                                     \
                                  MainCategory
                                (name:Address 3)
                                        \
                                         \
                                      SubCategory
                                  (name:Street Number 2)

But you only need the information in the BaseCategory elements, right?

You could just position yourself at the root (which is what ElementTree.fromstring returns anyway), iterate over its BaseCategory nodes, get the items you need from each of those nodes, and put them in your list of dictionaries.

Something like:

import pprint
from xml.etree import ElementTree

with open("sample_xml.xml", 'r') as f:
    data = f.read()
    xml_data = ElementTree.fromstring(data)

base_categories = xml_data.findall("./BaseCategory")
print("Found %s base_categories." % len(base_categories))
list_dict = []
for base_category in base_categories:
    list_dict.append({
        "name": base_category.find("Name").text,
        "id": int(base_category.find("base_id").text),
        "position": int(base_category.find("position").text),
        "order_by_type": (True if base_category.find("order_by_type").text.lower() == "true"
                          else False),
        "order_by_asc": (True if base_category.find("order_by_asc").text.lower() == "true"
                         else False),
    })

print("list_dict=%s" % (pprint.pformat(list_dict)))

Which outputs:

Found 2 base_categories.
list_dict=[{'id': 1,
  'name': 'Sales',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 10},
 {'id': 2,
  'name': 'Information',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 20}]

The idea is that a BaseCategory item can be seen as a self-contained record (like a dict, if that helps you picture it) that contains the following fields:

A string with the name in Name
A numeric id in base_id
A numeric position
A boolean order_by_type
A boolean order_by_asc
Another object MainCategory with its own fields...

So every time you position yourself in one of these BaseCategory nodes, you just gather the interesting fields that it has and put them in dictionaries.
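As a rough sketch of that "one node, one record" idea, the per-node work can be pulled into a small helper. The names `parse_bool` and `base_category_to_dict` and the inline `SAMPLE` string are made up for illustration; `findtext` is the standard ElementTree method that returns a child's text, or `None` when the child is missing. Here the second category deliberately lacks `order_by_asc` to show how a missing boolean could be treated as `False`.

```python
from xml.etree import ElementTree

# Hypothetical sample; the second BaseCategory has no <order_by_asc>.
SAMPLE = """<FormInstance>
    <BaseCategory>
        <Name>Sales</Name>
        <base_id>1</base_id>
        <position>10</position>
        <order_by_type>True</order_by_type>
        <order_by_asc>True</order_by_asc>
    </BaseCategory>
    <BaseCategory>
        <Name>Information</Name>
        <base_id>2</base_id>
        <position>20</position>
        <order_by_type>True</order_by_type>
    </BaseCategory>
</FormInstance>"""


def parse_bool(text):
    """Treat 'True'/'true' as True; anything else, including None, as False."""
    return text is not None and text.strip().lower() == "true"


def base_category_to_dict(node):
    """Build one flat dict from the direct children of a <BaseCategory> node."""
    return {
        "name": node.findtext("Name"),
        # int() raises if base_id or position is missing; that is intentional here.
        "id": int(node.findtext("base_id")),
        "position": int(node.findtext("position")),
        "order_by_type": parse_bool(node.findtext("order_by_type")),
        "order_by_asc": parse_bool(node.findtext("order_by_asc")),
    }


root = ElementTree.fromstring(SAMPLE)
records = [base_category_to_dict(node) for node in root.findall("./BaseCategory")]
```

Whether a missing boolean should default to `False` or raise an error depends on your data; the default above is only one possible choice.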

When you do:

base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_postion = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')

You are treating those elements (base_id, position...) almost as if they were independent of each other, which is not what you have in your XML: each one belongs to a specific BaseCategory.
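To illustrate why that matters, here is a small sketch with a made-up XML snippet in which the first category has no `position`. Collecting each field into its own list silently shifts the values, while walking the BaseCategory nodes keeps every value attached to its own record.

```python
from xml.etree import ElementTree

# Hypothetical input: "Sales" has no <position> element.
xml_text = """<FormInstance>
    <BaseCategory>
        <Name>Sales</Name>
    </BaseCategory>
    <BaseCategory>
        <Name>Information</Name>
        <position>20</position>
    </BaseCategory>
</FormInstance>"""

root = ElementTree.fromstring(xml_text)

# Independent lists: two names but only one position.
names = [el.text for el in root.findall("./BaseCategory/Name")]
positions = [el.text for el in root.findall("./BaseCategory/position")]

# Pairing by index attaches position 20 to "Sales", which is wrong.
paired_by_index = list(zip(names, positions))

# Pairing per node keeps the fields together; a missing field shows up as None.
paired_by_node = [
    (node.findtext("Name"), node.findtext("position"))
    for node in root.findall("./BaseCategory")
]
```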

However, if you are absolutely certain that all those lists (base_cats, base_cats_id, base_postion...) contain the same number of items, you can still rebuild your dictionaries by using the length of one of them to iterate through all the lists in the same step. The example below uses len(base_cats), but it could just as well have been len(base_cats_id) or len(base_postion), since all the lists have the same length:

base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_postion = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')

list_dict = []
for i in range(len(base_cats)):
    list_dict.append({
        "name": base_cats[i].text,
        "id": int(base_cats_id[i].text),
        "position": int(base_postion[i].text),
        "order_by_type": True if base_order_by_type[i].text.lower() == "true" else False,
        "order_by_asc": True if base_order_by_asc[i].text.lower() == "true" else False,
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))

Thanks for the help, BorrajaX. This helps me a lot and I'm able to understand it better. Eventually I will need Main, Sub and Enum, but with your answer I believe I'll be able to adapt it to my needs. Thanks a lot. Really appreciate it.
When you have to do that, you just need to see that your base category also contains a dict (or "object") MainCategory, which has a bunch of "fields" (Name, main_id...) and can in turn contain a SubCategory field that is a dict (or "object") with another bunch of "fields"... and so on :-) Have fun coding!
Thanks again, BorrajaX. I've been stuck on this for two days and finally gave in and asked. Appreciate your help!
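For the nested case described in the comments (Main, Sub and Enum), one possible approach is a small recursive function. This is a sketch under assumptions, not the answer's own code: `element_to_dict` is a hypothetical helper, the XML is a trimmed version of the one in the question, all values are left as strings, and any child that has children of its own is collected into a list under its tag name.

```python
from xml.etree import ElementTree

# Trimmed-down version of the question's XML, kept short for the example.
xml_text = """<FormInstance>
    <BaseCategory>
        <Name>Information</Name>
        <base_id>2</base_id>
        <MainCategory>
            <main_id>1</main_id>
            <Name>Address 3</Name>
            <SubCategory>
                <sub_id>1</sub_id>
                <Name>Street Number 2</Name>
            </SubCategory>
        </MainCategory>
    </BaseCategory>
</FormInstance>"""


def element_to_dict(node):
    """Convert an element into a dict; nested elements become lists of dicts."""
    record = {}
    for child in node:
        if len(child):
            # The child has sub-elements (e.g. MainCategory): recurse and
            # collect it in a list, since the tag may appear more than once.
            record.setdefault(child.tag, []).append(element_to_dict(child))
        else:
            # Leaf element: keep its text as a plain string.
            record[child.tag] = child.text
    return record


root = ElementTree.fromstring(xml_text)
categories = [element_to_dict(node) for node in root.findall("./BaseCategory")]
```

With this shape, `categories[0]["MainCategory"][0]["SubCategory"][0]["Name"]` would give the sub-category's name, and the type conversions (int, bool) from the answer could be applied to the leaf values as needed.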
