I have a dataset that is structured in a csv like this:

Name,code,count
Adam,01,48
Bill,01,32
Chris,01,4
Carl,01.01,5
Dave,01.01,1
David,01.01,1
Eric,01.01.01,26
Earl,01.01.01.01,2
Frank,01.01.01.01,2
Greg,01.01.01.02,2
Harold,01.01.01.03,7
Ian,01.01.01.03,3
Jack,01.01.01.03,1
John,01.01.01.04,10
Kyle,01.01.01.04,2
Larry,01.01.03.01,3
Mike,01.01.03.01.01,45
Nick,01.01.03.01.01.01,1
Oliver,01.01.03.01.01.02,16
Paul,01.01.03.01.01.03,23

I want to make a dictionary in Python where the "name" and the "count" are key:value pairs (which is easy enough), but I also want to organize the entries into a hierarchy based on the "code" number, i.e. 01.01 is a child of 01. I am not sure how to iterate over the data to make this happen. I eventually want to do a JSON dump of the whole structure, but it is the structuring of the hierarchy that is getting me down. Any help is greatly appreciated.

  • Can you provide an example of the desired output? You have three 01 elements at the beginning. How should the 01.01 element fit into the hierarchy, and which element should it be assigned to? Commented May 16, 2016 at 12:49
  • Can you show a snippet of what the "hierarchy" looks like? Commented May 16, 2016 at 12:51
  • What are the up to 5 other fields that appear on some rows of the csv? Commented May 16, 2016 at 14:27

3 Answers

A simple and elegant way to implement a tree structure in Python uses a recursive defaultdict:

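import csv, json
from collections import defaultdict

def tree():
    # A missing key creates a new subtree automatically.
    return defaultdict(tree)

d = tree()

# Python 3: open the CSV in text mode with newline='', as the csv module expects.
with open('data.txt', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row

    for name, code, count in reader:
        path = code.split('.')
        iter_node = d
        # Walk down one level per code segment, creating nodes as needed.
        for node in path:
            iter_node = iter_node[node]
        iter_node['values'][name] = count

print(json.dumps(d, indent=2))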

{
  "01": {
    "values": {
      "Chris": "4", 
      "Bill": "32", 
      "Adam": "48"
    },
    "01": {
      "values": {
        "Dave": "1", 
        "Carl": "5", 
        "David": "1"
      },
      "03": {
        "01": {
          "01": {
            "02": {
              "values": {
                "Oliver": "16"
              }
            }, 
            "03": {
              "values": {
                "Paul": "23"
              }
            }, 
            "01": {
              "values": {
                "Nick": "1"
              }
            }, 
            "values": {
              "Mike": "45"
            }
          }, 
          "values": {
            "Larry": "3"
          }
        }
      }, 
      "01": { 
        "values": {
          "Eric": "26"
        }, 
        "02": {
          "values": {
            "Greg": "2"
          }
        }, 
        "03": {
          "values": {
            "Harold": "7", 
            "Ian": "3", 
            "Jack": "1"
          }
        }, 
        "01": {
          "values": {
            "Earl": "2", 
            "Frank": "2"
          }
        },
        "04": {
          "values": {
            "John": "10", 
            "Kyle": "2"
          }
        }
      }
    }
  }
}
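
To see why no explicit node creation is needed, here is a minimal in-memory sketch of the same idea. The `rows` list is a hypothetical stand-in for the CSV file, and converting `count` with `int()` is an addition that is not part of the answer above.

```python
import json
from collections import defaultdict

def tree():
    # Accessing a missing key creates another tree, so nesting depth is unbounded.
    return defaultdict(tree)

# Hypothetical rows standing in for the CSV contents (header already skipped).
rows = [
    ("Adam", "01", "48"),
    ("Carl", "01.01", "5"),
    ("Larry", "01.01.03.01", "3"),
]

d = tree()
for name, code, count in rows:
    node = d
    for part in code.split("."):
        node = node[part]               # child is created on first access
    node["values"][name] = int(count)   # int() is an assumption, not in the original

# Round-trip through JSON to get plain dicts back.
result = json.loads(json.dumps(d))
```

Note that intermediate codes such as 01.01.03 get a node even when no row carries that exact code; such a node simply has no "values" key.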

The following snippet finds all the nodes of a tree without actually building one. Tree and linked-list implementations in Python are inefficient (Beazley).

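from itertools import groupby
import csv

with open('csvfile.csv', newline='') as f:
    reader = csv.DictReader(f)

    # groupby only merges adjacent rows, so the file must already be sorted by code.
    # The reader is lazy, so it has to be consumed while the file is still open.
    groups = groupby(reader, key=lambda row: row['code'])
    nodes = {code: {item['Name']: item['count'] for item in group} for code, group in groups}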

{'01': {'Adam': '48', 'Bill': '32', 'Chris': '4'},
 '01.01': {'Carl': '5', 'Dave': '1', 'David': '1'},
 '01.01.01': {'Eric': '26'},
 '01.01.01.01': {'Earl': '2', 'Frank': '2'},
 '01.01.01.02': {'Greg': '2'},
 '01.01.01.03': {'Harold': '7', 'Ian': '3', 'Jack': '1'},
 '01.01.01.04': {'John': '10', 'Kyle': '2'},
 '01.01.03.01': {'Larry': '3'},
 '01.01.03.01.01': {'Mike': '45'},
 '01.01.03.01.01.01': {'Nick': '1'},
 '01.01.03.01.01.02': {'Oliver': '16'},
 '01.01.03.01.01.03': {'Paul': '23'}}
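
If an explicit hierarchy is needed afterwards, the flat `nodes` dict can be nested in a second pass by splitting each code on the dots. The following is only a sketch: the `nest` helper and the "children"/"values" key names are assumptions chosen for illustration, and the input is a shortened copy of the result above.

```python
def nest(nodes):
    """Turn {'01.01': {...}} into nested dicts, one level per code segment."""
    root = {}
    for code, values in nodes.items():
        node = root
        for part in code.split("."):
            # Create the "children" mapping and the child node on demand.
            node = node.setdefault("children", {}).setdefault(part, {})
        node["values"] = values
    return root

# Shortened copy of the flat result shown above.
nodes = {
    "01": {"Adam": "48", "Bill": "32", "Chris": "4"},
    "01.01": {"Carl": "5", "Dave": "1", "David": "1"},
    "01.01.03.01": {"Larry": "3"},
}

nested = nest(nodes)
```

Keeping children under a separate "children" key avoids any clash between a code segment and a person's name.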

9 Comments

Thanks C Panda! This worked great. Can you edit your response to put these lines in: import csv f = open("somefile.csv","r") .....I think some people will need them
@miltonjbradley which lines, btw?
I like the clean code. However, it does not make the tree structure explicit.
@schwobaseggl I am aware of that. As I don't know what the hierarchy looks like, I can't give any semantics to the result dict keys. My Python tree will always be a dict, but I want to know what it looks like.
@CPanda I think it will be cleaner if you remove the redundant code from the inner dicts, and just use name:count key-value pairs. That would make the result much less verbose without losing information.

I guess you want a standard tree structure in which you reach a node by its path, with missing nodes created automatically on access.

Something like this:

class Node:
    def __init__( self, parent=None ):
        self.parent = parent
        self.store = {}
        self.children = {}

    def create_child( self, child_name ):
        self.children[ child_name ] = Node( self )

    # ancestry_line_names is a list of names, ordered from the root downwards
    def recursive_get_child( self, ancestry_line_names ):
        if len(ancestry_line_names) == 0:
            return self
        else:
            next_ancestor = ancestry_line_names[0]
            other_ancestors = ancestry_line_names[1:]
            if next_ancestor not in self.children:
                self.create_child( next_ancestor )
            return self.children[ next_ancestor ].recursive_get_child( other_ancestors )

Then all you need to do is create a root node and use the path to reach the correct node from it.

root = Node()
for name, code, count in some_data_iterator():
    ancestry_line = code.split(".")
    # recursive_get_child creates any missing nodes along the path
    root.recursive_get_child( ancestry_line ).store[ name ] = count

You can then add a method to Node that converts the Node structure into a plain dictionary structure that can be dumped to JSON.
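
One possible shape for such a conversion method is sketched below. The `to_dict` name and the "values"/"children" keys are assumptions made for illustration, the class is repeated in compact form so the block runs on its own, and the two sample rows are hypothetical.

```python
import json

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.store = {}
        self.children = {}

    def recursive_get_child(self, names):
        # Walk down the path, creating missing nodes along the way.
        if not names:
            return self
        if names[0] not in self.children:
            self.children[names[0]] = Node(self)
        return self.children[names[0]].recursive_get_child(names[1:])

    def to_dict(self):
        # Hypothetical helper: the parent link is left out on purpose,
        # since including it would make the structure circular.
        return {
            "values": dict(self.store),
            "children": {key: child.to_dict() for key, child in self.children.items()},
        }

root = Node()
for name, code, count in [("Adam", "01", "48"), ("Carl", "01.01", "5")]:
    root.recursive_get_child(code.split(".")).store[name] = count

as_dict = root.to_dict()
text = json.dumps(as_dict, indent=2)
```

Because `to_dict` recurses once per level, very deep code paths could in principle hit Python's recursion limit; for codes of the depth shown in the question this should not matter.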

1 Comment

Thanks for your response. This worked for a small list. As soon as I used the whole file I got a "too big to unpack" error
