Create nested defaultdict in Python

Question

I'm trying to create a nested dictionary in Python so that, given a list of strings, the dictionary records the number of occurrences of that string order.

For example, if the string list is:

["hey", "my", "name", "is"]

I would want the nested dictionary to look like:

{"hey": {"my": {"name": {"is": 1}}}}

I know I could probably use as key the whole list but I specifically want to separate the strings in the dictionary.

I also would want to approach this problem with a defaultdict dictionary, not the Python ones, and preferably use a recursively defined defaultdict.

This is what I tried:

from collections import defaultdict

nested_dict = lambda: defaultdict(nested_dict)
# Initialize ngrams as a nested defaultdict
ngrams = nested_dict()

# Function to update the nested defaultdict with the list of words
def update_ngrams(ngrams, words):
   current_dict = ngrams
   for word in words[:-1]:
       current_dict = current_dict[word]
   current_dict[words[-1]] += 1

# Example usage
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])

but it gives me this error:

TypeError: unsupported operand type(s) for +=: 'collections.defaultdict' and 'int'

The expected output should be a map like this:

{"My": {"big": {"cat": 1, "dog": 1}}}

A defaultdict alone doesn't really work for this, because (as your error suggests) you want the leaves to be integers, not themselves defaultdicts. You can't unconditionally current_dict[words[-1]] += 1. — jonrsharpe
– jonrsharpe, Commented May 14, 2024 at 12:44
Why this approach when much simpler solutions for this exist? 👉stackoverflow.com/a/40401961/476, stackoverflow.com/a/13688108/476 — deceze
– deceze ♦, Commented May 14, 2024 at 12:46
What's the expected output after additionally doing update_ngrams(ngrams, ["My", "big", "cat", "Garfield"])? — no comment
– no comment, Commented May 14, 2024 at 12:52

James · Accepted Answer · 2024-05-14 13:02:45Z

Here is a solution just using the standard dictionary. It creates a new dictionary for every layer except the last one, where it increments an integer.

def update_ngrams(ng: dict, words: list[str]):
    n = len(words)
    d = ng
    for i, w in enumerate(words,1):
        if i == n:
            d.setdefault(w, 0)
            d[w] += 1
        else:
            d = d.setdefault(w, {})

ngrams = {}
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])
ngrams
# returns:
{'My': {'big': {'cat': 1, 'dog': 1}}}

The issue you are going to run into is that it does not handle cases where a leaf ('cat' or 'dog') key is also a branch. I.e.: ["My", "big", "dog", "Noodle"].

My suggestion would be to store the leaf occurrence as a dictionary as well with a specific key to indicate it is the leaf count. Here is an example using the underscore as the leaf counter.

def update_ngrams(ng: dict, words: list[str]):
    n = len(words)
    d = ng
    for i, w in enumerate(words,1):
        d = d.setdefault(w, {})
        if i == n:
            d.setdefault('_', 0)
            d['_'] += 1

ngrams = {}
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])
update_ngrams(ngrams, ["My", "big", "dog", "Noodle"])
ngrams
# returns:
{'My': {'big': {'cat': {'_': 1}, 'dog': {'_': 1, 'Noodle': {'_': 1}}}}}

Collectives™ on Stack Overflow

Create nested defaultdict in Python

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related