-1

I'm trying to create a nested dictionary in Python so that, given a list of strings, the dictionary records the number of occurrences of that string order.

For example, if the string list is:

["hey", "my", "name", "is"]

I would want the nested dictionary to look like:

{"hey": {"my": {"name": {"is": 1}}}}

I know I could probably use as key the whole list but I specifically want to separate the strings in the dictionary.

I also would want to approach this problem with a defaultdict dictionary, not the Python ones, and preferably use a recursively defined defaultdict.

This is what I tried:

from collections import defaultdict

nested_dict = lambda: defaultdict(nested_dict)
# Initialize ngrams as a nested defaultdict
ngrams = nested_dict()

# Function to update the nested defaultdict with the list of words
def update_ngrams(ngrams, words):
   current_dict = ngrams
   for word in words[:-1]:
       current_dict = current_dict[word]
   current_dict[words[-1]] += 1

# Example usage
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])

but it gives me this error:

TypeError: unsupported operand type(s) for +=: 'collections.defaultdict' and 'int'

The expected output should be a map like this:

{"My": {"big": {"cat": 1, "dog": 1}}}
3
  • 1
    A defaultdict alone doesn't really work for this, because (as your error suggests) you want the leaves to be integers, not themselves defaultdicts. You can't unconditionally current_dict[words[-1]] += 1. Commented May 14, 2024 at 12:44
  • 1
    Why this approach when much simpler solutions for this exist? 👉stackoverflow.com/a/40401961/476, stackoverflow.com/a/13688108/476 Commented May 14, 2024 at 12:46
  • 3
    What's the expected output after additionally doing update_ngrams(ngrams, ["My", "big", "cat", "Garfield"])? Commented May 14, 2024 at 12:52

1 Answer 1

1

Here is a solution just using the standard dictionary. It creates a new dictionary for every layer except the last one, where it increments an integer.

def update_ngrams(ng: dict, words: list[str]):
    n = len(words)
    d = ng
    for i, w in enumerate(words,1):
        if i == n:
            d.setdefault(w, 0)
            d[w] += 1
        else:
            d = d.setdefault(w, {})

ngrams = {}
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])
ngrams
# returns:
{'My': {'big': {'cat': 1, 'dog': 1}}}

The issue you are going to run into is that it does not handle cases where a leaf ('cat' or 'dog') key is also a branch. I.e.: ["My", "big", "dog", "Noodle"].

My suggestion would be to store the leaf occurrence as a dictionary as well with a specific key to indicate it is the leaf count. Here is an example using the underscore as the leaf counter.

def update_ngrams(ng: dict, words: list[str]):
    n = len(words)
    d = ng
    for i, w in enumerate(words,1):
        d = d.setdefault(w, {})
        if i == n:
            d.setdefault('_', 0)
            d['_'] += 1

ngrams = {}
update_ngrams(ngrams, ["My", "big", "cat"])
update_ngrams(ngrams, ["My", "big", "dog"])
update_ngrams(ngrams, ["My", "big", "dog", "Noodle"])
ngrams
# returns:
{'My': {'big': {'cat': {'_': 1}, 'dog': {'_': 1, 'Noodle': {'_': 1}}}}}
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.