I have a very large nested dictionary of the following form:
keyDict = {f: {t: {c_1: None, c_2: None, c_3: None, ..., c_n: None}}}

And another dictionary that maps those lowest-level keys to values: valDict = {c_1: 13.37, c_2: -42.00, c_3: 0.00, ..., c_n: -0.69}

I want to use the valDict to assign the values to the lowest level of the keyDict as fast as possible.

My current implementation is very slow, I think because I iterate through the two upper levels [f][t] of keyDict. There must be a way to set the values at the lowest level without regard to the upper levels, because the value for [c] does not depend on [f] or [t].

My current SLOW implementation:

for f in keyDict:
    for t in keyDict[f]:
        for c in keyDict[f][t]:
            keyDict[f][t][c] = valDict[c]

Still looking for a solution. [c] has only a few thousand keys, but [f][t] can have millions of combinations, so the way I do it, the same value assignment happens millions of times. It should be possible to go through the bottom level once and assign the value, which does NOT depend on f or t but ONLY on c.

To clarify, per alexis's request: the c dictionaries don't necessarily all have the same keys, but they DO have the same value for a given key. For example, to keep things simple, let's say there are only 3 possible keys for a c dict (c_1, c_2, c_3). One parent dictionary (e.g. f=1, t=1) may have just {c_2}, another parent dictionary (f=1, t=2) may have {c_2, c_3}, and yet another (e.g. f=999, t=999) might have all three {c_1, c_2, c_3}. Some parent dicts may have the same set of c's. What I am trying to do is assign the value in each c dict, which is determined purely by the c key, not by t or f.

  • Any further ideas? This seems like it should be a common issue with an elegant solution. Commented Oct 10, 2017 at 12:11
  • If your current code is correct, all of the c dictionaries contain the same keys, or at least all of the keys you are updating. What is the purpose of having millions of identical dictionaries? Updating millions of dictionaries can't be made any faster than the existing answer suggests, but maybe there's a way to avoid it-- if you can explain what you are up to with all these dictionaries. Commented Oct 10, 2017 at 18:37
  • Thanks alexis, replied in post because it was too long. Commented Oct 10, 2017 at 20:56
  • Yup, it sounds like you need one dictionary, not millions of them. Please see my answer. Commented Oct 10, 2017 at 22:46

2 Answers

If the innermost dicts and valDict share exactly the same keys, it would be faster to use dict.update instead of looping over all the keys of each dict:

for dct in keyDict.values():
    for d in dct.values():
        # A single call overwrites every key found in valDict.
        d.update(valDict)

Also, it is more elegant and probably faster to loop over the values of the outer dicts directly instead of iterating over the keys and then looking up each value by its key.
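One caveat may be worth illustrating: dict.update copies every key from valDict into the target, so an innermost dict that holds only a subset of the keys (the case described in the question's clarification) would gain keys it never had. The following is a minimal sketch, assuming that case, that restricts the assignment to keys already present. The helper name fill_existing and the toy data are hypothetical, made up for illustration.

```python
def fill_existing(keyDict, valDict):
    """Assign valDict values to keys already present in each innermost dict."""
    for dct in keyDict.values():
        for d in dct.values():
            # Reassigning values of existing keys does not resize d,
            # so iterating over d while doing so is safe.
            for c in d:
                d[c] = valDict[c]


# Toy data (made up for illustration).
valDict = {"c_1": 13.37, "c_2": -42.0, "c_3": 0.0}
keyDict = {1: {1: {"c_2": None}, 2: {"c_2": None, "c_3": None}}}

fill_existing(keyDict, valDict)
print(keyDict[1][1])  # {'c_2': -42.0}

# By contrast, update() also adds c_1 and c_3 to a dict that only had c_2.
widened = {"c_2": None}
widened.update(valDict)
print(sorted(widened))  # ['c_1', 'c_2', 'c_3']
```

Note that this is essentially the inner loop from the question written over values, so it should not be expected to be faster than update; it only keeps each dict's key set unchanged.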


5 Comments

Not really liking this solution because dict.update rewrites, whereas I already have the dictionary loaded with keys and ready to go. I would like to avoid rewriting those. There must be some way to update values for keys without iterating through each parent dictionary.
What do you mean it "rewrites"? update() only replaces the specified values in the existing dictionary. This is the fastest way to do what you say you need to do-- if you haven't left out any essential parts from your current "SLOW" implementation.
Ok, I will give this a try again and update this with results. Thanks
@user3431083 Did you time the code to see if any performance was gained?
Using a medium-sized problem and concepts from these ideas, I was able to get that step's runtime from about 30s down to 12s. Not bad, but not great. I was hoping to get it down much lower, because when I drop a really large problem on this, it's going to be prohibitively slow.

So you have millions of "c" dictionaries that you need to keep synchronized. The dictionaries have different sets of keys (presumably for good reason, but I trust you realize that your update code puts the new values in all the dictionaries), but the non-None values must change in lockstep.

You haven't explained what this data structure is for, but judging from your description, you should have a single c dictionary, not millions of them. After all, you only have one set of valid "c" values; maintaining multiple copies is not only a performance problem, it puts an incredible burden of consistency on your code. But obviously, updating a single dictionary will be hugely faster than updating millions of them.

Of course you also want to know which keys were contained in each dictionary: To do this, your tree of dictionaries should terminate with sets of keys, which you can use to look up values as necessary.

In case my description is not clear, here is how your structure would be transformed:

all_c = dict()
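for f in keyDict:
    for t in keyDict[f]:
        # Collect the non-None values into the single shared dictionary.
        all_c.update((k, v) for k, v in keyDict[f][t].items() if v is not None)
        # Keep only the set of keys; the values now live in all_c.
        keyDict[f][t] = set(keyDict[f][t].keys())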

This code builds a combined dictionary all_c with the non-null values from each of your bottom-level "c" dictionaries, then replaces the latter with a set of its keys. If you later need the complete dictionary at keyDict[f][t] (rather than access to particular values), you can reconstruct it like this:

f_t_cdict = dict((k, all_c[k]) for k in keyDict[f][t])

But I'm pretty sure you can do whatever it is you are doing by working with the sets keyDict[f][t], and simply looking up values directly in the combined dictionary all_c.
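To make the shared-dictionary idea concrete, here is a small end-to-end sketch under assumed toy data: the tree ends in sets of keys, the values live once in all_c, and refreshing them is a single update on all_c no matter how many (f, t) pairs exist. The helper name lookup and the sample values are illustrative assumptions, not part of the original code.

```python
# Toy structure after the transformation: leaves are sets of c keys.
keyDict = {
    1: {1: {"c_2"}, 2: {"c_2", "c_3"}},
    999: {999: {"c_1", "c_2", "c_3"}},
}

# Single shared mapping from c key to value.
all_c = {}

# One update covers every (f, t) pair, since values depend only on c.
valDict = {"c_1": 13.37, "c_2": -42.0, "c_3": 0.0}
all_c.update(valDict)


def lookup(f, t, c):
    """Return the value for c under (f, t); raise KeyError if c is absent there."""
    if c not in keyDict[f][t]:
        raise KeyError(c)
    return all_c[c]


print(lookup(1, 2, "c_3"))  # 0.0

# Rebuild a full per-(f, t) dictionary only when one is really needed.
f_t_cdict = {k: all_c[k] for k in keyDict[1][1]}
print(f_t_cdict)  # {'c_2': -42.0}
```

The membership test against the set keeps the per-(f, t) information about which keys exist, while the values themselves are stored and updated in exactly one place.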
