How do you remove nested keys dynamically from Json using recursion?

Question

I currently have a recursive function that removes ALL keys that match a pattern. Here is the background:

Example Json

{
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

Here's the function:

def remove_all_keys_matching_value(d, keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [remove_all_keys_matching_value(v, keys_to_remove) for v in d]
    return {k: remove_all_keys_matching_value(v, keys_to_remove) for k, v in d.items() if k not in keys_to_remove}

If I run the function with these keys to remove keys_to_remove = ('make', 'name') I'll get the following result:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{}, {
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I want to adjust this code to be more targeted so it doesn't remove all instances of the key but rather takes into account the root value of the key/path if that makes sense.

So for example if I were to pass in a tuple containing (('owned_cars', 'make'), 'name') it would return:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I know I need to keep track of the root key somehow but am unsure how to fold this in. I would appreciate any help in solving this. I always struggle when the recursion gets this complex and would love to see how someone more experienced would approach it so I can improve.

While I am interested in the solution to this problem, I'm more interested in learning how to approach a problem like this? I understand whats happening at a high level in the recursive method but struggle when I need to start stepping through it. I don't know how to make the leap to adjusting the code to identify the root path.

"How do you approach difficult recursion problems as a developer?" Can you rename this please to something specific? The question title looks like the usual rubbish that's on the main page but it looks like you have a specific question. You risk getting downvoted from the title alone — roganjosh
– roganjosh, Commented Aug 20, 2022 at 20:03
It would also be helpful if you run your JSON through jsonlint.com (and pasting its output) before posting, or printing it with indentation — roganjosh
– roganjosh, Commented Aug 20, 2022 at 20:04
btw I think a better specification of keys_to_remove would be: (('owned_cars', 'make'), 'name') so that the strings don't have to be split to recover actual key names. — quamrana
– quamrana, Commented Aug 20, 2022 at 20:26
@cauthon for ("owned_cars", "make") would it only remove "make" if the immediate parent is "owned_cars"? Or is it possible that additional levels of data could exist between? — Mulan
– Mulan, Commented Aug 20, 2022 at 21:46

Mulan · Accepted Answer · 2022-08-20 22:28:24Z

1

division of complexity

We could start with a remove function that takes any t and any number of paths -

def remove(t, *paths):
  for p in paths:
    t = remove1(t, p)
  return t

As you can see, it has a simple operation calling remove1(t, p) for all p in the provided paths. The final t is returned. This separates the complexity of removing a single path and removing many paths. We offload the majority of the work to remove1.

remove1

Your original code is pretty close. This remove1 takes any t and a single path.

If the path is empty, return t unmodified
(inductive) the path has at least one element. If t is a list, apply remove1(e, path) for all e of the list t
(inductive) that path has at least one element and t is not a list. If t is a dictionary -
- If the path has only one element, create a new dictionary with k assigned to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t, excluding any k matching the path's element, path[0]
- (inductive) the path has at least two elements. Create a new dictionary with k assigned to the result sub-problem remove1(v, path[1:]) if k matches the first element of that path otherwise assign k to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t.
(inductive) t is a non-list and t is a non-dictionary. Return t unmodified.

def remove1(t, path):
  if not path:
    return t
  elif isinstance(t, list):
    return list(remove1(e, path) for e in t)
  elif isinstance(t, dict):
    if len(path) == 1:
      return {k:remove1(v, path) for (k,v) in t.items() if not k == path[0] }
    else:
      return {k:remove1(v, path[1:]) if k == path[0] else remove1(v, path) for (k,v) in t.items()}
  else:
    return t

modification to the input data

I added another layer to your data so we can see precisely how remove is working -

data = {
  "results": [{
    "name": "john doe",
    "age": "100",
    "owned_cars": [{
      "additional_layer": {  # <-- additional layer
        "make": "foo",
        "color": "green"
      }
    }, {
      "make": "ford",
      "color": "white"
    }, {
      "make": "bmw",
      "color": "red"
    }],
    "wished_cars": [{
      "make": "honda"
    }, {
      "make": "toyota",
      "style": "sleek"
    }, {
      "style": "fat"
    }]
  }]
}

demo

Let's see remove work now -

import json
data = { ... }
new_data = remove(data, ("owned_cars", "make"), ("style",))
print(json.dumps(new_data, indent=2))

This says remove all "make" keys that are any descendant of "owned_cars" keys and remove all "style" keys -

{
  "results": [
    {
      "name": "john doe",
      "age": "100",
      "owned_cars": [
        {
          "additional_layer": {
                                  # <-- make removed
            "color": "green"
          }
        },
        {
                                  # <-- make removed
          "color": "white"
        },
        {
                                  # <-- make removed
          "color": "red"
        }
      ],
      "wished_cars": [
        {
          "make": "honda"      # <-- make not removed
        },
        {
          "make": "toyota"     # <-- make not removed
                               # <-- style removed
        },
        {}                     # <-- style removed
      ]
    }
  ]
}

edited Aug 20, 2022 at 22:28

answered Aug 20, 2022 at 22:22

Mulan

136k35 gold badges240 silver badges276 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Mulan Over a year ago

@cauthon, if you want it to target absolute paths, that would be a different function. This code gets you very close to that answer. Can you think of how you would modify it to get that behavior?

cauthon Over a year ago

Thank you! This works great - I will admit I'm a bit jealous at how quickly you were able to break down the problem and solve it. I guess it just comes with time and practice. Do you have any blog or book recommendations to improve this skill?

Mulan Over a year ago

You're very welcome. It definitely takes some practice but you'll catch on sooner than you think. I don't have books on the topic to recommend but I've written many answers on python, recursion and inductive reasoning. If you have any follow up questions, don't be shy ^^

Collectives™ on Stack Overflow

How do you remove nested keys dynamically from Json using recursion?

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related