0

I currently have a recursive function that removes ALL keys that match a pattern. Here is the background:

Example Json

{
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

Here's the function:

def remove_all_keys_matching_value(d, keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [remove_all_keys_matching_value(v, keys_to_remove) for v in d]
    return {k: remove_all_keys_matching_value(v, keys_to_remove) for k, v in d.items() if k not in keys_to_remove}

If I run the function with these keys to remove keys_to_remove = ('make', 'name') I'll get the following result:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{}, {
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I want to adjust this code to be more targeted so it doesn't remove all instances of the key but rather takes into account the root value of the key/path if that makes sense.

So for example if I were to pass in a tuple containing (('owned_cars', 'make'), 'name') it would return:

{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}

I know I need to keep track of the root key somehow but am unsure how to fold this in. I would appreciate any help in solving this. I always struggle when the recursion gets this complex and would love to see how someone more experienced would approach it so I can improve.

While I am interested in the solution to this problem, I'm more interested in learning how to approach a problem like this? I understand whats happening at a high level in the recursive method but struggle when I need to start stepping through it. I don't know how to make the leap to adjusting the code to identify the root path.

6
  • 2
    "How do you approach difficult recursion problems as a developer?" Can you rename this please to something specific? The question title looks like the usual rubbish that's on the main page but it looks like you have a specific question. You risk getting downvoted from the title alone Commented Aug 20, 2022 at 20:03
  • 1
    It would also be helpful if you run your JSON through jsonlint.com (and pasting its output) before posting, or printing it with indentation Commented Aug 20, 2022 at 20:04
  • I just did. Thanks! Commented Aug 20, 2022 at 20:04
  • 1
    btw I think a better specification of keys_to_remove would be: (('owned_cars', 'make'), 'name') so that the strings don't have to be split to recover actual key names. Commented Aug 20, 2022 at 20:26
  • @cauthon for ("owned_cars", "make") would it only remove "make" if the immediate parent is "owned_cars"? Or is it possible that additional levels of data could exist between? Commented Aug 20, 2022 at 21:46

1 Answer 1

1

division of complexity

We could start with a remove function that takes any t and any number of paths -

def remove(t, *paths):
  for p in paths:
    t = remove1(t, p)
  return t

As you can see, it has a simple operation calling remove1(t, p) for all p in the provided paths. The final t is returned. This separates the complexity of removing a single path and removing many paths. We offload the majority of the work to remove1.

remove1

Your original code is pretty close. This remove1 takes any t and a single path.

  1. If the path is empty, return t unmodified
  2. (inductive) the path has at least one element. If t is a list, apply remove1(e, path) for all e of the list t
  3. (inductive) that path has at least one element and t is not a list. If t is a dictionary -
    • If the path has only one element, create a new dictionary with k assigned to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t, excluding any k matching the path's element, path[0]
    • (inductive) the path has at least two elements. Create a new dictionary with k assigned to the result sub-problem remove1(v, path[1:]) if k matches the first element of that path otherwise assign k to the result of the sub-problem remove1(v, path) for all k,v of the dictionary t.
  4. (inductive) t is a non-list and t is a non-dictionary. Return t unmodified.
def remove1(t, path):
  if not path:
    return t
  elif isinstance(t, list):
    return list(remove1(e, path) for e in t)
  elif isinstance(t, dict):
    if len(path) == 1:
      return {k:remove1(v, path) for (k,v) in t.items() if not k == path[0] }
    else:
      return {k:remove1(v, path[1:]) if k == path[0] else remove1(v, path) for (k,v) in t.items()}
  else:
    return t

modification to the input data

I added another layer to your data so we can see precisely how remove is working -

data = {
  "results": [{
    "name": "john doe",
    "age": "100",
    "owned_cars": [{
      "additional_layer": {  # <-- additional layer
        "make": "foo",
        "color": "green"
      }
    }, {
      "make": "ford",
      "color": "white"
    }, {
      "make": "bmw",
      "color": "red"
    }],
    "wished_cars": [{
      "make": "honda"
    }, {
      "make": "toyota",
      "style": "sleek"
    }, {
      "style": "fat"
    }]
  }]
}

demo

Let's see remove work now -

import json
data = { ... }
new_data = remove(data, ("owned_cars", "make"), ("style",))
print(json.dumps(new_data, indent=2))

This says remove all "make" keys that are any descendant of "owned_cars" keys and remove all "style" keys -

{
  "results": [
    {
      "name": "john doe",
      "age": "100",
      "owned_cars": [
        {
          "additional_layer": {
                                  # <-- make removed
            "color": "green"
          }
        },
        {
                                  # <-- make removed
          "color": "white"
        },
        {
                                  # <-- make removed
          "color": "red"
        }
      ],
      "wished_cars": [
        {
          "make": "honda"      # <-- make not removed
        },
        {
          "make": "toyota"     # <-- make not removed
                               # <-- style removed
        },
        {}                     # <-- style removed
      ]
    }
  ]
}
Sign up to request clarification or add additional context in comments.

3 Comments

@cauthon, if you want it to target absolute paths, that would be a different function. This code gets you very close to that answer. Can you think of how you would modify it to get that behavior?
Thank you! This works great - I will admit I'm a bit jealous at how quickly you were able to break down the problem and solve it. I guess it just comes with time and practice. Do you have any blog or book recommendations to improve this skill?
You're very welcome. It definitely takes some practice but you'll catch on sooner than you think. I don't have books on the topic to recommend but I've written many answers on python, recursion and inductive reasoning. If you have any follow up questions, don't be shy ^^

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.