2

I have a JSON file which has duplicate keys.

Example

{
  "data":"abc",
  "data":"xyz"
}

I want to turn this into { "data1":"abc", "data2":"xyz" }

I tried using object_pairs_hook with json.loads, but it is not working. Could anyone help me with a Python solution for the above problem?

4
  • When you consume that JSON, there will be no duplicates, because of stackoverflow.com/questions/21832701/… Commented Jul 11, 2017 at 7:14
  • That's quite a strange requirement. Wouldn't you prefer {"data": ["abc", "xyz"]}? Commented Jul 11, 2017 at 7:38
  • Also, how did you get such a file? Can you not fix the source? Commented Jul 11, 2017 at 7:40
  • @AlexHall the input is not under my control. It comes from a different source, which is why I ran into this issue. Commented Jul 11, 2017 at 9:08

2 Answers

2

You can pass json.loads the object_pairs_hook keyword argument: a function that receives each object's list of (key, value) pairs. There you can handle duplicates like this:

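import json
from collections import Counter

raw_text_data = """{
  "data":"abc",
  "data":"xyz",
  "data":"xyz22"
}"""

def manage_duplicates(pairs):
    """Build a dict from (key, value) pairs, numbering each occurrence of a key."""
    d = {}
    k_counter = Counter()  # missing keys count as 0
    for k, v in pairs:
        k_counter[k] += 1  # increment first so numbering starts at 1
        d[k + str(k_counter[k])] = v

    return d

print(json.loads(raw_text_data, object_pairs_hook=manage_duplicates))
# {'data1': 'abc', 'data2': 'xyz', 'data3': 'xyz22'}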

I used a Counter to count the occurrences of each key, and I store each value under k + str(k_counter[k]), so every key gets a trailing number.
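
The hook above numbers every key, including keys that occur only once. As a rough sketch of a variant that leaves unique keys alone, something like the following might work. The name number_duplicate_keys and the sample input are made up for illustration, and it assumes the input has no existing key such as "data1" that would collide with a generated name. Because json.loads calls the hook once for every object it decodes, nested objects are handled as well.

```python
import json
from collections import Counter


def number_duplicate_keys(pairs):
    """Hypothetical hook: suffix only the keys that repeat within one object."""
    totals = Counter(key for key, _ in pairs)  # occurrences per key in this object
    seen = Counter()
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            result[f"{key}{seen[key]}"] = value  # data -> data1, data2, ...
        else:
            result[key] = value  # unique keys are left untouched
    return result


# Illustrative input with duplicates at the top level and in a nested object.
sample = '{"data": "abc", "data": "xyz", "no": "numbering", "inner": {"test": "one", "test": "two"}}'
parsed = json.loads(sample, object_pairs_hook=number_duplicate_keys)
print(parsed)
```

For the sample above, the top-level "data" keys become "data1" and "data2", "no" stays as it is, and the nested "test" keys become "test1" and "test2".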

P.S

If you have control over the input, I would highly recommend changing your JSON structure to:

{"data": ["abc", "xyz"]}

RFC 4627, which defines the application/json media type, recommends unique keys but does not explicitly forbid duplicates:

The names within an object SHOULD be unique.

1 Comment

Thank you so much for the help. It worked for me. As I do not have control over the input, I cannot change the JSON structure.
0

A quick and dirty solution using re.

import re

s = '{ "data":"abc", "data":"xyz", "test":"one", "test":"two", "no":"numbering" }'

def find_dupes(s):
    # Collect every key-like token ("name":) and keep those seen more than once.
    keys = re.findall(r'"(\w+)":', s)
    return list(set(filter(lambda w: keys.count(w) > 1, keys)))

for key in find_dupes(s):
    # Rename the occurrences one at a time: each pass replaces the first
    # still-unnumbered "key": with "key<i>":.
    for i in range(1, len(re.findall(r'"{}":'.format(key), s)) + 1):
        s = re.sub(r'"{}":'.format(key), r'"{}{}":'.format(key, i), s, count=1)

print(s)

Prints this string (reformatted here for readability):

{
    "data1":"abc",
    "data2":"xyz",
    "test1":"one",
    "test2":"two",
    "no":"numbering"
}

6 Comments

This will fail if the input has duplicate keys other than data.
@OrDuan that's true, but OP asked for a simple substitution. If a universal solution was needed, I'd take my time. Nonetheless, thanks for the feedback!
@BrightOne thanks for the solution. As OrDuan mentioned, it fails if I provide duplicate keys other than data. Could you please help address this issue?
Edited. Gosh, this is so ugly.
@BrightOne thanks for the updated answer. There is still one issue: it does not work for nested objects, only for the outermost object.