rename duplicate key in json file python

Question

I have a JSON file which has duplicate keys.

Example

{
  "data":"abc",
  "data":"xyz"
}

I want to turn this into { "data1":"abc", "data2":"xyz" }

I tried using object_pairs_hook with json.loads, but it is not working. Could anyone help me with a Python solution for the above problem?

When you consume that JSON, there will be no duplicates, because of stackoverflow.com/questions/21832701/…
– vishes_shell, Commented Jul 11, 2017 at 7:14
That's quite a strange requirement. Wouldn't you prefer {"data": ["abc", "xyz"]}?
– Alex Hall, Commented Jul 11, 2017 at 7:38
Also, how did you get such a file? Can you not fix the source?
– Alex Hall, Commented Jul 11, 2017 at 7:40
@AlexHall the input is not under my control. It comes from a different source, which is why I ran into this issue.
– Ameya Kulkarni, Commented Jul 11, 2017 at 9:08
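
The first comment's point can be checked directly. As far as the standard json module goes, a repeated key is collapsed to its last value when no hook is supplied, so the duplicates have to be intercepted while parsing. A minimal check, using a shortened version of the example from the question:

```python
import json

raw = '{"data": "abc", "data": "xyz"}'

# Without a hook, the parser builds a plain dict, so the later
# value silently overwrites the earlier one.
parsed = json.loads(raw)
print(parsed)  # {'data': 'xyz'}

# object_pairs_hook receives every (key, value) pair in document
# order, including repeated keys, before they are collapsed.
pairs = json.loads(raw, object_pairs_hook=list)
print(pairs)  # [('data', 'abc'), ('data', 'xyz')]
```

Passing list as the hook is only a way to inspect the pairs; a real hook would return a dict built from them.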

Community · Accepted Answer · 2021-10-07 06:09:01Z

2

You can pass json.loads the object_pairs_hook keyword argument. The hook is called with each object's key/value pairs as a list, so you can check for duplicates there, like this:

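import json
from collections import Counter

raw_text_data = """{
  "data":"abc",
  "data":"xyz",
  "data":"xyz22"
}"""
def manage_duplicates(pairs):
    d = {}
    k_counter = Counter()
    for k, v in pairs:
        # Increment first so the numbering starts at 1 (data1, data2, ...).
        k_counter[k] += 1
        d[k + str(k_counter[k])] = v

    return d

print(json.loads(raw_text_data, object_pairs_hook=manage_duplicates))
# {'data1': 'abc', 'data2': 'xyz', 'data3': 'xyz22'}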

I used Counter to track how many times each key has been seen, and I save the key as k + str(k_counter[k]), so it gets a trailing number. Note that this adds a number to every key, including keys that occur only once.
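
If keys that occur only once should keep their original name, one possible variation is to count the keys of each object first and number only the repeated ones. This is a sketch, not part of the original answer; the name rename_duplicates is made up for illustration. Because json.loads calls the hook once per object, nested objects are handled as well:

```python
import json
from collections import Counter

def rename_duplicates(pairs):
    """Hypothetical hook: number only keys that repeat within one object."""
    totals = Counter(key for key, _ in pairs)  # occurrences per key
    seen = Counter()                           # running index per key
    result = {}
    for key, value in pairs:
        if totals[key] > 1:
            seen[key] += 1
            key = f"{key}{seen[key]}"
        result[key] = value
    return result

raw = '{"data": "abc", "data": "xyz", "no": "numbering", "inner": {"k": 1, "k": 2}}'
parsed = json.loads(raw, object_pairs_hook=rename_duplicates)
print(parsed)
# {'data1': 'abc', 'data2': 'xyz', 'no': 'numbering', 'inner': {'k1': 1, 'k2': 2}}
```

One caveat: if the input already contains a key such as "data1" next to duplicated "data" keys, the renamed key collides with it and one value is lost. The sketch does not guard against that.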

P.S

If you have control over the input, I would highly recommend changing your JSON structure to:

{"data": ["abc", "xyz"]}

RFC 4627, which defines the application/json media type, recommends unique keys but does not explicitly forbid duplicates:

The names within an object SHOULD be unique.
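
When the input cannot be changed, the list structure recommended above can still be produced on the reading side. A minimal sketch, assuming a hypothetical hook named collect_duplicates that gathers the values of a repeated key into a list and leaves unique keys untouched:

```python
import json

def collect_duplicates(pairs):
    """Hypothetical hook: gather values of repeated keys into a list."""
    result = {}
    repeated = set()  # keys whose value has already been turned into a list
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in repeated:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            repeated.add(key)
    return result

raw = '{"data": "abc", "data": "xyz", "other": [1, 2]}'
parsed = json.loads(raw, object_pairs_hook=collect_duplicates)
print(parsed)
# {'data': ['abc', 'xyz'], 'other': [1, 2]}
```

Tracking the repeated keys in a separate set keeps a value that was already a list in the input (like "other" here) from being mistaken for collected duplicates.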

edited Oct 7, 2021 at 6:09

CommunityBot

answered Jul 11, 2017 at 7:36

0xdead

1 Comment

Ameya Kulkarni Over a year ago

Thank you so much for the help. It worked for me. As I do not have control over the input, I could not change the JSON structure.

Oleksii Filonenko · Accepted Answer · 2017-07-11 09:30:51Z

0

A quick and dirty solution using re.

import re

s = '{ "data":"abc", "data":"xyz", "test":"one", "test":"two", "no":"numbering" }'

def find_dupes(s):
    # Collect every key name, then keep those that occur more than once.
    keys = re.findall(r'"(\w+)":', s)
    return list(set(filter(lambda w: keys.count(w) > 1, keys)))

for key in find_dupes(s):
    pattern = r'"{}":'.format(re.escape(key))
    # Rename one occurrence per pass; a renamed key no longer matches the pattern.
    for i in range(1, len(re.findall(pattern, s)) + 1):
        s = re.sub(pattern, r'"{}{}":'.format(key, i), s, count=1)

print(s)

Prints this string:

{ "data1":"abc", "data2":"xyz", "test1":"one", "test2":"two", "no":"numbering" }

edited Jul 11, 2017 at 9:30

answered Jul 11, 2017 at 7:27

Oleksii Filonenko

6 Comments

0xdead Over a year ago

This will fail if the user has duplicate keys other than just data

Oleksii Filonenko Over a year ago

@OrDuan that's true, but OP asked for a simple substitution. If a universal solution was needed, I'd take my time. Nonetheless, thanks for feedback!

Ameya Kulkarni Over a year ago

@BrightOne thanks for the solution. As OrDuan mentioned, it will fail if I provide duplicate keys other than just data. Could you please help address this issue?

Oleksii Filonenko Over a year ago

Edited. gosh, this is so ugly

Ameya Kulkarni Over a year ago

@BrightOne thanks for the updated answer. It is still giving one issue. It is not working for nested objects. It is working only for outer objects.
