
I am new to JSON and tried what has been proposed here, but it did not work.

My original file (abbreviated) is called test.csv and looks like this:

person_uuid sample_uuid sample_slot sample_info
aa  AB  A   anything
aa  BD  B   more info
bc  FD  A   just info 
bc  AD  B   even more info 
bc  OI  C   text
hu  KL  B   texttext
hu  HF  C   information

The script I am trying to convert it with is called csv2json.py:

import csv
import json
import sys

base_name = sys.argv[1]
csvFilePath = "data/"+base_name+".csv"
jsonFilePath = "data/"+base_name+".json"

# https://stackoverflow.com/a/53474378/8584652
primary_fields = ['person_uuid']
secondary_fields = ['sample_slot']
result = []
with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    for row in reader:
        d = {k: v for k, v in row.items() if k in primary_fields}
        e = {k: v for k, v in row.items() if k in secondary_fields}

        d['samples'] = [{k: v, }
                        for k, v in row.items() if k not in primary_fields]

        result.append(d)

# convert python jsonArray to JSON String and write to file
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(result, indent=4)
    jsonf.write(jsonString)

I invoke the conversion with python csv2json.py test and get this result:

[
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "AB"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "anything"
            }
        ]
    },
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "BD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "more info"
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "FD"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "just info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "AD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "even more info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "OI"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "text"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "KL"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "texttext"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "HF"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "information"
            }
        ]
    }
]

But I would like to get instead:

[
    {
        "person_uuid": "aa",
        "samples": {
            "A": {
                "sample_uuid": "AB",
                "sample_info": "anything"
            },
            "B": {
                "sample_uuid": "BD",
                "sample_info": "more info"
            }
        }
    },
    {
        "person_uuid": "bc",
        "samples": {
            "A": {
                "sample_uuid": "FD",
                "sample_info": "just info"
            },
            "B": {
                "sample_uuid": "AD",
                "sample_info": "even more info"
            },
            "C": {
                "sample_uuid": "OI",
                "sample_info": "text"
            }
        }
    },
    {
        "person_uuid": "hu",
        "samples": {
            "B": {
                "sample_uuid": "KL",
                "sample_info": "texttext"
            },
            "C": {
                "sample_uuid": "HF",
                "sample_info": "information"
            }
        }
    }
]

Any help on how to nest this properly would be appreciated (I tried it with e = {k: v for k, v in row.items() if k in secondary_fields}, without success).

1 Answer

This can be solved with itertools.groupby (also see this answer).

Here is an example:

import csv
from itertools import groupby

# csvFilePath is defined as in the question's script
primary_fields = "person_uuid"
secondary_fields = "sample_slot"

with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    result = []
    # groupby only merges consecutive rows, so the file must already be
    # ordered by primary_fields (as it is in the question)
    for key, group in groupby(reader, key=lambda x: x[primary_fields]):
        # Build the nested "samples" dict from the rows of this group only
        samples = {
            elem[secondary_fields]: {
                "sample_uuid": elem["sample_uuid"],
                "sample_info": elem["sample_info"],
            }
            for elem in group
        }
        result.append({primary_fields: key, "samples": samples})

The result is:

[{'person_uuid': 'aa',
  'samples': {'A': {'sample_uuid': 'AB', 'sample_info': 'anything'},
   'B': {'sample_uuid': 'BD', 'sample_info': 'more info'}}},
 {'person_uuid': 'bc',
  'samples': {'A': {'sample_uuid': 'FD', 'sample_info': 'just info '},
   'B': {'sample_uuid': 'AD', 'sample_info': 'even more info '},
   'C': {'sample_uuid': 'OI', 'sample_info': 'text'}}},
 {'person_uuid': 'hu',
  'samples': {'B': {'sample_uuid': 'KL', 'sample_info': 'texttext'},
   'C': {'sample_uuid': 'HF', 'sample_info': 'information'}}}]
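
One caveat: groupby only merges consecutive rows that share a key, so the approach above relies on the file already being ordered by person_uuid. The following is a minimal sketch for the unsorted case, not a definitive implementation. It assumes trailing whitespace in the values should be stripped (as in the desired output of the question); the function name nest_rows and the inline sample data are made up for illustration.

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter


def nest_rows(rows, primary="person_uuid", secondary="sample_slot"):
    """Group flat rows by `primary`, keying each group's samples by `secondary`."""
    # groupby only merges adjacent rows, so sort by the grouping key first.
    # sorted() is stable, so rows of one person keep their original order.
    ordered = sorted(rows, key=itemgetter(primary))
    result = []
    for key, group in groupby(ordered, key=itemgetter(primary)):
        samples = {
            row[secondary]: {
                name: value.strip()
                for name, value in row.items()
                if name not in (primary, secondary)
            }
            for row in group
        }
        result.append({primary: key, "samples": samples})
    return result


# Inline stand-in for a tab-separated file, deliberately out of order.
sample = (
    "person_uuid\tsample_uuid\tsample_slot\tsample_info\n"
    "bc\tFD\tA\tjust info \n"
    "aa\tAB\tA\tanything\n"
    "bc\tAD\tB\teven more info \n"
    "aa\tBD\tB\tmore info\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
nested = nest_rows(reader)
print(json.dumps(nested, indent=4))
```

Because the inner dict is built from every column that is not the primary or secondary field, extra columns in the file would be carried along without changing the code.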

3 Comments

Thanks a lot. A quick question: where should I add sorting by the sample_slot field? Otherwise I can do it afterwards with an additional command, python -m json.tool --sort-keys.
Check the documentation of json.dumps, in particular the sort_keys parameter. Something like: json.dumps(result, indent=4, sort_keys=True)
Also this answer
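
To illustrate the sort_keys suggestion, here is a small sketch with a made-up result list whose slots were inserted out of order. Note that sort_keys=True sorts the keys of every object at every nesting level, not only the sample_slot keys; the second half shows one possible way to sort just the slots instead.

```python
import json

# Hypothetical result with the slots inserted in the order "B", "A".
result = [
    {
        "person_uuid": "aa",
        "samples": {
            "B": {"sample_uuid": "BD", "sample_info": "more info"},
            "A": {"sample_uuid": "AB", "sample_info": "anything"},
        },
    }
]

# Sorts all keys at all levels, so "sample_info" also moves before "sample_uuid".
sorted_json = json.dumps(result, indent=4, sort_keys=True)
print(sorted_json)

# Alternative: sort only the slot keys and keep every other key order untouched.
for person in result:
    person["samples"] = dict(sorted(person["samples"].items()))
slots_only_json = json.dumps(result, indent=4)
```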
