How to convert csv into nested json in python pandas?

Question

I have a csv like this:

   Art          Category  LEVEL 2       LEVEL 3  LEVEL 4      LEVEL 5   Location
0  PRINTMAKING  VISUAL    CONTEMPORARY  2D       NaN          NaN       NaN
1  PAINTING     VISUAL    CONTEMPORARY  2D       NaN          NaN       NaN
2  AERIAL       VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  AERIAL    NaN
3  WILDLIFE     VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  WILDLIFE  NaN
4  NATURE       VISUAL    CONTEMPORARY  2D       PHOTOGRAPHY  NATURE    NaN

The art and category will always be there, but the levels from l1 to l6 can be null. What I want to achieve is this:

art: PRINTMAKING
category: VISUAL
tags: [CONTEMPORARY, 2D]

The levels are basically tags for a particular art, which are to be stored in an array.

I am new to Python, and so far I have written the following code. How can I achieve this?

import pandas as pd
import json
data = pd.read_excel("C:\\Users\\Desktop\\visual.xlsx")
rec = {}
rec['art'] = data['Art']
rec['category'] = data['Category']
rec['tags'] = data['LEVEL 2'] + ',' + data['LEVEL 3'] + ',' + data['LEVEL 4'] + ',' + data['LEVEL 5']

I guess this is not the correct way to do it.

You can understand it this way: every art is in a category and has tags. The tags are in columns, which need to be stored as an array.
– Karan Gupta, Commented Feb 28, 2019 at 6:09
It would be easier to understand if you showed what your dataframe looks like at the moment and what your expected output should be.
– Mohit Motwani, Commented Feb 28, 2019 at 6:11
I don't know how to proceed with the current code. I have edited the question for better understanding.
– Karan Gupta, Commented Feb 28, 2019 at 6:21

jezrael · Accepted Answer · 2019-02-28 06:40:52Z

2

To convert the values of the LEVEL columns into lists of tags without NaNs, use:

df['tags'] = df.filter(like='LEVEL').apply(lambda x: x.dropna().tolist(), axis=1)
#alternative, should be faster
#df['tags'] = [[y for y in x if isinstance(y, str)] for x in
#                 df.filter(like='LEVEL').values]

d = df[['Art','Category','tags']].to_dict(orient='records')

[{
    'Art': 'PRINTMAKING',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D']
}, {
    'Art': 'PAINTING',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D']
}, {
    'Art': 'AERIAL',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'AERIAL']
}, {
    'Art': 'WILDLIFE',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'WILDLIFE']
}, {
    'Art': 'NATURE',
    'Category': 'VISUAL',
    'tags': ['CONTEMPORARY', '2D', 'PHOTOGRAPHY', 'NATURE']
}]
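
Since the question title asks for JSON, here is a minimal end-to-end sketch of the same idea. The sample frame below is an assumption (it only mirrors three rows of the question's data), and `level_cols`, `records` and `json_text` are illustrative names. It collects the non-null LEVEL values per row and then serializes the records with the standard `json` module.

```python
import json

import numpy as np
import pandas as pd

# Assumed sample frame mirroring three rows of the question's data.
df = pd.DataFrame({
    "Art": ["PRINTMAKING", "PAINTING", "AERIAL"],
    "Category": ["VISUAL", "VISUAL", "VISUAL"],
    "LEVEL 2": ["CONTEMPORARY", "CONTEMPORARY", "CONTEMPORARY"],
    "LEVEL 3": ["2D", "2D", "2D"],
    "LEVEL 4": [np.nan, np.nan, "PHOTOGRAPHY"],
    "LEVEL 5": [np.nan, np.nan, "AERIAL"],
    "Location": [np.nan, np.nan, np.nan],
})

# Collect the non-null LEVEL values of each row into a list.
level_cols = df.filter(like="LEVEL")
df["tags"] = pd.Series(
    [[v for v in row if pd.notna(v)] for row in level_cols.to_numpy()],
    index=df.index,
)

# One dict per row, then a JSON string.
records = df[["Art", "Category", "tags"]].to_dict(orient="records")
json_text = json.dumps(records, indent=2)
print(json_text)
```

The `tags` lists hold plain Python strings here, so `json.dumps` can serialize them directly; a frame with NumPy numeric values in the records may need a conversion step first.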

edited Feb 28, 2019 at 6:40

answered Feb 28, 2019 at 6:29

jezrael


9 Comments

Karan Gupta Over a year ago

is there a way to convert all into lower case?

jezrael Over a year ago

@KaranGupta - sure, change .apply(lambda x: x.dropna().tolist(), axis=1) to .apply(lambda x: x.str.lower().dropna().tolist(), axis=1)

jezrael Over a year ago

@KaranGupta - But it only works if all values are strings or NaN (None).

Karan Gupta Over a year ago

Not all values are populated; some can be null. I didn't understand what you meant.

jezrael Over a year ago

@KaranGupta - OK, I mean it should not work if some values are numeric.
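
If numeric values can appear among the tags, one option is to convert each value to a string before lower-casing, since the `.str` accessor is meant for string data. This is only a sketch: the sample frame (including the numeric `3`) and the helper name `lower_tags` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data; the numeric 3 is a hypothetical non-string tag.
df = pd.DataFrame({
    "Art": ["AERIAL", "PAINTING"],
    "Category": ["VISUAL", "VISUAL"],
    "LEVEL 2": ["CONTEMPORARY", "CONTEMPORARY"],
    "LEVEL 3": ["2D", 3],
    "LEVEL 4": ["PHOTOGRAPHY", np.nan],
})


def lower_tags(row):
    """Return the row's non-null values as lower-cased strings."""
    return [str(v).lower() for v in row.dropna()]


df["tags"] = df.filter(like="LEVEL").apply(lower_tags, axis=1)
print(df["tags"].tolist())
```

Null values are dropped first, so only real tags are converted and lower-cased.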


iamklaus · Accepted Answer · 2019-02-28 06:36:17Z

1

df

   Art     Category   LEVEL             2 LEVEL.1            3   LEVEL.2   4  \
0    0  PRINTMAKING  VISUAL  CONTEMPORARY      2D          NaN       NaN NaN   
1    1     PAINTING  VISUAL  CONTEMPORARY      2D          NaN       NaN NaN   
2    2       AERIAL  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY    AERIAL NaN   
3    3     WILDLIFE  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY  WILDLIFE NaN   
4    4       NATURE  VISUAL  CONTEMPORARY      2D  PHOTOGRAPHY    NATURE NaN   

   LEVEL.3   5  Location  
0      NaN NaN       NaN  
1      NaN NaN       NaN  
2      NaN NaN       NaN  
3      NaN NaN       NaN  
4      NaN NaN       NaN  

df = df.set_index(['Art','Category']).apply(lambda x: [','.join([str(a) for a in x.values if str(a) != 'nan'])], axis=1)

print(df.reset_index(name='tags'))

   Art     Category                                           tags
0    0  PRINTMAKING                       [VISUAL,CONTEMPORARY,2D]
1    1     PAINTING                       [VISUAL,CONTEMPORARY,2D]
2    2       AERIAL    [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,AERIAL]
3    3     WILDLIFE  [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,WILDLIFE]
4    4       NATURE    [VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,NATURE]

To dict

df.reset_index(name='tags').to_dict(orient='records')

Output

[{'Art': 0, 'Category': 'PRINTMAKING', 'tags': ['VISUAL,CONTEMPORARY,2D']},
 {'Art': 1, 'Category': 'PAINTING', 'tags': ['VISUAL,CONTEMPORARY,2D']},
 {'Art': 2,
  'Category': 'AERIAL',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,AERIAL']},
 {'Art': 3,
  'Category': 'WILDLIFE',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,WILDLIFE']},
 {'Art': 4,
  'Category': 'NATURE',
  'tags': ['VISUAL,CONTEMPORARY,2D,PHOTOGRAPHY,NATURE']}]

edited Feb 28, 2019 at 6:36

answered Feb 28, 2019 at 6:20

iamklaus


2 Comments

Karan Gupta Over a year ago

I need a dictionary as described in the question

Karan Gupta Over a year ago

AttributeError: 'Series' object has no attribute 'set_index'
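
A likely cause of this error is that the statement above rebinds `df` to the Series returned by `apply`, so running it a second time calls `set_index` on a Series. A sketch that avoids this by keeping the result under a separate name follows; the sample frame and the names `tags`, `result` and `records` are assumptions. It also keeps each tag as its own list element rather than joining them into one string, which is closer to the dictionary described in the question.

```python
import numpy as np
import pandas as pd

# Assumed sample frame with the column layout from the question.
df = pd.DataFrame({
    "Art": ["PRINTMAKING", "AERIAL"],
    "Category": ["VISUAL", "VISUAL"],
    "LEVEL 2": ["CONTEMPORARY", "CONTEMPORARY"],
    "LEVEL 3": ["2D", "2D"],
    "LEVEL 4": [np.nan, "PHOTOGRAPHY"],
})

# Store the result under a new name so that df stays a DataFrame.
tags = df.set_index(["Art", "Category"]).apply(
    lambda row: [str(v) for v in row.values if pd.notna(v)], axis=1
)
result = tags.reset_index(name="tags")
records = result.to_dict(orient="records")
print(records)
```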

MUNGAI NJOROGE · Accepted Answer · 2019-02-28 06:29:21Z

This might solve your problem:

from io import StringIO
import csv

# Sample CSV text; rows may have fewer fields than the header.
categories = """art,category, l1, l2, l3, l4, l5, l6
a1,c1,abc,def
a2,c2,,,,xyz,pqr,
a3,c3,lmn,,,qwe,rtg,
"""

f = StringIO(categories)
rows = csv.DictReader(f, delimiter=',')
data = []
for row in rows:
    d = {
        "category": row.get("category", ''),
        "art": row.get("art", '')
    }
    # Remove the two fixed columns so that only the level columns remain.
    try:
        del row["category"]
        del row["art"]
    except KeyError as ke:
        print(ke)
    # Missing trailing fields come back as None, blank fields as ''.
    d["levels"] = list(row.values())
    data.append(d)
    print(d)

Sample output:

{'category': 'c1', 'art': 'a1', 'levels': ['abc', 'def', None, None, None, None]}
{'category': 'c2', 'art': 'a2', 'levels': ['', '', '', 'xyz', 'pqr', '']}
{'category': 'c3', 'art': 'a3', 'levels': ['lmn', '', '', 'qwe', 'rtg', '']}
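
The sample output still contains `None` and empty strings among the levels. A possible variation, sketched below, keeps only the non-empty values so that each record carries a clean tag list. The CSV text is hypothetical (header names written without padding spaces), and `records` and `tags` are illustrative names.

```python
import csv
from io import StringIO

# Hypothetical sample data; header names are written without padding spaces.
text = """art,category,l1,l2,l3,l4,l5,l6
a1,c1,abc,def
a2,c2,,,,xyz,pqr,
"""

records = []
for row in csv.DictReader(StringIO(text)):
    art = row.pop("art")
    category = row.pop("category")
    # Short rows yield None and blank cells yield '', so keep truthy values only.
    tags = [value for value in row.values() if value]
    records.append({"art": art, "category": category, "tags": tags})

print(records)
```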

Ohad Chaet · Accepted Answer · 2019-02-28 06:30:57Z

0

You should use pd.Series.str.cat combined with functools.reduce to concatenate all of the tags:

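import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'art': ['a1', 'a2', 'a3'],
    'category': ['c1', 'c2', 'c3'],
    'l1': ['abc', '', 'lmn'],
    'l2': ['def', 'xyz', 'qwe'],
})

# Every column other than art and category holds a tag.
tag_cols = [x for x in df.columns if x not in ['art', 'category']]
# Fold the tag columns into one comma-separated string per row.
# Reducing over the Series (not the column names) also works for more than two columns.
joined = reduce(lambda a, b: a.str.cat(b, sep=','), [df[c] for c in tag_cols])
# Split the string back into a list, dropping empty tags.
df['tags'] = joined.apply(lambda x: [t for t in x.split(",") if t != ''])
d = df.to_dict(orient='records')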

Output

  [{'art': 'a1',
  'category': 'c1',
  'l1': 'abc',
  'l2': 'def',
  'tags': ['abc', 'def']},
 {'art': 'a2', 'category': 'c2', 'l1': '', 'l2': 'xyz', 'tags': ['xyz']},
 {'art': 'a3',
  'category': 'c3',
  'l1': 'lmn',
  'l2': 'qwe',
  'tags': ['lmn', 'qwe']}]
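
The example above uses empty strings for missing tags, whereas the question's data has NaN. With NaN, `str.cat` returns NaN for the whole row unless `na_rep` is given. A sketch under that assumption follows (the sample frame is made up). It passes `na_rep=""` and selects only the wanted columns before building the dict.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Assumed sample data with NaN (not empty strings) for missing tags.
df = pd.DataFrame({
    "art": ["a1", "a2"],
    "category": ["c1", "c2"],
    "l1": ["abc", np.nan],
    "l2": ["def", "xyz"],
    "l3": [np.nan, "pqr"],
})

tag_cols = [c for c in df.columns if c not in ("art", "category")]

# na_rep="" replaces missing values with an empty string during concatenation.
joined = reduce(
    lambda left, right: left.str.cat(right, sep=",", na_rep=""),
    [df[c] for c in tag_cols],
)

# Split on the separator and drop the empty pieces left by missing values.
df["tags"] = joined.apply(lambda s: [t for t in s.split(",") if t])
records = df[["art", "category", "tags"]].to_dict(orient="records")
print(records)
```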

edited Feb 28, 2019 at 6:30

answered Feb 28, 2019 at 6:17

Ohad Chaet


2 Comments

Karan Gupta Over a year ago

I need a dictionary bro

Ohad Chaet Over a year ago

Yeah I missed that. Edited the post for you with a dict.
