Convert nested JSON from a CSV into a Pandas dataframe

Question

I am desperately trying to convert a JSON field nested within a CSV into data frame rows. Could you help?

Sample CSV row

2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v

Desired output (using JSON part only):

id	message	request	httpCode	service	timestamp
0	Success (Cached)	GET /api/v1/settings?id=3	200	stats-vis-backend	2021-09-26 08:25:43

I would be more than happy with this data frame structure as the output. I tried json_normalize and similar approaches, but I am far from a solution.

Thanks so much!!

Best David

Full code attempt (based on SeaBean's answer):


import csv
import ast
import pandas as pd

# read CSV 
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)

print(df.head(1))

# convert string of JSON/dict to real JSON/dict 
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

print(df_json.head(1))

Full output dump

File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
    df[1] = df[1].apply(ast.literal_eval)

  File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    > next start
    ^
SyntaxError: invalid syntax

Sample Output df1

0         {"level":"info","message":"Success (Cached)","...
1         {"level":"info","message":"Success (Cached)","...
2         {"level":"info","message":"Success (Cached)","...
3         {"level":"info","message":"Success","request":...
4         {"level":"info","message":"Success (Cached)","...
                                ...                        
249995    {"level":"info","message":"Success (Cached)","...
249996    {"level":"info","message":"Success (Cached)","...
249997    {"level":"info","message":"Success (Cached)","...
249998    {"level":"info","message":"Success","request":...
249999    {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object

Sample to_dict() output of df1

{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
 1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',

Output print(df.iloc[[4480]])

                               0             1  \
4480  2021-09-26T12:00:58.983344643Z  > next start   

                                                   2  \
4480  ip-10-xxx-xxxx-30.eu-central-1.compute.internal   

                                  3  
4480  xxxx-converter-75ffxf6b-jq2w7

What did you try? Please share the code.

– balderman, Commented Oct 12, 2021 at 14:28

SeaBean · Accepted Answer · 2021-10-12 15:18:49Z

2

You can call pd.DataFrame on the list of values in the second column (the JSON column) after converting each JSON string into a real dict, as follows:

import ast
import pandas as pd

# read CSV 
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)

# convert each string of JSON/dict to a real dict 
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

If you have already read the CSV into a dataframe with a column header, use the label of the second column instead of 1 in the code above.

Result:

print(df_json)


  level           message                    request  httpCode            service            timestamp
0  info  Success (Cached)  GET /api/v1/settings?id=3       200  stats-vis-backend  2021-09-26 08:25:43
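
To try these steps without a file on disk, here is a minimal sketch that feeds one row shaped like the question's sample through the same calls. The `sample_csv` string and the use of `io.StringIO` are assumptions for illustration only; the request path follows the question's desired output.

```python
import ast
import io

import pandas as pd

# One row shaped like the sample in the question.
# Inner double quotes are doubled, which is how CSV escapes them.
sample_csv = (
    '2021-09-26T08:25:43.021051958Z,'
    '"{""level"":""info"",""message"":""Success (Cached)"",'
    '""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,'
    '""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",'
    'ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v\n'
)

# Read from the in-memory buffer instead of a path (assumption for the demo).
df = pd.read_csv(io.StringIO(sample_csv), sep=',', header=None)

# Column 1 now holds a plain string; turn it into a dict.
df[1] = df[1].apply(ast.literal_eval)

# One dict per row becomes one dataframe row, with keys as columns.
df_json = pd.DataFrame(df[1].tolist())
print(df_json.columns.tolist())
```

Note that `read_csv` already undoes the doubled quotes, so the column arrives as ordinary JSON text before `ast.literal_eval` sees it.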

edited Oct 12, 2021 at 15:18

answered Oct 12, 2021 at 14:36

SeaBean


14 Comments

SeaBean Over a year ago

Hi @David, so you got a SyntaxError at the step with ast? Can you post the whole traceback at the end of your question? I ran the same code without problems, so I need to check your traceback for more information.

SeaBean Over a year ago

Hi @David, the traceback further confirms that the problem occurred while ast was parsing the JSON string. Can you try replacing .apply() with .map()? i.e. df[1] = df[1].map(ast.literal_eval)

SeaBean Over a year ago

@David Also, can you show the JSON by printing the second column with print(df[1])?

SeaBean Over a year ago

@David The problem seems to lie in the step that converts the string of a dict to a dict with ast.literal_eval. A common reason is that some entries contain invalid text or formatting. Would you please dump the first few lines of df[1] for investigation? You can dump them with df[1].head(10).to_dict()
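
With 250000 rows, the first few lines may not include the offending entry. One way to locate it is sketched below, assuming the column holds strings. The helper name `find_unparseable` and the three-row `col` series are hypothetical; the `> next start` value is the one shown for row 4480 in the question.

```python
import ast

import pandas as pd


def find_unparseable(series):
    """Return the entries of `series` that ast.literal_eval cannot parse."""
    def fails(value):
        try:
            ast.literal_eval(value)
            return False
        except (ValueError, SyntaxError, TypeError):
            return True

    # Boolean mask: True where parsing raised an exception.
    return series[series.map(fails)]


# Small stand-in for df[1]; the middle entry mimics the non-JSON row.
col = pd.Series([
    '{"level":"info","httpCode":200}',
    '> next start',
    '{"level":"info","httpCode":200}',
])

bad = find_unparseable(col)
print(bad)
```

The index of `bad` gives the row labels to inspect, for example with df.iloc as in the question.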

SeaBean Over a year ago

@David You can also quickly check whether eval, instead of ast.literal_eval, can parse the dict, i.e. df[1] = df[1].apply(eval). However, only do this if the data in your dict is trusted and free of possibly malicious code. If the data is not trusted, don't use eval.
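
As an alternative to eval that does not execute the input, the sketch below swaps in the standard-library `json.loads` and skips entries that are not JSON objects. This is not part of the answer above: the helper name `parse_json_or_none` and the sample `raw` series are hypothetical, and whether dropping such rows is acceptable depends on your data.

```python
import json

import pandas as pd


def parse_json_or_none(value):
    """Return the parsed dict, or None when the value is not a JSON object."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return None
    return parsed if isinstance(parsed, dict) else None


# Small stand-in for df[1], including a non-JSON line like row 4480.
raw = pd.Series([
    '{"level":"info","message":"Success (Cached)","httpCode":200}',
    '> next start',
    '{"level":"info","message":"Success","httpCode":200}',
])

parsed = raw.map(parse_json_or_none)

# Keep only the rows that parsed, and preserve their original row labels.
valid = parsed.dropna()
df_json = pd.DataFrame(valid.tolist(), index=valid.index)
print(df_json)
```

Keeping the original index makes it possible to join `df_json` back to the other CSV columns later.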
