0

I am desperately trying to convert a nested JSON feature within a CSV into data frame rows. Could you help?

Sample CSV row

2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v

Desired output (using JSON part only):

id message request httpCode service timestamp
0 Success (Cached) GET /api/v1/settings?id=3 200 stats-vis-backend 2021-09-26 08:25:43

If this would be the data frame output structure, I would be more than happy. I tried JSON normalize etc., but I am far away from a solution.

Thanks so much!!

Best David

Full Code trial (based on SeaBean):


import csv
import ast
import pandas as pd

# read CSV 
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)

print(df.head(1))

# convert string of JSON/dict to real JSON/dict 
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

print(df_json.head(1))

Full output dump

File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
    df[1] = df[1].apply(ast.literal_eval)

  File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    > next start
    ^
SyntaxError: invalid syntax

Sample Output df1

0         {"level":"info","message":"Success (Cached)","...
1         {"level":"info","message":"Success (Cached)","...
2         {"level":"info","message":"Success (Cached)","...
3         {"level":"info","message":"Success","request":...
4         {"level":"info","message":"Success (Cached)","...
                                ...                        
249995    {"level":"info","message":"Success (Cached)","...
249996    {"level":"info","message":"Success (Cached)","...
249997    {"level":"info","message":"Success (Cached)","...
249998    {"level":"info","message":"Success","request":...
249999    {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object

Sample toDict() Output of df1

{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
 1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',

Output print(df.iloc[[4480]])

                               0             1  \
4480  2021-09-26T12:00:58.983344643Z  > next start   

                                                   2  \
4480  ip-10-xxx-xxxx-30.eu-central-1.compute.internal   

                                  3  
4480  xxxx-converter-75ffxf6b-jq2w7  
1
  • What did you try? share the code please. Commented Oct 12, 2021 at 14:28

1 Answer 1

2

You can use pd.DataFrame on the list of column values of the second column (with JSON) after converting the string of JSON to real JSON (not in string), as follows:

# read CSV 
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)

# convert string of JSON/dict to real JSON/dict 
import ast
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

If you have already read the CSV into dataframe with column header, you can also use the column label of the second column instead of 1 for the column label for second column in the codes above.

Result:

print(df_json)


  level           message                   request  httpCode            service            timestamp
0  info  Success (Cached)  GET /api/v1/settingsid=3       200  stats-vis-backend  2021-09-26 08:25:43
Sign up to request clarification or add additional context in comments.

14 Comments

Hi @David, so you got SyntaxError error at the step with ast ? Can you post the whole trace dump at the end of your question ? I run the same code without problem. So, need to check your dump to check for more info
Hi @David The dump just further confirm the problem occurred during ast parsing the string of JSON. Can you try replacing .apply() with .map() ? i.e. df[1] = df[1].map(ast.literal_eval)
@David Also, can use see the JSON by printing the 2nd column by print(df[1]) ?
@David The problem seems lying on the step converting the string of dict to dict by ast.literal_eval Common reason is there exist some invalid text/format in the dict. Would you please dump the first few lines of df[1] for investigation ? You can dump by df[1].head(10).to_dict()
@David You can also have a quick check whether eval instead of ast.literal_eval can parse the dict. However, see whether the data in your dict is trusted data free from possible malicious codes inside. If not trusted data, don't use eval That is, check with df[1] = df[1].apply(eval)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.