
For a data challenge at school we need to open a lot of JSON files with Python. There are too many to open manually. Is there a way to open them with a for loop?

This is the way I open one of the JSON files and make it a DataFrame (it works):

file_2016091718 = '/Users/thijseekelaar/Downloads/airlines_complete/airlines-1474121577751.json'

json_2016091718 = pd.read_json(file_2016091718, lines=True)

Here is a screenshot of the folder the data is in.

  • Yes, just list all json files in the directory, and iterate through the json files to open them @thijstue, check my answer below! Commented Apr 29, 2019 at 13:24

2 Answers


Yes. Use os.listdir to list all the JSON files in your directory, build each file's full path with os.path.join, and pass that path to pd.read_json:

import os
import pandas as pd
base_dir = '/Users/thijseekelaar/Downloads/airlines_complete'

# Collect one DataFrame per JSON file in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is a JSON file, build its full path, read it,
    # and append the resulting DataFrame to the list
    if file.endswith('.json'):
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)

print(data_list)
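The comments below ask how to combine these per-file DataFrames into one big DataFrame. Assuming data_list is built as above, pd.concat does that in one call. A minimal self-contained sketch (the sample files in a temporary directory are hypothetical stand-ins for the airlines data):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the airlines JSON files
base_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(base_dir, f'part-{i}.json'), 'w') as f:
        f.write('{"airline": "KLM", "delay": %d}\n' % i)

# One DataFrame per JSON file, as in the answer above
data_list = []
for name in sorted(os.listdir(base_dir)):
    if name.endswith('.json'):
        data_list.append(pd.read_json(os.path.join(base_dir, name), lines=True))

# Stack them into a single DataFrame; ignore_index renumbers the rows
combined = pd.concat(data_list, ignore_index=True)
print(combined.shape)  # (3, 2)
```

Reassigning `json_data = pd.read_json(...)` inside the loop without appending is what leaves you with only the last file; collecting into a list and concatenating once avoids that.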

Comments

Dear Devesh, thanks for your answer. I came very close to what I wanted to achieve, but it only returned a DataFrame with the last JSON file. Is there a way to get one big DataFrame with all the JSON files? I already tried this, but it returns only one row:
base_dir = '/Users/thijseekelaar/Downloads/airlines_complete'
json_data_firstmonth = pd.DataFrame()

# Get all files in the directory
for file in os.listdir(base_dir):
    # If file is a json, construct its full path and open it
    if 'json' in file:
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        json_data_firstmonth = pd.concat([json_data_firstmonth, json_data])
Can you please update what you observed in the question itself!
When I did it exactly your way, it only returned a dataframe with the data of the last json file in it
Check the updated code, data_list should contain a list with all dataframes! @ThijsTUE

Try this:

import os

# note: os.walk does not guarantee the order of the files
for root, subdirs, files in os.walk('your/json/dir/'):
    for file in files:
        # `file` is a bare name, so join it with root to get a usable path
        with open(os.path.join(root, file), 'r') as f:
            # your stuff here
            pass

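The same idea works when the JSON files are spread over subdirectories, since os.walk descends into them. A minimal sketch feeding the walked paths into pandas; the temporary directory layout is a hypothetical stand-in for the real folder:

```python
import os
import tempfile

import pandas as pd

# Hypothetical nested layout: os.walk also finds files in subdirectories
root_dir = tempfile.mkdtemp()
sub = os.path.join(root_dir, 'month1')
os.makedirs(sub)
with open(os.path.join(sub, 'flights.json'), 'w') as f:
    f.write('{"airline": "KLM", "delay": 5}\n')

frames = []
for root, subdirs, files in os.walk(root_dir):
    for name in files:
        if name.endswith('.json'):
            # join with root: `name` alone is not a usable path
            frames.append(pd.read_json(os.path.join(root, name), lines=True))

combined = pd.concat(frames, ignore_index=True)
print(len(combined))  # 1
```

If everything sits in a single flat directory, os.listdir (as in the other answer) is enough; os.walk only earns its keep with nested folders.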
Comments
