I am trying to load data from a JSON file into a pandas DataFrame, and I am stuck on how to retrieve a value nested inside an object in the JSON.

I want to retrieve the followers_count field within the user object of this JSON into the DataFrame.

JSON File (sample record) below:

{"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193, "retweet_count": 12345, "favorite_count": 23456, "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}}

Here is the code I have so far (it doesn't work, because I don't know how to fetch followers_count from within the user object):

        import pandas as pd

        tweet_data_df = pd.read_json('tweet-json.txt', lines=True)
        # Doesn't work
        #tweet_data_df = tweet_data_df[['id', 'favorite_count', 'retweet_count', 'created_at', 'user''followers_count']]
        # Works, but is not enough for me
        tweet_data_df = tweet_data_df[['id', 'favorite_count', 'retweet_count', 'created_at']]
        tweet_data_df.head(5)

I appreciate your help!

  • Try json_normalize. Commented Jun 28, 2020 at 18:54
  • If the JSON dictionary has a depth of 2, you can use pd.DataFrame(json_dict).apply(pd.Series). Commented Jun 28, 2020 at 18:56
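
The json_normalize suggestion can be sketched as follows. This is a minimal example, assuming a reasonably recent pandas where the function is available as pd.json_normalize; the record is the sample from the question, and the names flat_df and columns are introduced here for illustration only.

```python
import pandas as pd

# Sample record copied from the question.
record = {"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193,
          "retweet_count": 12345, "favorite_count": 23456,
          "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}}

# json_normalize flattens nested dicts into columns named "parent.child".
flat_df = pd.json_normalize([record])

# The nested field is now an ordinary column called "user.followers_count".
columns = ["id", "favorite_count", "retweet_count", "created_at", "user.followers_count"]
tweet_data_df = flat_df[columns]
print(tweet_data_df)
```

For a whole file, the same call should accept a list of records, one dict per tweet.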

1 Answer

If the JSON object (dictionary) has a depth of 2 (i.e. just two nested dictionaries), you can use .apply(pd.Series).

{"key": {"key2": {val1, val2}}} # depth = 2
{"key": {"key2": {val1, "key3": {val2}}}} # depth > 2, depth = 3

pd.DataFrame(dic).apply(pd.Series).reset_index(drop = True)
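
One way to apply the same .apply(pd.Series) idea to the frame from the question is to expand only the user column. This is a sketch, assuming every entry in that column is a dict; the frame is built inline as a stand-in for the result of pd.read_json, and user_df and result are names chosen here for illustration.

```python
import pandas as pd

# Stand-in for pd.read_json('tweet-json.txt', lines=True), built inline
# so the sketch runs without the file.
tweet_data_df = pd.DataFrame([
    {"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193,
     "retweet_count": 12345, "favorite_count": 23456,
     "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}},
])

# Expand each dict in the 'user' column into its own set of columns.
user_df = tweet_data_df["user"].apply(pd.Series)

# Copy the nested value up to the top level, then select the wanted columns.
tweet_data_df["followers_count"] = user_df["followers_count"]
result = tweet_data_df[["id", "favorite_count", "retweet_count", "created_at", "followers_count"]]
print(result)
```

Calling pd.Series on every row can be slow on large frames, so this is mainly suitable for modest file sizes.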

Otherwise, if the depth is greater than 2, you can iterate through its keys recursively:

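def shrink_depth(dic, output_dict: dict, pkey=None):
    """Recursively collect every leaf value of dic into output_dict, keyed by its innermost key."""
    if isinstance(dic, dict):
        for key in dic:
            # Make sure each key has a list to collect its values.
            if key not in output_dict:
                output_dict[key] = []

            shrink_depth(dic[key], output_dict, key)  # recurse into the value

    elif isinstance(dic, (list, set)):
        # Lists and sets contribute each element to the parent key's list.
        for val in dic:
            output_dict[pkey].append(val)
    else:
        # Scalar leaf: store it under the key it was found at.
        output_dict[pkey].append(dic)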

# update: Add nested dictionaries to the (id) key
dic = {"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193, "retweet_count": 12345, "favorite_count": 23456, 
               "user": {"id": {4196983835: 43424}, "followers_count": 3200889, "friends_count": 104}}

output = {}

shrink_depth(dic, output)

output

{'created_at': ['Tue Aug 01 16:23:56 +0000 2017'],
 'id': [892420643555336193],
 'retweet_count': [12345],
 'favorite_count': [23456],
 'user': [],
 4196983835: [43424],
 'followers_count': [3200889],
 'friends_count': [104]}
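
Note that shrink_depth keys each value by its innermost key only, so with the original record from the question the top-level id and the user's id would both be appended to the same 'id' list, and 'user' stays empty. A possible variant, sketched below, keeps the parent key as a prefix to avoid that collision. The helper name flatten and the "." separator are assumptions made for this example, not part of the answer above.

```python
def flatten(dic, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining nested keys with sep."""
    flat = {}
    for key, value in dic.items():
        # Prefix the key with its parent's key so nested names stay unique.
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Sample record copied from the question.
record = {"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193,
          "retweet_count": 12345, "favorite_count": 23456,
          "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}}

flat = flatten(record)
print(flat["user.followers_count"])
```

Each flattened record is a plain dict of scalars, so a list of them can be passed to pd.DataFrame directly.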

13 Comments

Just to understand: are you suggesting that the JSON object be modified to accommodate this?
Yes: convert the JSON to a dictionary, then shrink its depth. In your case, though, you can use the first approach directly.
Sorry, I am stuck converting a txt file with JSON data into a dictionary. I tried some code, but it throws an error.
I tried this: with open('tweet-json.txt', 'r') as json_file: json_dict = json.load(json_file)
JSONDecodeError: Extra data: line 2 column 1 (char 3974). It is a valid JSON file from a third-party site, though.
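
An "Extra data" error at the start of line 2 usually means the file holds more than one JSON document, which json.load does not accept. The sketch below assumes tweet-json.txt stores one JSON object per line (consistent with the lines=True read in the question) and parses each line separately; read_json_lines is a hypothetical helper name introduced for this example.

```python
import json

def read_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    with open(path, "r", encoding="utf-8") as json_file:
        for line in json_file:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Usage would look like records = read_json_lines('tweet-json.txt'), after which records[0]['user']['followers_count'] gives the nested value for the first tweet.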