2

I have the following data

{ "results": [
    {
        "company": "XYZ",
        "createdAt": "2014-03-27T23:21:48.758Z",
        "email": "[email protected]",
        "firstName": "abc",
        "lastName": "xyz",
        "linkedinAccount": "",
        "location": "",
        "profilePicture": {
            "__type": "File",
            "name": "ab0e-profilePicture",
            "url": "url.url.com"
        },
        "registrationGate": "normal",
        "telephone": "",
        "title": "AA",
        "updatedAt": "2014-03-27T23:24:20.220Z",
        "username": "[email protected]",
        "zipcode": "00000"
    } 
    ] 
    }

I import the json data using the following code

import json
import pandas as pd

from pandas import DataFrame
json_data = pd.read_json('data.json')

print json_data[:2]

This prints

results
0  {u'linkedinAccount': u'', u'username': u'abc...
1  {u'linkedinAccount': u'zxcflcnv', u'username...

[2 rows x 1 columns]

when I try to print a column using

print df['linkedinAccount']

I get the following error

KeyError: u'no item named linkedinAccount'

How do I access data in the dataframes based on column names?

2
  • Possible dupe: stackoverflow.com/questions/21494030/… Commented Apr 22, 2014 at 0:52
  • @U2EF1, it is similar but I think that solution would be less ideal for this situation if you might agree. Commented Apr 22, 2014 at 3:50

1 Answer 1

2

Not sure how your multiple observations are organized in json. But it is clear that what is causing problem is you are having a nested structure for the "profilePicture" field. Therefore each observation is expressed as a nested dictionary. You need to convert each observation to a dataframe and concat them into the final dataframe as in this solution.

In [3]:
print df
                                             results
0  {u'linkedinAccount': u'', u'username': u'abc@g...
1  {u'linkedinAccount': u'', u'username': u'abc@g...

[2 rows x 1 columns]
In [4]: 
print pd.concat([pd.DataFrame.from_dict(item, orient='index').T for item in df.results])


  linkedinAccount       username registrationGate firstName title lastName  \
0                  [email protected]           normal       abc    AA      xyz   
0                  [email protected]           normal       abc    AA      xyz   

  company telephone                                     profilePicture  \
0     XYZ            {u'url': u'url.url.com', u'__type': u'File', u...   
0     ABC            {u'url': u'url.url.com', u'__type': u'File', u...   

  location                 updatedAt          email                 createdAt  \
0           2014-03-27T23:24:20.220Z  [email protected]  2014-03-27T23:21:48.758Z   
0           2014-03-27T23:24:20.220Z  [email protected]  2014-03-27T23:21:48.758Z   

  zipcode  
0   00000  
0   00000  

[2 rows x 14 columns]

Then you may want to think about how to deal the the profilePicture column. You can do what @U2EF1 suggested in the link. But I would probably just break that column into three columns pfPIC_url, pfPIC_type, pfPIC_name

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.