I have a pandas DataFrame called df that contains tweets, created by loading the Twitter JSON into the DataFrame. I am trying to extract the interesting fields. The coordinates column is mostly None, but sometimes it contains GeoJSON in this format:

{'coordinates': [21.425775, 8.906141], 'type': 'Point'}

Here 21.425775 is the longitude and 8.906141 is the latitude. I would like to extract the latitude and longitude into separate columns. Unfortunately I am still a pandas beginner, so I am not sure how to do this with find and substring operations; there also seem to be better ways, as suggested in this question, which I don't fully understand.

An example of the dataframe is:

  coordinates
0 None
1 {'coordinates': [21.425775, 8.906141], 'type': 'Point'}

How can I extract the information in the nested JSON column into separate pandas columns while gracefully handling the None values in the other rows?

  • Can you post a sample data? Commented Jul 20, 2018 at 7:03
  • {'coordinates': [21.425775, 8.906141], 'type': 'Point'} is a sample, another sample would be None Commented Jul 20, 2018 at 7:29
  • Can you show a print of df? I am not able to understand how the column is... Commented Jul 20, 2018 at 7:31
  • Added example data to the question Commented Jul 20, 2018 at 7:34

1 Answer

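If each non-null value in 'coordinates' is a dict whose "coordinates" entry is a [longitude, latitude] list, you can pull that list out with apply() and then use tolist() with pd.DataFrame to split it into two columns.

Example: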

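import pandas as pd
import numpy as np

df = pd.DataFrame({'coordinates': [{'coordinates': [21.425775, 8.906141], 'type': 'Point'}, None]})
# Pull out the [longitude, latitude] list; rows holding None get a pair of NaNs.
# The trailing dropna() is redundant: no element is null once every row is a list.
df['temp'] = df['coordinates'].apply(lambda x: x.get("coordinates") if x else [np.nan, np.nan]).dropna()
# Expand each two-element list into two columns, aligned on the original index.
df[['longitude', 'latitude']] = pd.DataFrame(df.temp.values.tolist(), index=df.index)
# Remove the helper column.
df.drop('temp', axis=1, inplace=True)
print(df)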

Output:

                                         coordinates  longitude  latitude
0  {u'type': u'Point', u'coordinates': [21.425775...  21.425775  8.906141
1                                               None        NaN       NaN

5 Comments

Gives me ValueError: Columns must be same length as key, presumably because of the 'type': 'Point' part.
Or possibly the None part
Updated snippet
Thanks! Works nice. Why does it need the dropna part?
You are welcome :) and you are correct...you do not need dropna()
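
Following up on the last comment, here is a minimal sketch of the same idea without dropna() and without the temporary column. The helper name lon_lat is hypothetical, not part of pandas, and the sketch assumes every non-null value is a GeoJSON Point dict whose "coordinates" list is ordered [longitude, latitude], as in the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question: one GeoJSON Point and one missing value.
df = pd.DataFrame({
    "coordinates": [
        {"coordinates": [21.425775, 8.906141], "type": "Point"},
        None,
    ]
})


def lon_lat(geo):
    """Hypothetical helper: return [longitude, latitude], or NaNs if missing."""
    if isinstance(geo, dict) and geo.get("coordinates"):
        return list(geo["coordinates"][:2])
    return [np.nan, np.nan]


# Build a two-column frame on the same index, then attach it to df.
pairs = pd.DataFrame(
    df["coordinates"].map(lon_lat).tolist(),
    columns=["longitude", "latitude"],
    index=df.index,
)
df = df.join(pairs)
print(df[["longitude", "latitude"]])
```

Checking isinstance(geo, dict) instead of plain truthiness also covers rows where the missing value is NaN rather than None, which can happen depending on how the JSON was loaded.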
