I have a pandas dataframe with a column whose values are arrays of arrays (lists of lists). I would like to create a new dataframe from this column.

My current dataframe 'fruits' looks like this...

Id  Name    Color    price_trend
1   apple   red      [['1420848000','1.25'],['1440201600','1.35'],['1443830400','1.52']]
2   lemon   yellow   [['1403740800','0.32'],['1422057600','0.25']]

What I would like is a new dataframe from the 'price_trend' column that looks like this...

Id    date         price
1     1420848000   1.25
1     1440201600   1.35
1     1443830400   1.52
2     1403740800   0.32
2     1422057600   0.25

Thanks for the advice!

1 Answer

A groupby+apply should do the trick.

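import pandas as pd

def f(group):
    # Assumes each Id appears once, so the group holds a single row.
    row = group.iloc[0]
    trend = row['price_trend']
    ids = [row['Id']] * len(trend)   # repeat the Id for every pair
    dates = [v[0] for v in trend]    # first element of each pair
    prices = [v[1] for v in trend]   # second element of each pair
    return pd.DataFrame({'Id': ids, 'date': dates, 'price': prices})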

In[7]: df.groupby('Id', group_keys=False).apply(f)
Out[7]:
   Id        date price
0   1  1420848000  1.25
1   1  1440201600  1.35
2   1  1443830400  1.52
0   2  1403740800  0.32
1   2  1422057600  0.25
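
For comparison, here is a rough sketch of a different route that skips groupby entirely. It is not the method from this answer: it assumes pandas 0.25 or later, where DataFrame.explode is available, and it rebuilds the question's sample data under the name fruits. The names exploded, pairs and trend are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
fruits = pd.DataFrame({
    "Id": [1, 2],
    "Name": ["apple", "lemon"],
    "Color": ["red", "yellow"],
    "price_trend": [
        [["1420848000", "1.25"], ["1440201600", "1.35"], ["1443830400", "1.52"]],
        [["1403740800", "0.32"], ["1422057600", "0.25"]],
    ],
})

# One row per [date, price] pair; the Id is repeated for each pair.
exploded = fruits[["Id", "price_trend"]].explode("price_trend").reset_index(drop=True)

# Split each two-element pair into its own columns.
pairs = pd.DataFrame(exploded["price_trend"].tolist(), columns=["date", "price"])

# Put the Id back next to the split columns (both frames share a 0..n-1 index).
trend = pd.concat([exploded[["Id"]], pairs], axis=1)
print(trend)
```

The date and price values stay as strings here, as in the original data; converting them would be a separate step.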

Edit:

To filter out bad data (for instance, a price_trend column having value [['None']]), one option is to use pandas boolean indexing.

criterion = df['price_trend'].map(lambda x: len(x) > 0 and all(len(pair) == 2 for pair in x))
df[criterion].groupby('Id', group_keys=False).apply(f)
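
To see what the criterion keeps and drops, here is a small self-contained sketch. The three-row frame is made up for illustration: the [['None']] row mirrors the placeholder mentioned in the comments, the empty list is a hypothetical extra case, and is_valid is just a named version of the lambda above.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3],
    "price_trend": [
        [["1420848000", "1.25"], ["1440201600", "1.35"]],  # well-formed pairs
        [["None"]],                                        # placeholder row
        [],                                                # hypothetical empty row
    ],
})

def is_valid(trend):
    # Keep a row only if it is non-empty and every entry is a 2-element pair.
    return len(trend) > 0 and all(len(pair) == 2 for pair in trend)

criterion = df["price_trend"].map(is_valid)
clean = df[criterion]  # boolean indexing keeps rows where criterion is True
print(criterion.tolist())
print(clean)
```

Only the first row survives, so f never sees a pair it cannot index with v[0] and v[1].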
6 Comments

Please forgive me if I'm missing something obvious here (still learning), but when I run the code above I get the error 'NameError: global name 'DataFrame' is not defined'. Any advice?
Ok, so I changed 'return DataFrame' to 'return pd.DataFrame' and now I'm getting the error message "IndexError: list index out of range". Any advice for this situation?
@nflove The error may come from the indexing in f (v[0] or v[1]). Your example data's price_trend is a list of lists with 2 elements. If that's not true I'd check my price_trend data for bad data (lists with a single element) and filter those out.
thanks for the response, I think you're right. Some of my rows are empty and filled with the [['None']] placeholder. Any suggestions of how to edit your code above to handle this?
@nflove I added an edit to show one way to handle that