
I have a pandas DataFrame with a two-level MultiIndex. I want to get a NumPy array out of it with something like df.as_matrix(...), but that returns an array of shape (n_rows, 1). I want an array of shape (n_index1_rows, n_index2_rows, 1).

Is there a way to use .groupby(...) followed by .values.tolist() or .as_matrix(...) to get the desired shape?

EDIT: Data

                                                              value  
current_date                  temp_date                                        
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489   30.497100   
                              1970-01-01 00:00:01.446237494    9.584300   
                              1970-01-01 00:00:01.446237455   10.134200   
                              1970-01-01 00:00:01.446237494    7.803683   
                              1970-01-01 00:00:01.446237400   10.678700   
                              1970-01-01 00:00:01.446237373    9.700000   
                              1970-01-01 00:00:01.446237180   15.000000   
                              1970-01-01 00:00:01.446236961   12.928866   
                              1970-01-01 00:00:01.446237032   10.458800

This is kind of the idea:

np.array([np.resize(x.as_matrix(["value"]).copy(), (500, 1)) for (i, x) in df.reset_index("current_date").groupby("current_date")])
  • You want a 3D array? Or just a 2D array including the index as well as the column? Commented Nov 3, 2015 at 20:15
  • 3D array. All values in np.array should be column values (not indexes) Commented Nov 3, 2015 at 20:17
  • Could you please provide some sample data with desired output? Commented Nov 3, 2015 at 21:01
  • Done. Ignore the funky datetimes Commented Nov 3, 2015 at 21:12
  • Sorry, why is this meant to become a 3D array? You have two indices (i=current_date and j=temp_date, presumably with some mapping -- right now temp_date isn't sorted, so it's not clear) and the value those indices specify. Isn't that a 2D object? Commented Nov 3, 2015 at 21:23

1 Answer

I think what you want is to unstack the multiindex, e.g.

df.unstack().values[:, :, np.newaxis]

Edit: if you have duplicate indices, unstacking won't work, and you probably want a pivot_table instead:

pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (10, 50, 1)
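
To make the duplicate case concrete, here is a minimal sketch on made-up data (the dates and values are hypothetical, chosen only so that one (current_date, temp_date) pair appears twice). It assumes the mean is an acceptable aggregation; combinations that never occur come out as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the pair (2015-01-01, 2015-01-02) appears twice.
df = pd.DataFrame(
    {
        "current_date": pd.to_datetime(
            ["2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02"]
        ),
        "temp_date": pd.to_datetime(
            ["2015-01-01", "2015-01-02", "2015-01-02", "2015-01-01"]
        ),
        "value": [1.0, 2.0, 4.0, 5.0],
    }
).set_index(["current_date", "temp_date"])

# Aggregate duplicates with the mean while reshaping to a 2D table.
pivoted = df.reset_index().pivot_table(
    index="current_date", columns="temp_date", values="value", aggfunc="mean"
)

# Add the trailing axis of length 1.
arr = pivoted.to_numpy()[:, :, np.newaxis]
print(arr.shape)
```

The duplicated pair is averaged into a single cell, and the pair that is missing from the data (2015-01-02, 2015-01-02) is filled with NaN.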

Here's a full example of unstack. First we'll create some data:

import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val':np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date           
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828

Now we unstack the MultiIndex and show the first 4x4 slice of the result:

df.unstack().iloc[:4, :4]
#                     val                                 
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date                                            
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314

Now extract the numpy array, and reshape to [nrows x ncols x 1] as you specified in the question:

vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
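
As a sanity check, the following sketch (on a smaller, deterministic frame than the one above; the sizes and the use of np.arange are my own choices for illustration) confirms that position [i, j, 0] in the array corresponds to the i-th current_date and the j-th temp_date of the unstacked frame:

```python
import numpy as np
import pandas as pd

# Small deterministic frame: 3 current dates x 4 temp dates.
current = pd.date_range("2015-01-01", periods=3, freq="D")
temp = pd.date_range("2015-01-01", periods=4, freq="D")
ind = pd.MultiIndex.from_product(
    [current, temp], names=["current_date", "temp_date"]
)
df = pd.DataFrame({"val": np.arange(len(ind), dtype=float)}, index=ind)

# Unstack the inner level into columns, then add the trailing axis.
wide = df["val"].unstack("temp_date")
vals = wide.to_numpy()[:, :, np.newaxis]

# Look up the same cell by label and by position.
i, j = 2, 1
label_value = df.loc[(wide.index[i], wide.columns[j]), "val"]
print(vals.shape, vals[i, j, 0] == label_value)
```

The row and column labels of the unstacked frame (wide.index and wide.columns here) are what map array positions back to dates, so it is worth keeping them around alongside the array.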

2 Comments

I get ValueError: Index contains duplicate entries, cannot reshape when trying to unstack. I have a ton of rows, and some share the same temp_date (with different values). I'd have to unstack millions of index entries. Is there a way to avoid this? Reindex temp_date, or something like that?
Oh, I didn't know you had duplicates. In that case, you need some sort of aggregation to get the result you want (and you'll have to decide which aggregation is appropriate for your data). A pivot table would be a good approach: see my edit above.
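
If no aggregation fits and every row has to be kept, one possible alternative (not the pivot-table approach from the answer, and only a sketch on made-up data) is to number the rows within each current_date using groupby(...).cumcount() and unstack on that position. Note the assumption this makes: the second axis is then the row's position within its group, not temp_date, and shorter groups are padded with NaN. The column name pos is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with repeated (current_date, temp_date) pairs.
df = pd.DataFrame(
    {
        "current_date": pd.to_datetime(
            ["2015-01-01"] * 3 + ["2015-01-02"] * 2
        ),
        "temp_date": pd.to_datetime(
            ["2015-01-05", "2015-01-05", "2015-01-03", "2015-01-04", "2015-01-04"]
        ),
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
).set_index(["current_date", "temp_date"])

flat = df.reset_index()

# Number the rows within each current_date: 0, 1, 2, ...
flat["pos"] = flat.groupby("current_date").cumcount()

# (current_date, pos) is unique, so unstacking works without aggregation.
wide = flat.set_index(["current_date", "pos"])["value"].unstack("pos")

arr = wide.to_numpy()[:, :, np.newaxis]
print(arr.shape)
```

This is close in spirit to the groupby-and-resize idea in the question, except that short groups are padded with NaN rather than with repeated values.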
