
I have a DataFrame that I want to transform into a multidimensional array, using one of the columns as the 3rd dimension.
As an example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'date': np.random.randint(1, 6, 6),
    'value1': [11, 12, 13, 14, 15, 16],
    'value2': [21, 22, 23, 24, 25, 26]
})


I would like to transform it into a 3D array with dimensions (id, date, values), where each id gets its own 2D block of (date, value1, value2) rows, padded with NaN when an id has fewer rows than the largest group.
The problem is that the ids do not all occur the same number of times, so I cannot use np.reshape().

For this simplified example, I was able to use:

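# Hard-coded shape: 3 ids, at most 3 rows per id, 3 value columns
ra = np.full((3, 3, 3), np.nan)

for i, value in enumerate(df['id'].unique()):
    rows = df.loc[df['id'] == value].shape[0]
    # Copy this id's rows into the first `rows` slots of slice i; the rest stay NaN
    ra[i, :rows, :] = df.loc[df['id'] == value, 'date':'value2'].to_numpy()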

This produces the needed result, but the original DataFrame contains millions of rows.
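For reference, here is a minimal sketch of the same loop with the output shape derived from the data instead of hard-coded as (3, 3, 3): the first axis is the number of distinct ids and the second is the size of the largest id group. The helper name `loop_to_3d` is hypothetical, and the fixed `date` values replace `np.random.randint` only so the result is reproducible (an assumption for illustration).

```python
import numpy as np
import pandas as pd

# Fixed dates instead of np.random.randint (assumption, for reproducibility)
df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'date': [5, 2, 4, 1, 3, 2],
    'value1': [11, 12, 13, 14, 15, 16],
    'value2': [21, 22, 23, 24, 25, 26],
})

def loop_to_3d(df, key='id', cols=slice('date', 'value2')):
    """Hypothetical loop-based helper: one NaN-padded 2D slab per id."""
    vals = df.loc[:, cols]
    ids = df[key].unique()                   # ids in order of first appearance
    max_rows = df[key].value_counts().max()  # size of the largest id group
    out = np.full((len(ids), max_rows, vals.shape[1]), np.nan)
    for i, value in enumerate(ids):
        block = vals[df[key] == value].to_numpy()
        out[i, :len(block), :] = block       # remaining slots stay NaN
    return out

ra = loop_to_3d(df)
print(ra.shape)  # (3, 3, 3)
```

This still loops once per id in Python, so it is only useful as a correctness baseline for a vectorized version.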

Is there a vectorized way to accomplish the same result?

1 Answer


Approach #1

Here's one vectorized approach. It assumes the rows are grouped by id, so sort first with df.sort_values('id', inplace=True), as suggested by @Yannis in the comments:

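# Number of rows per id, in ascending id order
count_id = df.id.value_counts().sort_index().to_numpy()
# mask[i, j] is True when id i has more than j rows, i.e. slot j is filled
mask = count_id[:, None] > np.arange(count_id.max())
vals = df.loc[:, 'date':'value2'].to_numpy()
out_shp = mask.shape + (vals.shape[1],)
out = np.full(out_shp, np.nan)
# Boolean assignment fills the True slots in row-major order, one id after another
out[mask] = vals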
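To see why the boolean mask lines the values up correctly, here is a self-contained sketch on a deliberately unsorted version of the example (the fixed dates and the shuffled row order are assumptions for illustration). With 1, 2 and 3 rows per id, `count_id` is `[1, 2, 3]`, so each row of the mask marks how many leading slots that id fills. `out[mask] = vals` writes the rows into those slots in row-major order, which is exactly why the rows must be grouped by id first. Passing `kind='stable'` to `sort_values` is an addition here: the default sort is not guaranteed to be stable, and a stable sort keeps each id's rows in their original order.

```python
import numpy as np
import pandas as pd

# Fixed dates (assumption) with ids deliberately out of order
df = pd.DataFrame({
    'id': [3, 1, 2, 3, 2, 3],
    'date': [1, 5, 2, 3, 4, 2],
    'value1': [14, 11, 12, 15, 13, 16],
    'value2': [24, 21, 22, 25, 23, 26],
})

# Group rows by id; a stable sort keeps the original order within each id
df = df.sort_values('id', kind='stable')

count_id = df['id'].value_counts().sort_index().to_numpy()  # rows per id
mask = count_id[:, None] > np.arange(count_id.max())        # slot j filled if j < count
vals = df.loc[:, 'date':'value2'].to_numpy()
out = np.full(mask.shape + (vals.shape[1],), np.nan)
out[mask] = vals  # True slots are filled id by id, in row-major order

print(mask.astype(int))
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]
```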

Approach #2

Another approach uses factorize and doesn't require any pre-sorting:

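# Integer code per row for its id (numbered in order of first appearance)
x = df.id.factorize()[0]
# Position of each row within its id group
y = df.groupby(x).cumcount().to_numpy()
vals = df.loc[:, 'date':'value2'].to_numpy()
out_shp = (x.max() + 1, y.max() + 1, vals.shape[1])
out = np.full(out_shp, np.nan)
# Scatter each row into its (id, position) slot
out[x, y] = vals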
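The two index arrays are easiest to understand on a small unsorted example (fixed dates and row order are assumptions for illustration). By default `factorize` numbers ids in order of first appearance, so the first axis follows that order rather than ascending id; passing `sort=True` to `pd.factorize`, as in this sketch, is one way to get ascending-id order instead. `cumcount` gives each row's position within its id group, which becomes the second index, so every (x, y) pair is unique and no row overwrites another.

```python
import numpy as np
import pandas as pd

# Fixed dates (assumption) with ids deliberately out of order
df = pd.DataFrame({
    'id': [3, 1, 2, 3, 2, 3],
    'date': [1, 5, 2, 3, 4, 2],
    'value1': [14, 11, 12, 15, 13, 16],
    'value2': [24, 21, 22, 25, 23, 26],
})

# x: dense id code per row; sort=True maps ids in ascending order (1 -> 0, 2 -> 1, 3 -> 2)
x, uniques = pd.factorize(df['id'], sort=True)
# y: position of each row within its id group, in original row order
y = df.groupby(x).cumcount().to_numpy()

vals = df.loc[:, 'date':'value2'].to_numpy()
out = np.full((x.max() + 1, y.max() + 1, vals.shape[1]), np.nan)
out[x, y] = vals  # scatter each row into its (id, position) slot

print(x)  # [2 0 1 2 1 2]
print(y)  # [0 0 0 1 1 2]
```

`uniques` holds the id that each index along the first axis corresponds to, which is handy for mapping slices of `out` back to ids.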

2 Comments

Perfect! It just needs df.sort_values('id', inplace=True) at the top to generalize to DataFrames that aren't already sorted by 'id'. Thank you very much, @divakar!
@Yannis Thanks! Updated the solution with that note.
