I have a dataframe that looks like

userId  feature1  feature2  feature3  ...
123456  0         0.45      0         ...
234567  0         0         0         ...
345678  0.6       0         0.2       ...
.
.

The features are mostly zero, but occasionally some of them have non-zero values. A single row for a userId may have zero, one, or more non-zero features.

I want to transform this into the following dataset:

userId  feature  value
123456  feature2 0.45
345678  feature1 0.6
345678  feature3 0.2

Essentially, we retain only the features that are non-zero for each userId. So, for userId 345678, we have 2 rows in the transformed dataset, one for feature1 and the other for feature3. userId 234567 is dropped since none of the features are non-zero.

Is this something that can be done using groupby or pivoting? If so, how?

Are there any other idiomatic pandas solutions?

1 Answer

Magic from melt

df.melt('userId').query('value!=0')
Out[459]: 
   userId  variable  value
2  345678  feature1   0.60
3  123456  feature2   0.45
8  345678  feature3   0.20
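
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is rebuilt from the three rows shown in the question (only feature1 to feature3), and the names `long_df`, `var_name="feature"` and `value_name="value"` are my own choices to match the desired column names, not something from the original snippet. The final sort is only there to get the row order shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: only three features)
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0, 0, 0.6],
    "feature2": [0.45, 0, 0],
    "feature3": [0, 0, 0.2],
})

# Unpivot wide -> long, naming the new columns as in the desired output,
# then keep only the non-zero entries
long_df = (
    df.melt(id_vars="userId", var_name="feature", value_name="value")
      .query("value != 0")
      .sort_values(["userId", "feature"])  # optional: order by user, then feature
      .reset_index(drop=True)
)

print(long_df)
```

userId 234567 disappears on its own because none of its melted rows survive the filter.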

Note that with stack you first need to mask the zeros to NaN, so that they are dropped when stacking:

df.mask(df.eq(0)).set_index('userId').stack().reset_index()
Out[460]: 
   userId   level_1     0
0  123456  feature2  0.45
1  345678  feature1  0.60
2  345678  feature3  0.20
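
A slightly more defensive sketch of the stack variant, again on sample data rebuilt from the question. The names `wide` and `stacked` are hypothetical. Two things differ from the one-liner: the index is set before masking, so a userId of 0 would not be turned into NaN, and there is an explicit `dropna()` so the result does not rely on stack discarding NaN by itself (an assumption that may not hold for every pandas version).

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: only three features)
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0, 0, 0.6],
    "feature2": [0.45, 0, 0],
    "feature3": [0, 0, 0.2],
})

# Move userId into the index first so only feature columns get masked
wide = df.set_index("userId")

stacked = (
    wide.where(wide.ne(0))                   # zeros become NaN
        .stack()                             # columns -> inner index level
        .dropna()                            # explicit, in case NaN is kept
        .rename_axis(["userId", "feature"])  # name both index levels
        .reset_index(name="value")           # back to a flat frame
)

print(stacked)
```

This gives the columns userId, feature and value directly, instead of level_1 and 0.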

2 Comments

This was magic, indeed. Is there a generic name to this operation, like pivoting is a standard operation on tabular data?
@Nik This is reshaping. melt and stack both unpivot the data, i.e. go from wide to long format.
