Create a column in a dataframe using lambda based on other columns with non-null values

Question

I have a dataframe with movie titles and one column per genre. For example, the movie titled 'One' is both 'Action' and 'Vestern', because it has a 1 in the corresponding columns.

   Movie  Action  Fantasy  Vestern
0    One       1        0        1
1    Two       0        0        1
2  Three       1        1        0

My goal is to create a column genres that contains the names of all genres a particular movie has. I tried to use a lambda and a list comprehension for this, because I thought they would help. But after running this line of code:

df['genres'] = df.apply(lambda x: [x+"|"+x for x in df.columns if x!=0])

I got only NaN values in every row:

   Movie  Action  Fantasy  Vestern genres
0    One       1        0        1    NaN
1    Two       0        0        1    NaN
2  Three       1        1        0    NaN

I also tried to use groupby, but didn't succeed.

Expected output is:

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy

Code to reproduce:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
print(df)

Thanks for your help

jezrael · Accepted Answer · 2019-01-08 13:23:55Z

For better performance, you can use dot: take all columns except the first and compute the dot product with their column names, each with the separator | appended, then remove the trailing | with rstrip:

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
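
To make the dot trick less magical, here is a minimal sketch of what it computes for each row, written as an explicit loop. It relies on the fact that in Python an integer times a string repeats the string, so 1 * 'Action|' is 'Action|' and 0 * 'Fantasy|' is an empty string. The names genre_cols, labels and row_to_genres are only for illustration, and the sketch assumes the genre columns contain nothing but 0 and 1 (a value of 2 would repeat the label twice).

```python
import pandas as pd

df = pd.DataFrame({"Movie": ['One', 'Two', 'Three'],
                   "Action": [1, 0, 1],
                   "Fantasy": [0, 0, 1],
                   "Vestern": [1, 1, 0]})

genre_cols = df.columns[1:]                 # every column except "Movie"
labels = [col + "|" for col in genre_cols]  # 'Action|', 'Fantasy|', 'Vestern|'

def row_to_genres(flags):
    # int * str repeats the string: 1 keeps the label, 0 gives ''
    pieces = [int(flag) * label for flag, label in zip(flags, labels)]
    # concatenating the pieces is the "sum" part of the dot product
    return "".join(pieces).rstrip("|")

explicit = [row_to_genres(row) for row in df[genre_cols].to_numpy()]
print(explicit)
# ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```

The dot call does the same multiply-and-concatenate for all rows at once, which is why it avoids a Python-level function call per row.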

Or use a list comprehension to join the non-empty strings in each row:

arr = df.iloc[:, 1:].values * df.columns[1:].values
df['new'] = ['|'.join(y for y in x if y) for x in arr]
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
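
The intermediate array in the list comprehension approach can be inspected directly. The following is a sketch under the assumption that the flags are 0/1 integers; the names flags and names are illustrative, and the column names are converted to an object array explicitly so the multiplication does not depend on how a given pandas version stores the column index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Movie": ['One', 'Two', 'Three'],
                   "Action": [1, 0, 1],
                   "Fantasy": [0, 0, 1],
                   "Vestern": [1, 1, 0]})

flags = df.iloc[:, 1:].to_numpy()                         # shape (3, 3), integers
names = np.array(df.columns[1:].tolist(), dtype=object)   # shape (3,), Python strings

# Broadcasting multiplies every row of flags element-wise by names:
# 1 * 'Action' -> 'Action', 0 * 'Fantasy' -> ''
arr = flags * names
print(arr.tolist())
# [['Action', '', 'Vestern'], ['', '', 'Vestern'], ['Action', 'Fantasy', '']]

# Empty strings are falsy, so "if name" drops the genres that are not set
genres = ['|'.join(name for name in row if name) for row in arr]
print(genres)
# ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```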

Performance:

In [54]: %timeit (jez1(df.copy()))
25.2 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [55]: %timeit (jez2(df.copy()))
61.4 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [56]: %timeit (csm(df.copy()))
1.46 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Setup used for the timings above:



df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
#print(df)

#30k rows
df = pd.concat([df] * 10000, ignore_index=True)

def csm(df):
    cols = df.columns.tolist()[1:]
    df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] !=0]) ,axis=1)
    return df

def jez1(df):
    df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
    return df

def jez2(df):
    arr = df.iloc[:, 1:].values * df.columns[1:].values
    df['new'] = ['|'.join(y for y in x if y) for x in arr]
    return df

@jezrael: haha, you always beat me in both time and quality of solution :)

Chandu · Accepted Answer · 2019-01-08 13:12:09Z

import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})

cols = df.columns.tolist()[1:]

df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] !=0]) ,axis=1)
print(df)

Output:

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
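
The axis=1 argument is the important part. As a small sketch (per_column and per_row are illustrative names), apply with the default axis=0 calls the function once per column and returns a result indexed by column labels, while axis=1 calls it once per row and returns a result indexed by row labels. A result indexed by column labels cannot line up with the rows 0, 1, 2 when assigned to a new column, which is a likely reason for the NaN values in the question.

```python
import pandas as pd

df = pd.DataFrame({"Movie": ['One', 'Two', 'Three'],
                   "Action": [1, 0, 1],
                   "Fantasy": [0, 0, 1],
                   "Vestern": [1, 1, 0]})

# Default axis=0: the lambda receives one column (a Series) at a time
per_column = df.apply(lambda col: col.name)
print(per_column.index.tolist())
# ['Movie', 'Action', 'Fantasy', 'Vestern']

# axis=1: the lambda receives one row (a Series) at a time
per_row = df.apply(lambda row: row.name, axis=1)
print(per_row.index.tolist())
# [0, 1, 2]
```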

