I have a dataframe with movie titles and one column per genre. For example, the movie titled 'One' is both 'Action' and 'Vestern', because it has a 1 in the corresponding columns.

   Movie  Action  Fantasy  Vestern
0    One       1        0        1
1    Two       0        0        1
2  Three       1        1        0

My goal is to create a column genres that contains the names of all genres each movie belongs to. I tried a lambda with a list comprehension, thinking that would help. But after running this line of code:

df['genres'] = df.apply(lambda x: [x+"|"+x for x in df.columns if x!=0])

I got only NaN in every row:

   Movie  Action  Fantasy  Vestern genres
0    One       1        0        1    NaN
1    Two       0        0        1    NaN
2  Three       1        1        0    NaN

I also tried groupby, without success.

Expected output is:

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy

Code to reproduce:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
print(df)

Thanks for your help

2 Answers

For better performance, you can use dot: take the dot product of all columns except the first with the matching column names (each with the separator appended), then remove the trailing | with rstrip:

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
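To see why the dot product yields joined names, it can help to redo the arithmetic for a single row in plain Python. This is only an illustration of the idea, not how pandas implements dot; the names `labels` and `flags` are made up for this sketch:

```python
# Hypothetical single-row illustration of the dot trick (not pandas internals).
labels = ["Action|", "Fantasy|", "Vestern|"]  # column names with the separator appended
flags = [1, 0, 1]                             # the row for movie 'One'

# A string times 1 is the string itself, a string times 0 is '',
# and "summing" the products concatenates them.
terms = [flag * label for flag, label in zip(flags, labels)]
joined = "".join(terms)

print(terms)               # ['Action|', '', 'Vestern|']
print(joined.rstrip("|"))  # Action|Vestern
```

Note that this only works when the genre columns contain exactly 0 or 1; a value of 2 would repeat the name twice.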

Or use a list comprehension to join the non-empty values:

arr = df.iloc[:, 1:].values * df.columns[1:].values
df['new'] = ['|'.join(y for y in x if y) for x in arr]
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy

Performance:

In [54]: %timeit (jez1(df.copy()))
25.2 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [55]: %timeit (jez2(df.copy()))
61.4 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [56]: %timeit (csm(df.copy()))
1.46 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Setup:

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
#print(df)

#30k rows
df = pd.concat([df] * 10000, ignore_index=True)

def csm(df):
    cols = df.columns.tolist()[1:]
    df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] !=0]) ,axis=1)
    return df

def jez1(df):
    df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
    return df

def jez2(df):
    arr = df.iloc[:, 1:].values * df.columns[1:].values
    df['new'] = ['|'.join(y for y in x if y) for x in arr]
    return df
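If you are not in IPython, a rough equivalent of the comparison can be run as a plain script. The sketch below is an assumption-laden rewrite, not the exact benchmark above: the function names `via_dot`, `via_join` and `via_apply` are made up, the functions return a Series instead of modifying the frame, the column names are converted to an object array explicitly, and the frame is smaller (3k rows) to keep it quick. It also checks that all three approaches agree before timing them:

```python
import timeit

import pandas as pd

base = pd.DataFrame({"Movie": ["One", "Two", "Three"],
                     "Action": [1, 0, 1],
                     "Fantasy": [0, 0, 1],
                     "Vestern": [1, 1, 0]})

# 3k rows here; the timings above used 10000 copies (30k rows).
big = pd.concat([base] * 1000, ignore_index=True)

def via_dot(frame):
    # Dot product of the 0/1 columns with "name|" strings, then drop the trailing '|'.
    return frame.iloc[:, 1:].dot(frame.columns[1:] + "|").str.rstrip("|")

def via_join(frame):
    # Elementwise int * str gives the name or '', then join the non-empty ones.
    names = frame.columns[1:].to_numpy(dtype=object)
    arr = frame.iloc[:, 1:].to_numpy() * names
    return pd.Series(["|".join(y for y in x if y) for x in arr], index=frame.index)

def via_apply(frame):
    # Row-wise apply, joining the names of the non-zero columns.
    cols = frame.columns[1:]
    return frame.apply(lambda row: "|".join(c for c in cols if row[c] != 0), axis=1)

# All three should produce the same strings on the small frame.
expected = via_apply(base).tolist()
assert via_dot(base).tolist() == expected
assert via_join(base).tolist() == expected

for fn in (via_dot, via_join, via_apply):
    seconds = timeit.timeit(lambda: fn(big), number=1)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Absolute numbers will differ by machine and pandas version, so treat them only as a relative comparison.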
1 Comment

@jezreal: haha, you always beat me in both time and quality of solution :)
import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})

cols = df.columns.tolist()[1:]

df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] !=0]) ,axis=1)
print(df)

Output:

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
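As for why the attempt in the question did not work, two things seem to be going on: apply without axis=1 calls the lambda once per column rather than once per row, and the comprehension reuses the name x, so it loops over the column names and never looks at the data. The sketch below simulates one such call and then shows a row-wise variant; the names `attempt`, `per_column` and `genre_cols` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Movie": ["One", "Two", "Three"],
                   "Action": [1, 0, 1],
                   "Fantasy": [0, 0, 1],
                   "Vestern": [1, 1, 0]})

# The lambda from the question. Inside the comprehension, x is rebound to each
# column name, so the Series that apply passes in is never used.
attempt = lambda x: [x + "|" + x for x in df.columns if x != 0]

# Without axis=1, apply passes one column at a time; simulate one such call.
per_column = attempt(df["Action"])
print(per_column)
# ['Movie|Movie', 'Action|Action', 'Fantasy|Fantasy', 'Vestern|Vestern']

# Row-wise variant: separate names for the row and the column label, and axis=1.
genre_cols = df.columns[1:]
genres = df.apply(lambda row: "|".join(c for c in genre_cols if row[c] != 0), axis=1)
print(genres.tolist())
# ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```

Since the original lambda returns the same list of doubled column names for every column, nothing in its result corresponds to the rows 0, 1, 2 of the dataframe.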
