
I have a data frame that looks like this:

       reviewerID        asin    reviewerName helpful  unixReviewTime  \
0  A1N4O8VOJZTDVB  B004A9SDD8  Annette Yancey  [1, 1]      1383350400   

I'd like to split the 'helpful' column into two columns named 'helpful_numerator' and 'helpful_denominator', and I can't figure out how.

Any help would be much appreciated!

4 Answers


You can use zip to unzip helpful into separate columns:

df['helpful_numerator'], df['helpful_denominator'] = zip(*df['helpful'])
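The * in zip(*df['helpful']) unpacks the column, so each row's list is passed to zip as a separate argument; zip then groups all first elements together and all second elements together. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical sample frame; only the 'helpful' column matters here
df = pd.DataFrame({'reviewerID': ['A1', 'A2', 'A3'],
                   'helpful': [[1, 1], [0, 2], [3, 5]]})

# zip(*df['helpful']) is the same call as zip([1, 1], [0, 2], [3, 5]),
# which yields (1, 0, 3) and then (1, 2, 5)
df['helpful_numerator'], df['helpful_denominator'] = zip(*df['helpful'])
print(df)
```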

Edit

As mentioned by @MaxU in the comments, if you want to drop the helpful column from your DataFrame, use pop when selecting the column in zip:

df['helpful_numerator'], df['helpful_denominator'] = zip(*df.pop('helpful'))
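A quick sketch of the pop variant on a hypothetical two-row frame. The right-hand side runs first, so pop hands the column's values to zip and deletes the column from df before the two new columns are assigned:

```python
import pandas as pd

# Hypothetical sample frame
df = pd.DataFrame({'asin': ['x1', 'x2'], 'helpful': [[1, 1], [2, 4]]})

# df.pop('helpful') returns the column as a Series and removes it from df in place
df['helpful_numerator'], df['helpful_denominator'] = zip(*df.pop('helpful'))

print(df.columns.tolist())
# ['asin', 'helpful_numerator', 'helpful_denominator']
```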

Timings

I used the following setup to build a larger sample DataFrame, along with the functions to time:

import pandas as pd

df = pd.DataFrame({'A': list('abc'), 'B': [[0,1],[2,3],[4,5]]})
df = pd.concat([df]*10**5, ignore_index=True)

def root(df):
    df['C'], df['D'] = zip(*df['B'])
    return df

def maxu(df):
    return df.join(pd.DataFrame(df.pop('B').tolist(), columns=['C', 'D']))

def flyingmeatball(df):
    df['C'] = df['B'].apply(lambda x: x[0])
    df['D'] = df['B'].apply(lambda x: x[1])
    return df

def psidom(df):
    df['C'] = df.B.str[0]
    df['D'] = df.B.str[1]
    return df
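Before comparing speed, it can be worth confirming that the four functions produce the same C and D columns. A small sketch on the unrepeated three-row frame; the only structural difference is that maxu drops B while the others keep it, so only C and D are compared:

```python
import pandas as pd

def root(df):
    df['C'], df['D'] = zip(*df['B'])
    return df

def maxu(df):
    return df.join(pd.DataFrame(df.pop('B').tolist(), columns=['C', 'D']))

def flyingmeatball(df):
    df['C'] = df['B'].apply(lambda x: x[0])
    df['D'] = df['B'].apply(lambda x: x[1])
    return df

def psidom(df):
    df['C'] = df.B.str[0]
    df['D'] = df.B.str[1]
    return df

df = pd.DataFrame({'A': list('abc'), 'B': [[0, 1], [2, 3], [4, 5]]})

# Each function gets its own copy, as in the timing runs
results = {f.__name__: f(df.copy()) for f in (root, maxu, flyingmeatball, psidom)}
for name, out in results.items():
    same = out['C'].tolist() == [0, 2, 4] and out['D'].tolist() == [1, 3, 5]
    print(name, 'OK' if same else 'MISMATCH')
```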

I get the following timings:

%timeit root(df.copy())
10 loops, best of 3: 70.6 ms per loop

%timeit maxu(df.copy())
10 loops, best of 3: 151 ms per loop

%timeit flyingmeatball(df.copy())
1 loop, best of 3: 223 ms per loop

%timeit psidom(df.copy())
1 loop, best of 3: 283 ms per loop
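%timeit is an IPython magic, so the numbers above come from an interactive session. In a plain Python script, the standard timeit module can produce a comparable "best of 3" figure. A sketch for two of the four functions, assuming a smaller 10**4-fold copy of the sample frame so it finishes quickly; best_ms_per_loop is a hypothetical helper, and absolute numbers will vary by machine and pandas version:

```python
import timeit
import pandas as pd

def root(df):
    df['C'], df['D'] = zip(*df['B'])
    return df

def psidom(df):
    df['C'] = df.B.str[0]
    df['D'] = df.B.str[1]
    return df

def best_ms_per_loop(func, df, number=10, repeat=3):
    # Hypothetical helper: best of `repeat` runs of `number` calls each,
    # timing func(df.copy()) just like the %timeit lines above (copy included)
    times = timeit.repeat(lambda: func(df.copy()), number=number, repeat=repeat)
    return min(times) / number * 1000

df = pd.DataFrame({'A': list('abc'), 'B': [[0, 1], [2, 3], [4, 5]]})
df = pd.concat([df] * 10**4, ignore_index=True)  # smaller than the 10**5 used above

timings = {f.__name__: best_ms_per_loop(f, df) for f in (root, psidom)}
for name, ms in timings.items():
    print(f"{name}: {ms:.1f} ms per loop")
```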

Comments

Very nice solution!
Nice I like this!
What does the '*' do in this solution?
I'd change zip(*df['helpful']) to zip(*df.pop('helpful')) if the OP doesn't need the original helpful column after splitting.
@Boud: I just did some quick timings, and zip appears to be faster. I'll add the timings shortly.

If helpful is a column of lists, you can use the .str accessor to get at the elements of each list:

df['helpful_numerator'] = df.helpful.str[0]    
df['helpful_denominator'] = df.helpful.str[1]
df
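df.helpful.str[i] is shorthand for df.helpful.str.get(i), which indexes into each list. Unlike plain list indexing, a list that is too short yields NaN instead of raising IndexError. A sketch on a hypothetical frame where the last row has only one element:

```python
import pandas as pd

# Hypothetical sample: the last list is missing its denominator
df = pd.DataFrame({'helpful': [[1, 1], [0, 2], [5]]})

# .str[0] / .str[1] pick the element at that position from each list;
# out-of-range positions become NaN
df['helpful_numerator'] = df.helpful.str[0]
df['helpful_denominator'] = df.helpful.str[1]
print(df)
```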


Comments


Yet another solution:

In [74]: df
Out[74]:
       reviewerID        asin    reviewerName  unixReviewTime helpful
0  A1N4O8VOJZTDVB  B004A9SDD8  Annette Yancey      1383350400  [1, 1]

In [75]: df.join(pd.DataFrame(df.pop('helpful').tolist(),
                              columns=['helpful_numerator','helpful_denominator']))
Out[75]:
       reviewerID        asin    reviewerName  unixReviewTime  helpful_numerator  helpful_denominator
0  A1N4O8VOJZTDVB  B004A9SDD8  Annette Yancey      1383350400                  1                    1
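The join above aligns the new columns by index. That works here because pd.DataFrame(...tolist()) gets a default 0..n-1 index, which matches the question's frame. If df has some other index (for example after filtering rows), passing index=df.index keeps the rows aligned. A sketch on a hypothetical frame with a non-default index:

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({'asin': ['x1', 'x2', 'x3'],
                   'helpful': [[1, 1], [0, 2], [3, 5]]},
                  index=[101, 205, 317])

# index=df.index gives the split frame the same row labels as df. Without it
# the split frame would get index 0, 1, 2, no labels would match in the join,
# and every new cell would be NaN.
split = pd.DataFrame(df.pop('helpful').tolist(),
                     index=df.index,
                     columns=['helpful_numerator', 'helpful_denominator'])
df = df.join(split)
print(df)
```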

Comments


Assuming each cell in the column contains a list, you can use .apply:

df['helpful_numerator'] = df['helpful'].apply(lambda x: x[0])
df['helpful_denominator'] = df['helpful'].apply(lambda x: x[1])
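The lambdas above assume every cell is a list with at least two elements; a short list raises IndexError and a missing value (None) raises TypeError. One way to relax that assumption is a small guard function. In this sketch nth_or_nan is a hypothetical helper, not part of pandas:

```python
import math
import pandas as pd

def nth_or_nan(x, i):
    """Hypothetical helper: x[i] if x is a list/tuple long enough, else NaN."""
    if isinstance(x, (list, tuple)) and -len(x) <= i < len(x):
        return x[i]
    return math.nan

# Hypothetical sample with a missing value in the last row
df = pd.DataFrame({'helpful': [[1, 1], [0, 2], None]})

# Series.apply calls the lambda once per cell, including the None
df['helpful_numerator'] = df['helpful'].apply(lambda x: nth_or_nan(x, 0))
df['helpful_denominator'] = df['helpful'].apply(lambda x: nth_or_nan(x, 1))
print(df)
```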

Comments
