Create a column based on another dataframe values

Question

import pandas as pd
import io
import numpy as np
import datetime

data = """
    date          id
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data), delimiter='\s+', index_col=False, parse_dates = ['date'])

df2 = pd.DataFrame(index=df.index)

df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)

I have this example I'm working with. df2['Check'] should be 0 where the month of df['date'] is October, and 1 otherwise.

np.where works fine with other conditions, but strftime doesn't accept a Series, which causes this error:

Traceback (most recent call last):
  File "C:/Users/Leb/Desktop/Python/test2.py", line 22, in <module>
    df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'Series'

Looping takes a long time with my actual data, which is about 1M rows. How can I do this efficiently?

df2['Check'] should be 0 for the four October rows and 1 for the four November rows.

Use the .dt accessor. Use pandas 0.17. See the docs. You are getting the error because datetime works with a single argument, not arrays.
– Kartik, Commented Nov 2, 2015 at 4:56
Very useful, I'll keep that in mind. I have 0.16 for now, as part of Anaconda.
– Leb, Commented Nov 2, 2015 at 5:03
Shouldn't df['date'].dt.month==10 just work even in 0.16.0?
– EdChum, Commented Nov 2, 2015 at 8:59
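
To make the point from the comments concrete, here is a minimal sketch of why the original call fails and what the .dt accessor does instead. The two-row Series is made up for illustration and is not the asker's data; it assumes a reasonably recent pandas.

```python
import datetime

import pandas as pd

# Small illustrative Series (an assumption, not the asker's real data).
dates = pd.Series(pd.to_datetime(["2015-10-31", "2015-11-01"]))

# strftime on a single datetime object works as expected.
single = datetime.datetime(2015, 10, 31)
single_month_name = single.strftime("%B")

# Passing a whole Series to the unbound method raises TypeError,
# which is the error shown in the question.
try:
    datetime.datetime.strftime(dates, "%B")
    raised = False
except TypeError:
    raised = True

# The .dt accessor exposes datetime properties element-wise,
# so the comparison runs over the whole column at once.
months = dates.dt.month
is_october = months == 10
print(months.tolist(), is_october.tolist(), raised)
```

Comparing the numeric month also avoids formatting every value to a month name just to test it.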

ako · Accepted Answer · 2015-11-02 04:55:52Z

3

This is a slightly simpler version that uses the month attribute of each datetime value. Test whether it equals 10, then map the resulting True / False values to your desired 0 / 1:

df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})
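
As a quick check of the one-liner above, here is a small self-contained sketch that rebuilds the question's eight rows directly (instead of parsing the text block, which is my simplification) and applies the same apply/map chain.

```python
import pandas as pd

# The eight sample rows from the question, built directly as a DataFrame.
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-10-31"] * 4 + ["2015-11-01"] * 4),
    "id": [50230, 48646, 48748, 46992, 46491, 45347, 45681, 46430],
})

df2 = pd.DataFrame(index=df.index)

# Step 1: apply produces a boolean Series (True where the month is October).
# Step 2: map translates True -> 0 and False -> 1.
df2["Check"] = df["date"].apply(lambda x: x.month == 10).map({True: 0, False: 1})

check_values = df2["Check"].tolist()
print(check_values)
```

The lambda runs once per row in Python, so this is easy to read but not the fastest option on large frames.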


Leb · Accepted Answer · 2015-11-02 22:05:11Z

@ako's answer is on the money, but based on @Kartik's and @EdChum's comments here's what I came up with:

import pandas as pd
import io
import numpy as np

data = """
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data*125000), delimiter='\s+', index_col=False, names=['date','id'], parse_dates = ['date'])

df2 = pd.DataFrame(index=df.index)

df.shape
(1125000, 2)

%timeit df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})
1 loops, best of 3: 2.56 s per loop

%timeit df2['Check'] = np.where(df['date'].dt.month==10,0,1)
10 loops, best of 3: 80.5 ms per loop
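
%timeit is an IPython magic, so the comparison above will not run in a plain script. Here is a rough sketch of the same comparison using the standard timeit module. The synthetic 100,000-row frame and the helper names with_apply and with_where are assumptions for illustration, and the timings will differ from the numbers above depending on machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): alternating October and November dates.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2015-10-31", "2015-11-01"] * 50_000)}
)


def with_apply():
    # Row-by-row Python lambda, then a dictionary lookup per value.
    return df["date"].apply(lambda x: x.month == 10).map({True: 0, False: 1})


def with_where():
    # Vectorized: one comparison over the whole column, then np.where.
    return np.where(df["date"].dt.month == 10, 0, 1)


# Both approaches should produce the same 0/1 values.
same = bool((with_apply().to_numpy() == with_where()).all())

apply_seconds = timeit.timeit(with_apply, number=3)
where_seconds = timeit.timeit(with_where, number=3)
print(f"same result: {same}")
print(f"apply/map: {apply_seconds:.3f} s, np.where: {where_seconds:.3f} s (3 runs each)")
```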
