import pandas as pd
import io
import numpy as np
import datetime
data = """
date id
2015-10-31 50230
2015-10-31 48646
2015-10-31 48748
2015-10-31 46992
2015-11-01 46491
2015-11-01 45347
2015-11-01 45681
2015-11-01 46430
"""
df = pd.read_csv(io.StringIO(data), delimiter=r'\s+', index_col=False, parse_dates=['date'])
df2 = pd.DataFrame(index=df.index)
df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)
I have this example I'm working with. df2['Check'] should be 0 where the month of df['date'] is October, and 1 otherwise.
np.where works fine with other conditions, but strftime does not accept a Series, which causes this error:
Traceback (most recent call last):
File "C:/Users/Leb/Desktop/Python/test2.py", line 22, in <module>
df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'Series'
Looping takes a long time with my actual data, which is about 1M rows. How can I do this efficiently?
df2['Check'] should look like this:
Check
0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
Use the .dt accessor. Use Pandas 0.17. See the docs. You are getting the error because datetime.datetime.strftime works on a single datetime object, not on arrays.

Wouldn't df['date'].dt.month == 10 just work, even in 0.16.0?
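A minimal sketch of the vectorized .dt approach mentioned above, using a shortened copy of the sample data from the question. It assumes a pandas version where Series.dt.strftime is available (0.17 or later); the variable name check_by_name is only for illustration. Comparing the month number avoids string formatting altogether, and '%B' depends on the active locale, so the numeric comparison is probably the safer choice.

```python
import io

import numpy as np
import pandas as pd

# Shortened copy of the sample data from the question.
data = """date id
2015-10-31 50230
2015-10-31 48646
2015-11-01 46491
2015-11-01 45347
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])
df2 = pd.DataFrame(index=df.index)

# Vectorized: compare the month number of the whole column at once.
# October is month 10.
df2["Check"] = np.where(df["date"].dt.month == 10, 0, 1)

# Closest equivalent of the original attempt: format the whole Series
# with .dt.strftime. Note that '%B' yields locale-dependent month names.
check_by_name = np.where(df["date"].dt.strftime("%B") == "October", 0, 1)

print(df2)
```

Both expressions operate on the whole column without a Python-level loop, so they should scale to around 1M rows far better than iterating row by row.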