import pandas as pd
import io
import numpy as np
import datetime

data = """
    date          id
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data), delimiter=r'\s+', index_col=False, parse_dates=['date'])

df2 = pd.DataFrame(index=df.index)

df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)

I have this example I'm working with. df2['Check'] should be 0 when the month of df['date'] is October, and 1 otherwise.

np.where works fine with other conditions, but strftime doesn't accept a Series and raises this error:

Traceback (most recent call last):
  File "C:/Users/Leb/Desktop/Python/test2.py", line 22, in <module>
    df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'Series'

If I loop over the rows, it takes a long time with my actual data, which is about 1M rows. How can I do this efficiently?

df2['Check'] should look like this:

  Check
0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
  • Use the .dt accessor. Use Pandas 0.17. See the docs. You are getting the error because datetime.datetime.strftime works on a single datetime object, not on arrays. Commented Nov 2, 2015 at 4:56
  • Very useful, I'll keep that in mind. My Anaconda install ships with 0.16 for now. Commented Nov 2, 2015 at 5:03
  • Shouldn't df['date'].dt.month==10 just work even in 0.16.0? Commented Nov 2, 2015 at 8:59
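
To illustrate the .dt accessor mentioned in the comments, here is a minimal sketch on a small, made-up Series (the name `dates` and its three values are assumptions for illustration, not the asker's data). It assumes a pandas version where `Series.dt.strftime` is available (0.17 or later according to the comments above); `.dt.month` is the locale-independent way to test for October.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row Series, used only for illustration
dates = pd.Series(pd.to_datetime(["2015-10-31", "2015-10-31", "2015-11-01"]))

# .dt exposes datetime attributes element-wise for the whole Series
is_october = dates.dt.month == 10

# Vectorized counterpart of datetime.strftime; the month names it
# returns depend on the active locale, so comparing against the
# string 'October' is less robust than comparing the month number
month_names = dates.dt.strftime("%B")

# 0 for October rows, 1 for everything else
check = np.where(is_october, 0, 1)
print(check)
```

Comparing `dates.dt.month` to 10 avoids building a string for every row, which is likely part of why it is quick on large frames.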

2 Answers

This is a slightly simpler version that uses the month attribute of each datetime object: compare it to 10, then map the resulting True / False values to your desired 0 / 1:

df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})

@ako's answer is on the money, but based on @Kartik's and @EdChum's comments here's what I came up with:

import pandas as pd
import io
import numpy as np

data = """
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data*125000), delimiter=r'\s+', index_col=False, names=['date','id'], parse_dates=['date'])

df2 = pd.DataFrame(index=df.index)

df.shape
(1125000, 2)

%timeit df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})
1 loops, best of 3: 2.56 s per loop

%timeit df2['Check'] = np.where(df['date'].dt.month==10,0,1)
10 loops, best of 3: 80.5 ms per loop
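
For reference, here is a self-contained sketch of the vectorized version applied to the eight sample rows from the question, so the result can be compared with the expected Check column. The variable `check_alt` is a hypothetical name for an alternative that skips np.where; it is an addition for illustration, not part of either answer.

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question, with a header line
data = """date id
2015-10-31 50230
2015-10-31 48646
2015-10-31 48748
2015-10-31 46992
2015-11-01 46491
2015-11-01 45347
2015-11-01 45681
2015-11-01 46430
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])
df2 = pd.DataFrame(index=df.index)

# Vectorized: 0 where the month is October, 1 otherwise
df2["Check"] = np.where(df["date"].dt.month == 10, 0, 1)

# Alternative without np.where: cast the boolean mask to int
check_alt = (df["date"].dt.month != 10).astype(int)

print(df2)
```

Both forms operate on the whole column at once rather than calling a Python function per row, which is the difference the timings above reflect.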
