I have this csv file "rfm_data.csv":

CustomerID PurchaseDate  TransactionAmount ProductInformation
8814       11-04-23             943.31          Product C
2188       11-04-23             463.70          Product A
4608       11-04-23              80.28          Product A
2559       11-04-23             221.29          Product A

I read and transform data with this code:

    data = pd.read_csv("rfm_data.csv")
    data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'],  format='%d-%m-%y')
    data['Recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days

When I run this and call print(data), I get this error message:

AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?

If I remove .dt.days from the last line of code, I get this result:

CustomerID PurchaseDate  TransactionAmount ProductInformation Recency
8814       2023-04-11             943.31          Product C   140 days, 0:00:00
2188       2023-04-11             463.70          Product A   140 days, 0:00:00
4608       2023-04-11              80.28          Product A   140 days, 0:00:00
2559       2023-04-11             221.29          Product A   140 days, 0:00:00

But all I want in the Recency column is the number of days, so that I can do further calculations.

  • Shouldn't it be format='%y-%m-%d'? Commented Aug 29, 2023 at 16:17
  • Yes, you are right; the CSV data in the post was incorrect. It is corrected now. Commented Aug 29, 2023 at 16:29
  • Thank you very much! It is working now. I was going crazy; even Bard and ChatGPT generated the same code. Commented Aug 29, 2023 at 16:42

1 Answer

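Your problem lies in calling .dt.date, which converts the column to plain Python date objects (object dtype). Subtracting those gives a column of Python timedelta objects, and an object column has no .dt accessor. Since your input contains only dates, normalizing to the date is not needed. If you do need it (for example, when the column also carries times), use .dt.floor("d"), which keeps the datetime dtype.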

Example:

from io import StringIO
import pandas as pd

s = """CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product-C
2188 11-04-23 463.70 Product-A
4608 11-04-23 80.28 Product-A
2559 11-04-23 221.29 Product-A"""

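# the sample is space-separated, hence sep=" "
data = pd.read_csv(StringIO(s), sep=" ")
# parse day-month-year strings into datetime64 values
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'], format='%d-%m-%y')
# Timestamp minus datetime column gives a timedelta column, so .dt.days works
data['Recency'] = (pd.Timestamp("now").floor("d") - data['PurchaseDate']).dt.days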

print(data)
   CustomerID PurchaseDate  TransactionAmount ProductInformation  Recency
0        8814   2023-04-11             943.31          Product-C      140
1        2188   2023-04-11             463.70          Product-A      140
2        4608   2023-04-11              80.28          Product-A      140
3        2559   2023-04-11             221.29          Product-A      140

Note that pd.Timestamp("now").floor("d") gives you today's date as a pandas Timestamp, which makes the code a bit cleaner since you use pandas exclusively.
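To make the dtype difference described above visible, here is a minimal sketch. It assumes a small hand-built DataFrame in place of the CSV and a fixed reference date (2023-08-29, chosen only so the result is reproducible) instead of "now"; neither is part of the original code.

```python
import pandas as pd

# Hypothetical sample mirroring the question's PurchaseDate column.
data = pd.DataFrame({"PurchaseDate": ["11-04-23", "11-04-23"]})
data["PurchaseDate"] = pd.to_datetime(data["PurchaseDate"], format="%d-%m-%y")

# Assumption: a fixed reference date instead of pd.Timestamp("now").
today = pd.Timestamp("2023-08-29")

# .dt.date returns plain Python date objects, so the column is object dtype
# and any arithmetic on it no longer exposes the .dt accessor.
as_dates = data["PurchaseDate"].dt.date
print(as_dates.dtype)

# Staying with pandas datetimes keeps the difference as a timedelta column,
# so .dt.days is available and returns integers.
delta = today - data["PurchaseDate"].dt.floor("d")
print(delta.dtype)

recency = delta.dt.days
print(recency.tolist())
```

The first print shows object, while the second shows a timedelta64 dtype, which is why only the second form supports .dt.days.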
