|   | submit_date | approved_date |
|---|-------------|---------------|
| 0 | 1/6/2021    | 1/19/2021     |
| 1 | 1/5/2021    | 1/5/2021      |
| 2 | 1/5/2021    | 1/5/2021      |
| 3 | 1/6/2021    | 1/7/2021      |
| 4 | 1/7/2021    | 1/7/2021      |

I loaded a CSV file with over 200,000 records using df = pd.read_csv(). Some of the columns are empty; is it okay to fill them with zero? Could that be why I am getting the errors below?

date1=pd.Series(df[" Create Date"])
date2=pd.Series(df[" Issue Date"])

date_df = pd.DataFrame(dict(submit_date = date1, approved_date = date2))
date_df

I can see the table above with this code. But when I try to calculate the number of days between the two dates, I get "could not convert string to float: '1/6/2021'" when using

(df['Create Date']).apply(lambda x: float(x))

and "cannot convert the series to <class 'float'>" when I try the calculation below:

diff = (float(date1) - float(date2))
diff

Can someone please help me put the code together? Thanks.

  • Convert the columns to datetime objects, then you can do simple math on the series: pd.to_datetime(...), then df['date1'] - df['date2']. Commented Jul 8, 2021 at 15:41

2 Answers


If you want to get the number of days between 2 columns of dates, you can do it this way:

  1. Convert the date columns from strings to datetime format first:
df['submit_date'] = pd.to_datetime(df['submit_date'], format='%m/%d/%Y')
df['approved_date'] = pd.to_datetime(df['approved_date'], format='%m/%d/%Y')

(added format string as suggested by @SMeznaric for faster conversion)

  2. Then, create a column diff with the difference in days by subtracting one date from the other and taking the number of days with dt.days, as follows:
df['diff'] = (df['approved_date'] - df['submit_date']).dt.days

Result:

print(df)

  submit_date approved_date  diff
0  2021-01-06    2021-01-19    13
1  2021-01-05    2021-01-05     0
2  2021-01-05    2021-01-05     0
3  2021-01-06    2021-01-07     1
4  2021-01-07    2021-01-07     0
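
For anyone who wants to run this end to end, here is a minimal sketch of the two steps on the five sample rows from the question. The inline DataFrame stands in for the real CSV, and errors="coerce" is an assumption added here (not part of the steps above) so that unparseable cells become NaT instead of raising.

```python
import pandas as pd

# Sample rows copied from the question; a stand-in for the real CSV.
df = pd.DataFrame({
    "submit_date": ["1/6/2021", "1/5/2021", "1/5/2021", "1/6/2021", "1/7/2021"],
    "approved_date": ["1/19/2021", "1/5/2021", "1/5/2021", "1/7/2021", "1/7/2021"],
})

# Step 1: parse the strings as month/day/4-digit-year dates.
# errors="coerce" is an assumption: values that do not match become NaT.
for col in ["submit_date", "approved_date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y", errors="coerce")

# Step 2: subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
df["diff"] = (df["approved_date"] - df["submit_date"]).dt.days

print(df)
```

Empty cells read by read_csv arrive as NaN and are parsed to NaT, so the difference for those rows should come out as NaN rather than an error; filling them with zero beforehand is probably not what you want for date columns.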

5 Comments

This is the right answer. It's good practice to pass the format argument to pd.to_datetime. In addition to avoiding errors in case of ambiguity, it is also much faster.
Thanks @SMeznaric. I had also considered adding the format argument, but saw that the date format (month first) was already consistent with the default setting, so in the end I left it out. If it were in day-first format, I would certainly add the format argument. Thanks for the reminder of the good practice anyway. :-)
There's no default actually. If you don't pass it, pandas will try to infer the format, which will be slow and can also lead to ambiguous behaviour.
@SMeznaric I concur that adding the format string can potentially speed up the conversion. I have added the format string to my solution above. To clarify, the default setting I mentioned is about month-first versus day-first or year-first. As you can see from the official doc of pd.to_datetime, the defaults are dayfirst=False, yearfirst=False. This implies that the default is actually month-first. That's why I didn't specify the format string, as the sample dates look month-first.
Your answer helps. There was actually an issue with the way I imported the files: it was reading my headers (row 1), which is why I was getting so many errors. Thanks so much @SeaBean and @SMeznaric

You cannot parse dates with the float() function; it expects a number (or a numeric string), not a date.

What you need to use is the strptime function from the datetime module:

For example:

import datetime
dt = datetime.datetime.strptime('1/6/2021','%m/%d/%Y')

In your case, change (df['Create Date']).apply(lambda x: float(x)) to (df['Create Date']).apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y')), and do the same for the other column.

Alternatively, you can use pandas to_datetime:

date1 = pd.to_datetime(date1, format='%m/%d/%Y')  # %Y = 4-digit year, as in '1/6/2021'
date2 = pd.to_datetime(date2, format='%m/%d/%Y')

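Then the difference between the two dates can be obtained as a float with the total_seconds() method; on a pandas Series it is reached through the .dt accessor, e.g. (date2 - date1).dt.total_seconds().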
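
As a rough sketch of that last step, assuming two short Series built from the sample dates in the question (the inline values and the variable names delta, seconds and days are only for illustration):

```python
import pandas as pd

# Two sample values per column, taken from the question's table.
date1 = pd.to_datetime(pd.Series(["1/6/2021", "1/5/2021"]), format="%m/%d/%Y")
date2 = pd.to_datetime(pd.Series(["1/19/2021", "1/5/2021"]), format="%m/%d/%Y")

# Subtracting datetime Series yields a timedelta Series.
delta = date2 - date1

# .dt.total_seconds() returns the difference as floats (seconds).
seconds = delta.dt.total_seconds()

# Divide by the number of seconds in a day to get fractional days.
days = seconds / 86400

print(days)
```

If you only need whole days, delta.dt.days gives integers directly and skips the division.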
