I want to increase the date in one dataframe column by an integer value in another.

I receive TypeError: unsupported type for timedelta days component: numpy.int64

My dataframes look like this:

import pandas as pd
import numpy as np
import datetime as dt

dfa = pd.DataFrame([
    ['5/15/17',1],
    ['5/15/17',1]],
    columns = ['Start','Days'])

dfb = pd.DataFrame([
    ['5/15/17',1],
    ['5/15/17',1]],
    columns = ['Start','Days'])

I format the 'Start' column to datetime with this code:

dfa['Start'] = dfa['Start'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%m/%d/%y'))
dfb['Start'] = dfb['Start'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%m/%d/%y'))

I try to change the values in the dfa dataframe. The dfb dataframe reference works for 'Days' but not for 'Start':

for i, row in dfb.iterrows():
    for j, row in dfa.iterrows():
        new = pd.DataFrame({"Start": dfa.loc[j,"Start"] + dt.timedelta(days=dfb.loc[i,"Days"]), "Days": dfa.loc[j,"Days"] - dfb.loc[i,"Days"]}, index = [j+1])
        dfa = pd.concat([dfa.loc[:j], new, dfa.loc[j+1:]]).reset_index(drop=True)

This is the key component that raises the error:

"Start": dfa.loc[j,"Start"] + dt.timedelta(days=dfb.loc[i,"Days"])

It works fine if I use:

"Start": dfa.loc[j,"Start"] + dt.timedelta(days=1)

but I need it to be taking that value from dfb, not a static integer.

1 Answer

If I understand correctly (I changed the input values a bit to clarify what is going on):

import pandas as pd

dfa = pd.DataFrame([
    ['5/15/17',1],
    ['5/16/17',1]],
    columns = ['Start','Days'])

dfb = pd.DataFrame([
    ['5/15/17',3],
    ['5/16/17',4]],
    columns = ['Start','Days'])

dfa['Start'] = pd.to_datetime(dfa['Start'])

dfb['Start'] = pd.to_datetime(dfb['Start'])

dfa['Start'] = dfa['Start'] + dfb['Days'].apply(pd.Timedelta,unit='D')
print(dfa)

Output:

       Start  Days
0 2017-05-18     1
1 2017-05-20     1
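
The same addition can also be written with pd.to_timedelta, which converts the whole 'Days' column in one call instead of calling pd.Timedelta once per row. This is a minimal sketch using the same sample values as above; the name `shifted` is only illustrative, and the explicit format string is an assumption based on the question's '%m/%d/%y' dates.

```python
import pandas as pd

dfa = pd.DataFrame({"Start": ["5/15/17", "5/16/17"], "Days": [1, 1]})
dfb = pd.DataFrame({"Start": ["5/15/17", "5/16/17"], "Days": [3, 4]})

# Parse the strings explicitly as month/day/two-digit year.
dfa["Start"] = pd.to_datetime(dfa["Start"], format="%m/%d/%y")

# Convert the integer column to timedeltas in one vectorized call, then add.
# The two Series are paired by index label (0 with 0, 1 with 1).
shifted = dfa["Start"] + pd.to_timedelta(dfb["Days"], unit="D")
print(shifted)
```

Both forms rely on dfa and dfb sharing the same index labels.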

6 Comments

That seems like it should work and it does with your code, but when I place that in my actual code as dfb.loc[i,'Days'].apply(pd.Timedelta,unit='D'), it raises: AttributeError: 'numpy.int64' object has no attribute 'apply'.
Oh, but if I remove the ".loc[i," it does work. I was assuming I would need to keep that reference to which index I was on, but maybe that isn't the case?
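
On the AttributeError: dfb.loc[i,'Days'] returns a single NumPy integer rather than a Series, so it has no .apply method, whereas dfb['Days'] is a whole column. If the loop is kept, the scalar can be converted directly. The sketch below shows two options under that assumption; the names `via_stdlib` and `via_pandas` are illustrative only.

```python
import datetime as dt
import pandas as pd

dfb = pd.DataFrame({"Days": [3, 4]})
start = pd.Timestamp("2017-05-15")

days = dfb.loc[0, "Days"]  # a NumPy integer scalar, not a Series

# Option 1: cast to a built-in int before building a datetime.timedelta.
via_stdlib = start + dt.timedelta(days=int(days))

# Option 2: build a pandas Timedelta from the scalar.
via_pandas = start + pd.Timedelta(days=int(days))

print(via_stdlib, via_pandas)
```

The int() cast is what sidesteps the "unsupported type for timedelta days component" error from the question.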
One cool thing about pandas is that it intrinsically aligns on the index. If dfa and dfb have matching indexes, the rows line up automatically. You can also use set_index('Date') on both to get the alignment right if dfa and dfb are in different sort orders.
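
To illustrate the alignment point, here is a small sketch in which dfb lists the same dates in the opposite order. It assumes the date column is the 'Start' column from the question (used here in place of 'Date'); `by_position` and `by_date` are illustrative names.

```python
import pandas as pd

dfa = pd.DataFrame({"Start": pd.to_datetime(["2017-05-15", "2017-05-16"]),
                    "Days": [1, 1]})
# dfb lists the same dates in the opposite order.
dfb = pd.DataFrame({"Start": pd.to_datetime(["2017-05-16", "2017-05-15"]),
                    "Days": [4, 3]})

# Default integer index: label 0 pairs with label 0, regardless of the dates.
by_position = dfa["Start"] + pd.to_timedelta(dfb["Days"], unit="D")

# Indexed by date: each date in dfa pairs with the same date in dfb.
a = dfa.set_index("Start", drop=False)
b = dfb.set_index("Start")
by_date = a["Start"] + pd.to_timedelta(b["Days"], unit="D")

print(by_position)
print(by_date)
```

With the default index, 5/15 is shifted by 4 days; once both frames are indexed by date, 5/15 is shifted by its own 3 days.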
I also chose to use pandas objects instead of datetime module objects. There are some minor differences, but for the most part pandas uses the datetime module underneath.
In further testing, this solution does not quite seem to work. I think maybe because of which 'Days' value it is retaining in the loop and because I'm inserting a new row into the dataframe? Not sure how to show you this issue within the simplified code I posted, but for some instances it doesn't use the 'Days' value that I want and in other instances it returns 'NaT' in the Start column...
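
One possible cause of the NaT values, which cannot be confirmed from the simplified code: after rows are inserted and the index is reset, dfa can have index labels that do not exist in dfb, and index-aligned addition returns NaT for any label missing on one side. A minimal sketch of that behavior, with made-up values:

```python
import pandas as pd

# Three dates (index 0, 1, 2) but only two day counts (index 0, 1).
start = pd.Series(pd.to_datetime(["2017-05-15", "2017-05-16", "2017-05-17"]))
days = pd.Series([3, 4])

# Label 2 has no partner in `days`, so its result is NaT.
result = start + pd.to_timedelta(days, unit="D")
print(result)
```

Comparing dfa.index and dfb.index just before the addition would show whether this is what happens in the real data.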