1

I am trying to add missing dates to my dataframe.

I have seen this posts: reindex and reindex2.

When I try to reindex my dataframe:

print(df)
df = df.reindex(dates, fill_value=0)
print(df)

I get the following output:

_updated_at         Name        hour day date       time      data1     data2
06/06/2016 13:27    game_name   13  6    06/06/2016 evening   0         0
07/06/2016 10:33    game_name   10  7    07/06/2016 morning   145.2788  122.7361
18/10/2016 14:34    game_name   14  18   18/10/2016 evening   0         0
19/10/2016 17:12    game_name   17  19   19/10/2016 evening   0         0
24/10/2016 11:05    game_name   11  24   24/10/2016 morning   313.5954  364.4107
24/10/2016 12:02    game_name   12  24   24/10/2016 evening   0         0
25/10/2016 08:50    game_name   8   25   25/10/2016 morning   362.4682  431.5803
25/10/2016 13:00    game_name   13  25   25/10/2016 evening   0         0


_updated_at Name hour day date  time data1  data2
24/10/2016  0    0    0   0     0    0      0
25/10/2016  0    0    0   0     0    0      0
26/10/2016  0    0    0   0     0    0      0
27/10/2016  0    0    0   0     0    0      0
28/10/2016  0    0    0   0     0    0      0
29/10/2016  0    0    0   0     0    0      0
30/10/2016  0    0    0   0     0    0      0

I am expecting to see the rows where a date is missing filled with the new row and 0's in each value, rather than all rows replaced with 0.

EDIT: The overall goal is to be able to calculate the difference between values resulting in a morning and evening diff on a per day basis.

EDIT2: Current output:

print (df.reindex(mux, fill_value=0).groupby(level=0)['data1'].diff(-1).dropna())

dtypes: float64(2)None
2016-06-06  morning       0.00000
2016-06-07  morning     440.99582
2016-06-08  morning       0.00000
2016-06-09  morning       0.00000
2016-06-10  morning       0.00000

print (df.reindex(mux, fill_value=0).groupby(level=0)['data2'].diff(-1).dropna())

Length: 142, dtype: float64
2016-06-06  morning    -220.5481
2016-06-07  morning       0.0000
2016-06-08  morning       0.0000
2016-06-09  morning       0.0000
2016-06-10  morning       0.0000
2016-06-11  morning       0.0000

I was expecting to see evening values

1 Answer 1

1

You can reindex by MultiIndex.from_product from columns dates and time:

df.date = pd.to_datetime(df.date)
dates = pd.date_range(start=df.date.min(), end=df.date.max())
print (dates)
DatetimeIndex(['2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09',
               '2016-06-10', '2016-06-11', '2016-06-12', '2016-06-13',
               '2016-06-14', '2016-06-15',
               ...
               '2016-10-16', '2016-10-17', '2016-10-18', '2016-10-19',
               '2016-10-20', '2016-10-21', '2016-10-22', '2016-10-23',
               '2016-10-24', '2016-10-25'],
              dtype='datetime64[ns]', length=142, freq='D')

mux = pd.MultiIndex.from_product([dates,['morning','evening']])
#print (mux)

df.set_index(['date','time'], inplace=True)

print (df.reindex(mux, fill_value=0))
                         _updated_at       Name  hour  day     data1     data2
2016-06-06 morning                 0          0     0    0    0.0000    0.0000
           evening  06/06/2016 13:27  game_name    13    6    0.0000    0.0000
2016-06-07 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-08 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-09 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-10 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-11 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-12 morning                 0          0     0    0    0.0000    0.0000
           evening                 0          0     0    0    0.0000    0.0000
2016-06-13 morning                 0          0     0    0    0.0000    0.0000
...

Last you can groupby by first level of Multiindex (dates) with DataFrameGroupBy.diff. You get for each dates row with NaN which can be removed by dropna:

print (df.reindex(mux, fill_value=0).groupby(level=0)['data1','data2'].diff(-1).dropna())
                       data1     data2
2016-06-06 morning    0.0000    0.0000
2016-06-07 morning    0.0000    0.0000
2016-06-08 morning    0.0000    0.0000
2016-06-09 morning    0.0000    0.0000
2016-06-10 morning    0.0000    0.0000
2016-06-11 morning    0.0000    0.0000
2016-06-12 morning    0.0000    0.0000
2016-06-13 morning    0.0000    0.0000
2016-06-14 morning    0.0000    0.0000
2016-06-15 morning    0.0000    0.0000
2016-06-16 morning    0.0000    0.0000
2016-06-17 morning    0.0000    0.0000
2016-06-18 morning    0.0000    0.0000
2016-06-19 morning    0.0000    0.0000
2016-06-20 morning    0.0000    0.0000
2016-06-21 morning    0.0000    0.0000
...
...

You can also select by ix and subtract:

print (df.reindex(mux, fill_value=0)
         .groupby(level=0)
         .apply(lambda x: x.ix[0, ['data1','data2']]-x.ix[1, ['data1','data2']]))

               data1     data2
2016-06-06    0.0000    0.0000
2016-06-07    0.0000    0.0000
2016-06-08    0.0000    0.0000
2016-06-09    0.0000    0.0000
2016-06-10    0.0000    0.0000
2016-06-11    0.0000    0.0000
2016-06-12    0.0000    0.0000
2016-06-13    0.0000    0.0000
2016-06-14    0.0000    0.0000
2016-06-15    0.0000    0.0000
2016-06-16    0.0000    0.0000
2016-06-17    0.0000    0.0000
2016-06-18    0.0000    0.0000
2016-06-19    0.0000    0.0000
2016-06-20    0.0000    0.0000
2016-06-21    0.0000    0.0000
2016-06-22    0.0000    0.0000
2016-06-23    0.0000    0.0000
2016-06-24    0.0000    0.0000
2016-06-25    0.0000    0.0000
2016-06-26    0.0000    0.0000
2016-06-27    0.0000    0.0000
2016-06-28    0.0000    0.0000
2016-06-29    0.0000    0.0000
2016-06-30    0.0000    0.0000
2016-07-01    0.0000    0.0000
2016-07-02    0.0000    0.0000
2016-07-03    0.0000    0.0000
2016-07-04    0.0000    0.0000
2016-07-05    0.0000    0.0000
             ...       ...
2016-09-26    0.0000    0.0000
2016-09-27    0.0000    0.0000
2016-09-28    0.0000    0.0000
2016-09-29    0.0000    0.0000
2016-09-30    0.0000    0.0000
2016-10-01    0.0000    0.0000
2016-10-02    0.0000    0.0000
2016-10-03    0.0000    0.0000
2016-10-04    0.0000    0.0000
2016-10-05    0.0000    0.0000
2016-10-06    0.0000    0.0000
2016-10-07    0.0000    0.0000
2016-10-08    0.0000    0.0000
2016-10-09    0.0000    0.0000
2016-10-10    0.0000    0.0000
2016-10-11    0.0000    0.0000
2016-10-12    0.0000    0.0000
2016-10-13    0.0000    0.0000
2016-10-14    0.0000    0.0000
2016-10-15    0.0000    0.0000
2016-10-16    0.0000    0.0000
2016-10-17    0.0000    0.0000
2016-10-18    0.0000    0.0000
2016-10-19    0.0000    0.0000
2016-10-20    0.0000    0.0000
2016-10-21    0.0000    0.0000
2016-10-22    0.0000    0.0000
2016-10-23    0.0000    0.0000
2016-10-24  313.5954  364.4107
2016-10-25  362.4682  431.5803

[142 rows x 2 columns]
Sign up to request clarification or add additional context in comments.

20 Comments

is there no way of ignoring duplicates when using reindex?
unfortunately i dont think the mean will work, you will see there is a column called 'time' which specifies if its in the morning or evening. i need to keep these rows separate for intraday calculations
Yes, this is another problem. It is called automatic exclusion of nuisance columns. So it replace only if numeric data. But question is if for some dates are morning and evening in time column, what is desired output after aggreagtion? Both values, or only first or only last?
the goal is to compute the difference of the values on an intraday basis. (current morning value = current morning value - previous afternoon data) I hope to be left with a morning diff value and evening diff value for each day. Therefore I would only keep the most recent record in the morning and the most recent in the evening.
Hmmm, I have an idea.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.