1

my initial dataframe is

df = pd.DataFrame({"a":["2020-01-01", "2020-01-06", "2020-01-04", "2020-01-07"],
                   "b":["a", "a", "b", "b"],
                   "c":[1, 2, 3,4]})
print(df)
            a  b  c
0  2020-01-01  a  1
1  2020-01-06  a  2
2  2020-01-04  b  3
3  2020-01-07  b  4

I want my dataset to be like this

            a  b  c
0  2020-01-01  a  1
1  2020-01-02  a  NaN
2  2020-01-03  a  NaN
3  2020-01-04  a  NaN
4  2020-01-05  a  NaN
5  2020-01-06  a  2
6  2020-01-04  b  3
7  2020-01-05  b  NaN
8  2020-01-06  b  NaN
3  2020-01-07  b  4

I tried

d.set_index([d.a, d.b], inplace=True)
d.asfreq("D")

d.set_index([d.a, d.b], inplace=True)
d.resample("D")

but I encountered

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
enter code here

My real DataFrame's column ('b' column in this example) has many unique values.

3 Answers 3

4
df = pd.DataFrame({"a":["2020-01-01", "2020-01-06", "2020-01-04", "2020-01-07"],
                   "b":["a", "a", "b", "b"],
                   "c":[1, 2, 3,4]})
# make datetime
df['a'] = pd.to_datetime(df['a'])
# create a group
g = df.groupby('b')
# list comprehension with reindex and date_range then concat list of frames
df2 = pd.concat([df.set_index('a').reindex(pd.date_range(df['a'].min(),
                                                         df['a'].max(),freq='D')) for _,df in g])
# ffill column b
df2['b'] = df2['b'].ffill()


            b    c
2020-01-01  a  1.0
2020-01-02  a  NaN
2020-01-03  a  NaN
2020-01-04  a  NaN
2020-01-05  a  NaN
2020-01-06  a  2.0
2020-01-04  b  3.0
2020-01-05  b  NaN
2020-01-06  b  NaN
2020-01-07  b  4.0
Sign up to request clarification or add additional context in comments.

1 Comment

How would you do that if you want a specific date delta, ie from 2020-01-01 to 2021-04-01 for each value of the column 'b'? I mean same dates (in that range) for a and b.
3

Another approach with groupby and asfreq:

(df.set_index('a')
   .groupby('b').apply(lambda x: x.drop('b',axis=1).asfreq('D'))
   .reset_index()
)

Output:

   b          a    c
0  a 2020-01-01  1.0
1  a 2020-01-02  NaN
2  a 2020-01-03  NaN
3  a 2020-01-04  NaN
4  a 2020-01-05  NaN
5  a 2020-01-06  2.0
6  b 2020-01-04  3.0
7  b 2020-01-05  NaN
8  b 2020-01-06  NaN
9  b 2020-01-07  4.0

Comments

0

We could use the complete function from pyjanitor, which provides a convenient abstraction to generate the missing rows :

# pip install pyjanitor
import pandas as pd
import janitor as jn

df['a'] = pd.to_datetime(df['a'])

# create a mapping for the new dates
dates = {"a" : lambda a : pd.date_range(a.min(), a.max(), freq='1D')}

# create the new dataframe, exposing the missing rows, per group:

df.complete(dates, by='b', sort = True)
 
           a  b    c
0 2020-01-01  a  1.0
1 2020-01-02  a  NaN
2 2020-01-03  a  NaN
3 2020-01-04  a  NaN
4 2020-01-05  a  NaN
5 2020-01-06  a  2.0
6 2020-01-04  b  3.0
7 2020-01-05  b  NaN
8 2020-01-06  b  NaN
9 2020-01-07  b  4.0

An alternative approach to bypass the by parameter, and possibly improve performance, is shown below :

# build the mapping for the entire dataframe, ignoring the groups
dates = {"a" : pd.date_range(df.a.min(), df.a.max(), freq='1D')}

# create a groupby object
grouped = df.groupby('b').a

 # create temporary columns for the min and max dates
(df.assign(date_min = grouped.transform('min'), 
           date_max = grouped.transform('max'))
    # expose the missing rows based on the combination of dates
    # and the tuple of b, date_min and date_max
   .complete(dates, ('b', 'date_min', 'date_max'))
    # filter for rows, this will keep only the relevant dates
   .loc[lambda df: df.a.between(df.date_min, df.date_max), 
        df.columns]
   .sort_values(['b', 'a'], ignore_index = True)
)

           a  b    c
0 2020-01-01  a  1.0
1 2020-01-02  a  NaN
2 2020-01-03  a  NaN
3 2020-01-04  a  NaN
4 2020-01-05  a  NaN
5 2020-01-06  a  2.0
6 2020-01-04  b  3.0
7 2020-01-05  b  NaN
8 2020-01-06  b  NaN
9 2020-01-07  b  4.0

Performance will be significant/better for large dataframes. A similar example with tests is offered here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.