
I have many DataFrames with two columns, like this:

              Fecha  unidades
0        2020-01-01       2.0
84048    2020-09-01       4.0
149445   2020-10-01      11.0
532541   2020-11-01       4.0
660659   2020-12-01       2.0
1515682  2021-03-01       9.0
1563644  2021-04-01       2.0
1759823  2021-05-01       1.0
2226586  2021-07-01       1.0

As can be seen, some months are missing. How much is missing depends on the DataFrame: one may have 2 months, another 10, another may be 100% complete, and another may have only one. I need to complete the "Fecha" column with the missing months (from 2020-01-01 to 2021-12-01) and, whenever a date is added to "Fecha", set the corresponding "unidades" value to 0.

Each element in the "Fecha" column is of class 'pandas._libs.tslibs.timestamps.Timestamp'.

How can I fill in the missing dates for each DataFrame?


2 Answers


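You could create a date range of month starts, set "Fecha" as the index with set_index, and reindex against that range to add the missing months. The new rows come out as NaN, so fillna(0) fills them in, and reset_index turns "Fecha" back into a regular column: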

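import pandas as pd

# Make sure "Fecha" holds datetimes so it lines up with the date range
df['Fecha'] = pd.to_datetime(df['Fecha'])
# Reindex on every month start in the target period; missing months become NaN rows
df = (df.set_index('Fecha')
      .reindex(pd.date_range('2020-01-01', '2021-12-01', freq='MS'))
      .rename_axis(['Fecha'])
      .fillna(0)
      .reset_index())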

Output:

        Fecha  unidades
0  2020-01-01       2.0
1  2020-02-01       0.0
2  2020-03-01       0.0
3  2020-04-01       0.0
4  2020-05-01       0.0
5  2020-06-01       0.0
6  2020-07-01       0.0
7  2020-08-01       0.0
8  2020-09-01       4.0
9  2020-10-01      11.0
10 2020-11-01       4.0
11 2020-12-01       2.0
12 2021-01-01       0.0
13 2021-02-01       0.0
14 2021-03-01       9.0
15 2021-04-01       2.0
16 2021-05-01       1.0
17 2021-06-01       0.0
18 2021-07-01       1.0
19 2021-08-01       0.0
20 2021-09-01       0.0
21 2021-10-01       0.0
22 2021-11-01       0.0
23 2021-12-01       0.0
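
Since the question mentions many DataFrames, the same steps could be wrapped in a small helper and applied to each one. This is only a sketch: the function name fill_missing_months, the dictionary frames, and the sample data are made up for illustration, and it assumes each month appears at most once per DataFrame (reindex raises on duplicate index values). It passes fill_value=0 to reindex instead of calling fillna afterwards, so only the newly added rows are set to 0.

```python
import pandas as pd

def fill_missing_months(df, start="2020-01-01", end="2021-12-01"):
    """Return a copy of df with one row per month start between start and end.

    Hypothetical helper; assumes the columns are named "Fecha" and "unidades"
    and that "Fecha" has no duplicate dates.
    """
    full_range = pd.date_range(start, end, freq="MS")
    out = df.copy()
    out["Fecha"] = pd.to_datetime(out["Fecha"])
    return (out.set_index("Fecha")
               .reindex(full_range, fill_value=0)  # added months get 0
               .rename_axis("Fecha")
               .reset_index())

# Illustrative input: two small DataFrames with different gaps
frames = {
    "a": pd.DataFrame({"Fecha": ["2020-01-01", "2020-09-01"], "unidades": [2.0, 4.0]}),
    "b": pd.DataFrame({"Fecha": ["2021-07-01"], "unidades": [1.0]}),
}

# Apply the helper to every DataFrame
filled = {name: fill_missing_months(frame) for name, frame in frames.items()}
```

Each DataFrame in filled then has 24 rows, one per month from 2020-01-01 to 2021-12-01, regardless of how many months the original had.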

One option is pyjanitor's complete function:

# pip install pyjanitor
import janitor  # importing janitor makes the complete method available on DataFrames
import pandas as pd

df = pd.read_clipboard()
df['Fecha'] = pd.to_datetime(df['Fecha'])
# Dictionary mapping the column name to every month start in the target period
fecha = {"Fecha": pd.date_range('2020-01-01', '2021-12-01', freq='MS')}
df.complete(fecha, fill_value=0)

Output:

        Fecha  unidades
0  2020-01-01         2
1  2020-02-01         0
2  2020-03-01         0
3  2020-04-01         0
4  2020-05-01         0
5  2020-06-01         0
6  2020-07-01         0
7  2020-08-01         0
8  2020-09-01         4
9  2020-10-01        11
10 2020-11-01         4
11 2020-12-01         2
12 2021-01-01         0
13 2021-02-01         0
14 2021-03-01         9
15 2021-04-01         2
16 2021-05-01         1
17 2021-06-01         0
18 2021-07-01         1
19 2021-08-01         0
20 2021-09-01         0
21 2021-10-01         0
22 2021-11-01         0
23 2021-12-01         0
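
If installing pyjanitor is not an option, a similar result can be sketched in plain pandas with a left merge against a DataFrame holding every month. The names all_months and result and the sample data are illustrative, not part of the original answer. Unlike reindex, this does not require the dates in "Fecha" to be unique.

```python
import pandas as pd

# Illustrative input with gaps
df = pd.DataFrame({
    "Fecha": pd.to_datetime(["2020-01-01", "2020-09-01"]),
    "unidades": [2.0, 4.0],
})

# One row per month start in the target period
all_months = pd.DataFrame(
    {"Fecha": pd.date_range("2020-01-01", "2021-12-01", freq="MS")}
)

# Keep every month; months absent from df get NaN, which is then set to 0
result = all_months.merge(df, on="Fecha", how="left").fillna({"unidades": 0})
```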
