Plotting multiple columns groupby on multiple plots

Question

I have data that looks like this

ID    value_y   date_x      end_cutoff
1      75     2020-7-1      2021-01-17
1      73     2020-7-2      2021-01-17
1      74     2020-7-1      2021-06-05
1      71     2020-7-2      2021-06-05
2      111    2020-7-1      2021-01-17
2      112    2020-7-2      2021-01-17
2      113    2020-7-1      2021-06-05
2      115    2020-7-2      2021-06-05

I want to plot this data so that the following conditions are met:

Each ID has 1 graph
Each graph has n lines plotted (2 in this example; 1 for each end_cutoff)

So, ideally in this example I would have two separate plots both with two lines.

Here is my current code, but it plots every line on the same plot instead of creating a new plot for each ID.

 grouped = df_fit.groupby(['ID','end_cutoff'])
 fig, ax = plt.subplots()
 for (ID, end_cutoff), df_fit in grouped:
     ax.plot(df_fit['date_x'], df_fit['value_y'], label=ID+' '+str(end_cutoff.date()))
 plt.show()

Trenton McKinney · Accepted Answer · 2021-06-24 18:32:24Z

This solution adds the missing pieces to your existing code:

Format the date columns correctly to a datetime dtype, and extract only the date component.
Create a number of subplots equal to the number of unique 'ID' values
Get the index of ID within uid and use that value to index and plot to the correct ax

This option uses pandas.DataFrame.plot
The format of the x-axis is '%m-%d %H' because the time between points is small. The x-axis will auto format depending on the date range.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# dataframe
data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2], 'value_y': [75, 73, 74, 71, 111, 112, 113, 115], 'date_x': ['2020-7-1', '2020-7-2', '2020-7-1', '2020-7-2', '2020-7-1', '2020-7-2', '2020-7-1', '2020-7-2'], 'end_cutoff': ['2021-01-17', '2021-01-17', '2021-06-05', '2021-06-05', '2021-01-17', '2021-01-17', '2021-06-05', '2021-06-05']}
df = pd.DataFrame(data)

# set date columns to a datetime dtype and extract only the date component since time isn't relevant
df['end_cutoff'] = pd.to_datetime(df['end_cutoff']).dt.date
df['date_x'] = pd.to_datetime(df['date_x']).dt.date

# create grouped
grouped = df.groupby(['ID','end_cutoff'])

# create subplots based on the number of unique ID values
uid = df.ID.unique()
fig, ax = plt.subplots(nrows=len(uid), figsize=(7, 4))

for (ID, end_cutoff), df_fit in grouped:
    
    # get the index of the current ID, and use it to index ax
    axi = np.argwhere(uid==ID)[0][0]

    # plot to the correct ax based on the index of the ID
    df_fit.plot(x='date_x', y='value_y', ax=ax[axi], label=f'{ID} {end_cutoff}',
                xlabel='Date', ylabel='Value', title=f'ID: {ID}', marker='.', rot=30)

    # place the legend outside the plot
    ax[axi].legend(title='Cutoff', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
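
As an alternative, the index lookup with np.argwhere can be avoided by grouping on 'ID' first and pairing each group with its own axes. The following is only a minimal sketch of that idea, not part of the original answer: the helper name plot_by_id is hypothetical, the sample data is the same as above with the dates written zero-padded, and it plots with matplotlib's ax.plot directly rather than pandas.DataFrame.plot.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_id(df):
    """Hypothetical helper: one subplot per ID, one line per end_cutoff."""
    n_ids = df['ID'].nunique()
    # squeeze=False keeps a 2D array of axes even when there is only one ID
    fig, axes = plt.subplots(nrows=n_ids, figsize=(7, 4), squeeze=False)
    axes = axes.ravel()

    # outer groupby picks the subplot, inner groupby draws the lines on it
    for ax, (id_, df_id) in zip(axes, df.groupby('ID')):
        for cutoff, df_cut in df_id.groupby('end_cutoff'):
            ax.plot(df_cut['date_x'], df_cut['value_y'], marker='.', label=str(cutoff))
        ax.set(title=f'ID: {id_}', xlabel='Date', ylabel='Value')
        ax.tick_params(axis='x', rotation=30)
        ax.legend(title='Cutoff', bbox_to_anchor=(1.05, 1), loc='upper left')

    fig.tight_layout()
    return fig, axes


# same sample data as in the question (dates zero-padded here)
data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2],
        'value_y': [75, 73, 74, 71, 111, 112, 113, 115],
        'date_x': ['2020-07-01', '2020-07-02'] * 4,
        'end_cutoff': ['2021-01-17', '2021-01-17', '2021-06-05', '2021-06-05'] * 2}
df = pd.DataFrame(data)
df['date_x'] = pd.to_datetime(df['date_x'])
df['end_cutoff'] = pd.to_datetime(df['end_cutoff']).dt.date

fig, axes = plot_by_id(df)
```

Call plt.show() (or fig.savefig with a file name of your choice) afterwards to display or save the figure. With this data it should produce two subplots, one per ID, each with two lines.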
