Python Pandas Subplot with Stacked data

Question

I would like to produce plots (or subplots) of grouped data from a pandas DataFrame. I think this should be something basic and I'm just missing something. My input data arrives in stacked form, which I reproduce from the example data provided below. I would like to produce a chart like the following for each upperLevel of data:

Example plot for upperLevel data A

Here is some example data (the .csv data I'm using is pasted below). The data comes in a 'stacked' form of data information, time, data. The data information describes the category and subcategories of a particular data point.

import pandas as pd
import re
import matplotlib.pyplot as plt

df=pd.read_csv('.....TestData.csv',index_col='T')
df=df.stack(0).reset_index(1)
df.columns=['fullType','data']
#And at this point, this is pretty much the form of my actual data

#So I split it up a bit to try to get columns for different data groupings
regexStr='~'

def upperParser(row):
    label=re.split(regexStr,row['fullType'])
    return label[1]
def lowerParser(row):
    label=re.split(regexStr,row['fullType'])
    return label[2]

df['upperLevel']=df.apply(upperParser,axis=1)
df['lowerLevel']=df.apply(lowerParser,axis=1)
df['time']=df.index


df=df.reset_index(drop=True)

plt.figure();
df.plot();

#And here is one of many attempts... I just seem to be missing something that should be simple:

for grp in df.groupby('upperLevel'):
for key,grp in df.groupby('lowerLevel'):
    plt.plot(x='time',y=grp['data'],label=key)
plt.show()

Any direction is greatly appreciated. I'm not concerned about trying to keep any particular form. My eventual goal is to have a plot of all upperLevel categories (say A=(0,1), B=(0,2)) and use mpl3d to view the underlying subplots (like this, but with each subcategory 1,2,3 stacked as a subplot). But first things first I suppose.

Sample data:

T     Col~A~1~  Col~A~2~  Col~A~3~  Col~B~1~  Col~B~2~  Col~B~3~
1     1         0.5       0.5       0.5       0.25      0.25
1.5   2         1         1         1         0.5       0.5
2     3         1.5       0.5       1.5       0.75      0.25
2.5   4         2         1         2         1         0.5
3     5         2.5       0.5       2.5       1.25      0.25
3.5   6         3         1         3         1.5       0.5
4     7         3.5       0.5       3.5       1.75      0.25
4.5   8         4         1         4         2         0.5
5     9         4.5       0.5       4.5       2.25      0.25
5.5   10        5         1         5         2.5       0.5
6     11        5.5       0.5       5.5       2.75      0.25
6.5   12        6         1         6         3         0.5
7     13        6.5       0.5       6.5       3.25      0.25
7.5   14        7         1         7         3.5       0.5
8     15        7.5       0.5       7.5       3.75      0.25
8.5   16        8         1         8         4         0.5
9     17        8.5       0.5       8.5       4.25      0.25
9.5   18        9         1         9         4.5       0.5
10    19        9.5       0.5       9.5       4.75      0.25

Marius · Accepted Answer · 2014-08-04 00:08:48Z

A few tips:

df.groupby() returns (group_name, group) tuples, so be careful of that when trying to iterate through the groups.
Generally you don't want to use pyplot manually if your desired plot is covered by pandas plotting methods.
pandas plotting methods will generally produce a separate line for each column in the dataframe you're plotting, so if you can rearrange your data to get your data sources in separate columns, you can easily get the plot you want.
pandas plotting methods will use the index of your dataframe as the x axis by default.
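To make the first and third tips concrete, here is a minimal sketch. The four-row DataFrame is a hypothetical miniature of the question's data, not the real file. It shows that iterating over a groupby yields (group_name, group) pairs, and that pivoting puts each lowerLevel into its own column with time as the index, which is the shape the pandas plotting methods expect.

```python
import pandas as pd

# Hypothetical miniature of the question's long-format data
df = pd.DataFrame({
    "upperLevel": ["A", "A", "B", "B"],
    "lowerLevel": ["1", "2", "1", "2"],
    "time": [1.0, 1.0, 1.0, 1.0],
    "data": [1.0, 0.5, 0.5, 0.25],
})

# Each iteration yields a (group_name, sub-DataFrame) pair, so unpack both
group_names = []
for group_name, grp in df.groupby("upperLevel"):
    group_names.append(group_name)
    print(group_name, grp.shape)

# Pivoting moves each lowerLevel into its own column, indexed by time
wide = df[df["upperLevel"] == "A"].pivot(
    index="time", columns="lowerLevel", values="data"
)
print(wide)
```

Writing `for grp in df.groupby(...)` instead would bind the whole tuple to `grp`, which is the pitfall the first tip warns about.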

That said, you can produce your desired plots with:

for group_name, grp in df.groupby('upperLevel'):
    plot_table = grp.pivot(index='time', columns='lowerLevel', values='data')
    plot_table.plot()
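The loop above opens one figure per group. If you would rather have every upperLevel as a subplot of a single figure, one possible variation is to create the axes up front and pass each one to `plot`. This is only a sketch: the data is a hypothetical three-row subset of the question's sample, and the `Agg` backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical subset of the question's sample data, already in long form
records = []
for t, a1, a2, b1, b2 in [(1.0, 1, 0.5, 0.5, 0.25),
                          (1.5, 2, 1, 1, 0.5),
                          (2.0, 3, 1.5, 1.5, 0.75)]:
    records += [("A", "1", t, a1), ("A", "2", t, a2),
                ("B", "1", t, b1), ("B", "2", t, b2)]
df = pd.DataFrame(records, columns=["upperLevel", "lowerLevel", "time", "data"])

groups = df.groupby("upperLevel")
# One row of axes per upperLevel; squeeze=False keeps axes as a 2D array
fig, axes = plt.subplots(nrows=groups.ngroups, ncols=1, sharex=True, squeeze=False)

for ax, (group_name, grp) in zip(axes.ravel(), groups):
    plot_table = grp.pivot(index="time", columns="lowerLevel", values="data")
    # Draw this group's lines on its own subplot
    plot_table.plot(ax=ax, title=f"upperLevel {group_name}")

fig.tight_layout()
```

Passing `ax=ax` is what keeps pandas from opening a new figure for each group.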

1 Comment

Docuemada Over a year ago

Great solution and helpful tips. I was definitely overthinking it.

KieranPC · Accepted Answer · 2015-05-03 20:01:30Z

I agree that this is a useful thing to want to do. I wish Pandas had a more advanced subplot function to make subplots by row groups as well as by columns.

Here is a function that does that, which you can try:

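import numpy as np
import matplotlib.pyplot as plt

def subplotter(df):
    # Numeric columns are plotted; object (string) columns define the groups
    numcols = list(df.select_dtypes(include=['number']).columns)
    objcols = list(df.select_dtypes(include=['object']).columns)
    grouped = df.groupby(objcols)
    l = len(grouped.groups)
    # Up to 3 groups: 1 column; 4 to 6 groups: 2 columns; more: 3 columns
    cols = 1 if l <= 3 else 2 if l <= 6 else 3
    rows = int(np.ceil(l / cols))  # add_subplot needs integer counts
    i, fig = 1, plt.figure(figsize=(5*cols, 4*rows))
    for name, group in grouped:
        ax = fig.add_subplot(rows, cols, i)
        ax.plot(group[numcols])
        ax.legend(numcols)
        # name can be a scalar when grouping by a single column
        name = name if isinstance(name, tuple) else (name,)
        ax.set_title(', '.join(': '.join(map(str, e)) for e in zip(objcols, name)))
        i += 1
    fig.tight_layout()
    return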

This function will group the DataFrame by all object type columns, making subplots for each. All number type columns get put in each subplot.

The complexity I've added is to determine a good size for the figure, the locations of the subplots (rows and cols) and to add a legend and titles.
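The layout step can also be looked at on its own. The sketch below uses a hypothetical helper, `grid_shape`, that mirrors the thresholds in the function above (1 column for up to 3 groups, 2 for up to 6, 3 beyond that) and then builds the grid with `plt.subplots`, hiding any leftover axes. The helper name and the `Agg` backend line are assumptions made for illustration only.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt


def grid_shape(n_plots):
    """Hypothetical helper: return (rows, cols) for n_plots subplots."""
    cols = 1 if n_plots <= 3 else 2 if n_plots <= 6 else 3
    rows = math.ceil(n_plots / cols)
    return rows, cols


n_plots = 5
rows, cols = grid_shape(n_plots)
# Scale the figure with the grid, as the function above does
fig, axes = plt.subplots(rows, cols, figsize=(5 * cols, 4 * rows), squeeze=False)

# Hide axes that have no group to show (here the sixth cell of a 3x2 grid)
for ax in axes.ravel()[n_plots:]:
    ax.set_visible(False)
```

Using `math.ceil` returns an integer directly, which avoids passing a float row count to matplotlib.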
