
I would like to produce plots (or subplots) of grouped data from a pandas DataFrame. I think this should be something basic; I'm just missing something. My input data comes in a stacked form, which I prepare from the example data provided below. I would like to produce a chart like the following for each upperLevel of data:

Example plot for upperLevel data A

Here is some example data (the .csv data I'm using is pasted below). The data comes in a 'stacked' form of (data information, time, data), where the data information describes the category and subcategories of a particular data point.

import pandas as pd
import re
import matplotlib.pyplot as plt

df=pd.read_csv('.....TestData.csv',index_col='T')
# stack() gives one row per (T, column) cell; reset_index(1) moves the column labels into a column
df=df.stack(0).reset_index(1)
df.columns=['fullType','data']
#And at this point, this is pretty much the form of my actual data

#So I split it up a bit to try to get columns for different data groupings
regexStr='~'  # e.g. 'Col~A~1~' splits into ['Col', 'A', '1', '']

def upperParser(row):
    label=re.split(regexStr,row['fullType'])
    return label[1]
def lowerParser(row):
    label=re.split(regexStr,row['fullType'])
    return label[2]

df['upperLevel']=df.apply(upperParser,axis=1)
df['lowerLevel']=df.apply(lowerParser,axis=1)
df['time']=df.index


df=df.reset_index(drop=True)

plt.figure();
df.plot();

#And here is one of many attempts... I just seem to be missing something that should be simple:

for grp in df.groupby('upperLevel'):
    for key,grp in df.groupby('lowerLevel'):
        plt.plot(x='time',y=grp['data'],label=key)
plt.show()

Any direction is greatly appreciated. I'm not attached to keeping the data in any particular form. My eventual goal is to have a plot of all upperLevel categories (say A=(0,1), B=(0,2)) and use mplot3d to view the underlying subplots (like the example I linked, but with each subcategory 1, 2, 3 stacked as a subplot). But first things first, I suppose.

Sample data:

T     Col~A~1~  Col~A~2~  Col~A~3~  Col~B~1~  Col~B~2~  Col~B~3~
1     1         0.5       0.5       0.5       0.25      0.25
1.5   2         1         1         1         0.5       0.5
2     3         1.5       0.5       1.5       0.75      0.25
2.5   4         2         1         2         1         0.5
3     5         2.5       0.5       2.5       1.25      0.25
3.5   6         3         1         3         1.5       0.5
4     7         3.5       0.5       3.5       1.75      0.25
4.5   8         4         1         4         2         0.5
5     9         4.5       0.5       4.5       2.25      0.25
5.5   10        5         1         5         2.5       0.5
6     11        5.5       0.5       5.5       2.75      0.25
6.5   12        6         1         6         3         0.5
7     13        6.5       0.5       6.5       3.25      0.25
7.5   14        7         1         7         3.5       0.5
8     15        7.5       0.5       7.5       3.75      0.25
8.5   16        8         1         8         4         0.5
9     17        8.5       0.5       8.5       4.25      0.25
9.5   18        9         1         9         4.5       0.5
10    19        9.5       0.5       9.5       4.75      0.25
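To reproduce the preparation step without the CSV file, here is a minimal sketch that loads the first three rows of the sample table from a string. It assumes the table is whitespace-separated as pasted (the real TestData.csv may be comma-separated, in which case drop sep), and it swaps the two row-wise regex parsers for a vectorised str.split, which should give the same upperLevel/lowerLevel values for labels shaped like Col~A~1~.

```python
import io

import pandas as pd

# First three rows of the sample table above (assumed whitespace-separated).
SAMPLE = """T Col~A~1~ Col~A~2~ Col~A~3~ Col~B~1~ Col~B~2~ Col~B~3~
1   1 0.5 0.5 0.5 0.25 0.25
1.5 2 1   1   1   0.5  0.5
2   3 1.5 0.5 1.5 0.75 0.25
"""

wide = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+', index_col='T')

# stack() turns each (T, column) cell into one row of a Series;
# reset_index(1) moves the column labels out of the index into a column.
df = wide.stack().reset_index(1)
df.columns = ['fullType', 'data']

# Vectorised alternative to upperParser/lowerParser:
# 'Col~A~1~'.split('~') -> ['Col', 'A', '1', '']
parts = df['fullType'].str.split('~', expand=True)
df['upperLevel'] = parts[1]
df['lowerLevel'] = parts[2]
df['time'] = df.index
df = df.reset_index(drop=True)
print(df.head())
```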

2 Answers


A few tips:

  • Iterating over df.groupby() yields (group_name, group) tuples, so unpack both when looping through the groups.
  • Generally you don't want to use pyplot manually if your desired plot is covered by pandas plotting methods.
  • pandas plotting methods will generally produce a separate line for each column in the dataframe you're plotting, so if you can rearrange your data to get your data sources in separate columns, you can easily get the plot you want.
  • pandas plotting methods will use the index of your dataframe as the x axis by default.
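The first and third tips can be checked on a tiny frame with the same columns as the question's df (the values here are made up for illustration): each loop item is a tuple, and pivot turns the lowerLevel values into separate columns indexed by time.

```python
import pandas as pd

# Made-up stacked data in the same shape as the question's df.
df = pd.DataFrame({
    'time':       [1, 1, 2, 2, 1, 2],
    'upperLevel': ['A', 'A', 'A', 'A', 'B', 'B'],
    'lowerLevel': ['1', '2', '1', '2', '1', '1'],
    'data':       [1.0, 0.5, 2.0, 1.0, 0.5, 1.0],
})

# Each item from iterating a GroupBy is a (group_name, sub_frame) tuple.
first = next(iter(df.groupby('upperLevel')))
print(type(first), first[0])

# pivot: one row per time, one column per lowerLevel (one plotted line per column).
tables = {name: grp.pivot(index='time', columns='lowerLevel', values='data')
          for name, grp in df.groupby('upperLevel')}
print(tables['A'])
```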

That said, you can produce your desired plots with:

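for group_name, grp in df.groupby('upperLevel'):
    # one row per time, one column (and therefore one line) per lowerLevel
    plot_table = grp.pivot(index='time', columns='lowerLevel', values='data')
    plot_table.plot()  # the time index becomes the x axis
plt.show()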
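If the groups should be subplots of one figure rather than separate figures, a minimal sketch is to create the axes up front with plt.subplots and hand each pivoted table its own axis through the ax argument of DataFrame.plot. The small frame and the title text are illustrative assumptions, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; remove to get interactive windows
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stacked data in the same shape as the question's df.
df = pd.DataFrame({
    'time':       [1, 1, 2, 2, 1, 2],
    'upperLevel': ['A', 'A', 'A', 'A', 'B', 'B'],
    'lowerLevel': ['1', '2', '1', '2', '1', '1'],
    'data':       [1.0, 0.5, 2.0, 1.0, 0.5, 1.0],
})

groups = df.groupby('upperLevel')
# One row of subplots per upperLevel; squeeze=False keeps axes 2-D even for one group.
fig, axes = plt.subplots(nrows=groups.ngroups, ncols=1, sharex=True, squeeze=False)
for ax, (group_name, grp) in zip(axes[:, 0], groups):
    plot_table = grp.pivot(index='time', columns='lowerLevel', values='data')
    plot_table.plot(ax=ax, title=f'upperLevel = {group_name}')
fig.tight_layout()
# plt.show()
```

Stacking the subplots vertically with a shared x axis keeps the time scales aligned across upperLevel groups.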

1 Comment

Great solution and helpful tips. I was definitely overthinking it.

I agree that this is a useful thing to want to do. I wish pandas had a more advanced subplot function to make subplots by row groups as well as by columns.

Here is a function to do that; you can try it:

import numpy as np
import matplotlib.pyplot as plt

def subplotter(df):
    numcols = list(df.select_dtypes(include=['number']).columns)
    objcols = list(df.select_dtypes(include=['object']).columns)
    grouped = df.groupby(objcols)
    l = grouped.ngroups  # one subplot per group
    # 1 column of subplots for up to 3 groups, 2 for up to 6, otherwise 3
    cols = 1 if l <= 3 else 2 if l <= 6 else 3
    rows = int(np.ceil(l / cols))  # add_subplot needs integer grid sizes
    fig = plt.figure(figsize=(5 * cols, 4 * rows))
    for i, (name, group) in enumerate(grouped, start=1):
        if not isinstance(name, tuple):  # some pandas versions yield a scalar for a single key
            name = (name,)
        ax = fig.add_subplot(rows, cols, i)
        ax.plot(group[numcols])
        ax.legend(numcols)
        ax.set_title(', '.join(f'{c}: {v}' for c, v in zip(objcols, name)))
    fig.tight_layout()
    return fig

This function groups the DataFrame by all object-dtype columns and makes one subplot per group. All numeric columns are plotted in each subplot.

The extra complexity is there to choose a sensible figure size and subplot grid (rows and cols), and to add a legend and a title to each subplot.
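Because subplotter groups on every object column and plots every numeric column, the question's stacked df (which has three object columns and both time and data as numbers) would give one subplot per fullType with time drawn as a line. A minimal sketch of one way to reshape it first, assuming the goal is one subplot per upperLevel with a line per lowerLevel: pivot so that time is the index, upperLevel is the only object column, and each lowerLevel is its own numeric column. The frame below is made-up illustrative data.

```python
import pandas as pd

# Made-up stacked data in the same shape as the question's df.
df = pd.DataFrame({
    'time':       [1, 1, 2, 2, 1, 2],
    'upperLevel': ['A', 'A', 'A', 'A', 'B', 'B'],
    'lowerLevel': ['1', '2', '1', '2', '1', '1'],
    'data':       [1.0, 0.5, 2.0, 1.0, 0.5, 1.0],
})

# Rows indexed by (upperLevel, time), one column per lowerLevel;
# then move upperLevel back out into an ordinary column.
wide = (df.pivot_table(index=['upperLevel', 'time'], columns='lowerLevel', values='data')
          .reset_index(level='upperLevel'))
wide.columns.name = None  # drop the leftover 'lowerLevel' axis label
print(wide)
```

Passing wide to subplotter should then group on upperLevel alone; lowerLevel combinations missing from a group (here '2' for B) show up as NaN and are simply not drawn.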
