
I have this data frame

    Id  Timestamp               Data    Group
0   1   2013-08-12 10:29:19.673 40.0    1
1   2   2013-08-13 10:29:20.687 50.0    2
2   3   2013-09-14 10:29:20.687 40.0    3
3   4   2013-10-14 10:29:20.687 30.0    4
4   5   2013-11-15 10:29:20.687 50.0    5
                    ...

I could plot a single graph using

import plotly.express as px
df1 = df[df['Group'] ==1]
fig = px.line(df1, 'Timestamp', 'Data',width=1000, height=500)
fig.show()

Next, I want to group the data by the Group column and plot a graph for each unique Group. I tried:

import plotly.express as px
df1 = df.groupby(df['Group'])
fig = px.line(df1, 'Timestamp', 'Data',width=1000, height=500)
fig.show()

and got this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-f8ccd9a83ce9> in <module>()
      2 df1 = df.groupby(df['Group'])
      3 
----> 4 fig = px.line(df1, 'Timestamp', 'Data',width=1000, height=500)
      5 fig.show()

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _make_wrapper(self, name)
    602                 "using the 'apply' method".format(kind, name, type(self).__name__)
    603             )
--> 604             raise AttributeError(msg)
    605 
    606         self._set_group_selection()

AttributeError: Cannot access attribute 'columns' of 'DataFrameGroupBy' objects, try using the 'apply' method

I referred to a few similar posts and tried a few things, but none of them worked. How should I do this? Thanks

  • Thank you for accepting my answer! I take it this is what you were looking for? Commented Nov 29, 2019 at 12:39
  • Thanks for the succinct code. It's what I was looking for, although it crashed my google colab. I had to narrow down to filtering a few groups because there are quite a number of them in total Commented Nov 29, 2019 at 12:42
  • OK, is there anything you're missing then in my suggestion? Commented Nov 29, 2019 at 13:06
  • @vestland Well I think your answer is good enough for this question. I'll have to figure out a solution for GPU which is a separate issue.. Commented Nov 29, 2019 at 13:15
  • @vestland All I need to do is to reduce the number of groups to visualise at a time so it should be fine:) Commented Nov 29, 2019 at 13:17

2 Answers

You've tagged the question with plotly and only gotten a matplotlib answer so far, so here's a plotly approach:


In your provided data sample, there are no duplicate values for 'Group', and the timestamps seem to be continuous. Your question is a good one, but the dataset as shown is not a reasonable foundation for what you're trying to do, particularly if you don't want to aggregate grouped values in any way. I'm therefore assuming that what you're really working with looks more like this:

    Id  Timestamp               Data    Group
0   1   2013-08-12 10:29:19.673 40.0    1
1   2   2013-08-13 10:29:20.687 50.0    1
2   3   2013-09-14 10:29:20.687 40.0    1
3   4   2013-10-14 10:29:20.687 30.0    1
4   5   2013-11-15 10:29:20.687 50.0    1
5   6   2013-08-12 10:29:19.673 60.0    2
6   7   2013-08-13 10:29:20.687 70.0    2
7   8   2013-09-14 10:29:20.687 60.0    2
8   9   2013-10-14 10:29:20.687 40.0    2
9   10   2013-11-15 10:29:20.687 60.0    2
10   11   2013-08-12 10:29:19.673 80.0    3
11   12   2013-08-13 10:29:20.687 100.0    3
12   13   2013-09-14 10:29:20.687 80.0    3
13   14   2013-10-14 10:29:20.687 60.0    3
14   15   2013-11-15 10:29:20.687 100.0    3

If this is the case, you could choose to plot each group over the same timestamps, using pivoted values to get one trace per group in a single figure.

Or you could choose to make an individual plot for each group, using grouped data to get one figure per group.


Complete code snippets:


Code with pivoted values to get one trace per group:

import pandas as pd
import plotly.graph_objects as go

# sample data: three groups observed at the same five timestamps
timestamps = ['2013-08-12 10:29:19.673', '2013-08-13 10:29:20.687',
              '2013-09-14 10:29:20.687', '2013-10-14 10:29:20.687',
              '2013-11-15 10:29:20.687']
df = pd.DataFrame({'Id': range(1, 16),
                   'Timestamp': pd.to_datetime(timestamps * 3),
                   'Data': [40.0, 50.0, 40.0, 30.0, 50.0,
                            60.0, 70.0, 60.0, 40.0, 60.0,
                            80.0, 100.0, 80.0, 60.0, 100.0],
                   'Group': [1] * 5 + [2] * 5 + [3] * 5})

# pivot values to get one trace per group
dfp = pd.pivot_table(df,
                     values='Data',
                     index=['Timestamp'],
                     columns=['Group'],
               )
dfp.tail()

# plotly
fig = go.Figure()
for col in dfp.columns:
    fig.add_trace(go.Scatter(x=dfp.index, y=dfp[col], name='Group_'+str(col)))

fig.show()

Code with grouped data to get one figure per group:

# imports
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# data
# sample data: three groups observed at the same five timestamps
timestamps = ['2013-08-12 10:29:19.673', '2013-08-13 10:29:20.687',
              '2013-09-14 10:29:20.687', '2013-10-14 10:29:20.687',
              '2013-11-15 10:29:20.687']
df = pd.DataFrame({'Id': range(1, 16),
                   'Timestamp': pd.to_datetime(timestamps * 3),
                   'Data': [40.0, 50.0, 40.0, 30.0, 50.0,
                            60.0, 70.0, 60.0, 40.0, 60.0,
                            80.0, 100.0, 80.0, 60.0, 100.0],
                   'Group': [1] * 5 + [2] * 5 + [3] * 5})

dfp = pd.pivot_table(df,
                     values='Data',
                     index=['Timestamp'],
                     columns=['Group'],
               )

# data dimensions
nrows = len(dfp.columns)

fig = make_subplots(rows=nrows,
                    cols=1,
                    subplot_titles=['Group'+str(c) for c in dfp.columns])

# add one trace per group, each in its own subplot row
for i, col in enumerate(dfp.columns):
    fig.add_trace(go.Scatter(x=dfp.index, y=dfp[col].values,
                             name = 'Group_'+str(col),
                             mode = 'lines',
                             ),
                  row=i+1,
                  col=1)

fig.update_layout(height=nrows*200)

fig.show()

The problem comes from the object that your groupby() call returns, which is a DataFrameGroupBy rather than a DataFrame. This happens because you are neither selecting a column to aggregate nor passing an aggregation function (for example via agg), so px.line has no ordinary columns to read from.
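
To make the difference concrete, here is a small sketch with made-up sample data. It shows that groupby() on its own returns a DataFrameGroupBy, and that iterating over it yields ordinary DataFrames, one per group, with the original rows untouched:

```python
import pandas as pd

# Hypothetical sample data: two groups, two observations each
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2013-08-12', '2013-08-13',
                                 '2013-08-12', '2013-08-13']),
    'Data': [40.0, 50.0, 60.0, 70.0],
    'Group': [1, 1, 2, 2],
})

grouped = df.groupby('Group')
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating yields (group key, sub-DataFrame) pairs; no aggregation is applied
frames = {key: sub_df for key, sub_df in grouped}
print(frames[1])
```

Each value in frames is a regular DataFrame, so it can be passed to a plotting function the same way as the filtered df1 in the question.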

Depending on what you want to specifically do, you should resolve the issue with your groupby() first. An example that should work is:

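import plotly.express as px
# aggregate Data per Group and Timestamp; as_index=False keeps both as regular columns
df1 = df.groupby(['Group', 'Timestamp'], as_index=False)['Data'].sum()
fig = px.line(df1, x='Timestamp', y='Data', color='Group', width=1000, height=500)
fig.show()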

Of course then adapt this code to fit your needs, or specify them in the question/comment and I'll edit this post accordingly.

EDIT:

Based on the discussion in the comments I offer 2 different solutions:

1) This is the simpler option, but it makes it harder to customize the parameters of each line:

import pandas as pd 
import matplotlib.pyplot as plt
data = {'group':[1,2,3,4,5,1,2,3,4,5],'Timestamp':['2013-08-12','2013-08-13','2013-08-14','2013-08-15','2013-08-16','2013-08-17','2013-08-18','2013-08-18','2013-08-19','2013-08-19'],'Data':[40,50,40,30,50,20,20,10,40,30]}
df = pd.DataFrame(data)

# loop over the unique group ids and plot one line per group
for i in df['group'].unique():
    plt.plot('Timestamp','Data',data=df[df['group'] == i],marker='',color='red')
plt.show()

2) This is the longer option, but it is very customizable:

import pandas as pd 
import matplotlib.pyplot as plt
data = {'group':[1,2,3,4,5,1,2,3,4,5],'Timestamp':['2013-08-12','2013-08-13','2013-08-14','2013-08-15','2013-08-16','2013-08-17','2013-08-18','2013-08-18','2013-08-19','2013-08-19'],'Data':[40,50,40,30,50,20,20,10,40,30]}
df = pd.DataFrame(data)

plt.plot('Timestamp','Data',data=df[df['group'] ==1],marker='',color='blue')
plt.plot('Timestamp','Data',data=df[df['group'] ==2],marker='',color='red')
plt.plot('Timestamp','Data',data=df[df['group'] ==3],marker='',color='green')
plt.plot('Timestamp','Data',data=df[df['group'] ==4],marker='',color='yellow')
plt.plot('Timestamp','Data',data=df[df['group'] ==5],marker='',color='olive')
plt.show()
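
A possible middle ground between the two options is to loop over groupby() directly, which avoids repeating the filter for every group while still letting matplotlib pick a different color per line. This is only a sketch with made-up sample data; the Agg backend is an assumption so that it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: two groups, three observations each
df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 2],
    'Timestamp': pd.to_datetime(['2013-08-12', '2013-08-13', '2013-08-14'] * 2),
    'Data': [40, 50, 40, 20, 20, 10],
})

fig, ax = plt.subplots()
# groupby yields (group id, sub-DataFrame) pairs; draw one line per group
for key, sub_df in df.groupby('group'):
    ax.plot(sub_df['Timestamp'], sub_df['Data'], label=f'group {key}')
ax.legend()
# plt.show()  # use with an interactive backend instead of Agg
```

Per-group styling can still be added by looking up a color or marker for each key inside the loop.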

9 Comments

Thanks for your answer. Is it possible to not perform any aggregation (i.e. mean, sum, etc.) but keep the original data and plot multiple graphs? The purpose of grouping is not to calculate any function but to have separate graphs for each individual Group id
You need to use an aggregation function; otherwise you would only get an array of the grouped values. What do you want to plot on the y-axis? You can use count for aggregation as well, but if you don't specify any, nothing will be plotted.
The y-axis would still be Data. Instead of plotting all the data in the dataset in one graph, I want to group the dataset by Group id first so I can plot a graph for each individual Group id separately
So it seems you need a different approach. Are you saying you would like to create as many dataframes as there are distinct Group values and then plot each dataframe with Data as the y-axis? If so, what will your x-axis be? Or do you want as many lines as groups, with Data as the y-axis and Timestamp as the x-axis?
Yes. That's exactly what I want. I am sorry if I had phrased the problem inappropriately