I have a data frame with a few thousand rows. It looks something like this:

ID Amount Segment
1  23     A
2  43     B
3  65     A
4  23     A
5  86     C
6  54     B
7  432    B
8  987    A
9  43     C
10 46     C

First I had to split the data by segment, which I did like this:

df_A = df[(df['Segment'] == 'A')]
df_B = df[(df['Segment'] == 'B')]
df_C = df[(df['Segment'] == 'C')]

After that I had to perform some operations on each subset, including a groupby and other functions. For example, for segment A:

df_A['days'] = (df_A['first'] - df_A['last']).dt.days
df_A_A = df_A[(df_A['days'] >= 0) & (df_A['days'] <= 30)]
A = df_A_A.groupby('days').user.nunique().reset_index()
A['user'] = A['user'].cumsum()

So for each subset I create two further data frames (df_A_A and A here, and likewise for B and C), and the last one is what I plot.

And in the end I had to plot for each set:

plt.plot(A['x'], A['y'], color='red', label='A')
plt.plot(B['x'], B['y'], color='blue', label='B')
plt.plot(C['x'], C['y'], color='green', label='C')

The problem is that I may have any number of segments, so it would be easier to do all of this inside one loop. Is that possible? I want to write the code for one segment only and get the desired output for all segments. I tried grouping by segment in a loop, but I am not sure how to set it up so that it also creates the df_A_A and A data frames along the way.

I tried the following, but it raises an error:

for key, grp in df.groupby(['Segment']):




df_grp = df[(df['Segment'] == grp)]
df_grp['days'] = (df_grp['first'] - df_grp['last']).dt.days

df_grp_1 = df_grp[(df_grp['days'] >= 0) & (df_grp['days'] <= 30)]
grp = df_grp_1.groupby('days').user.nunique().reset_index()
grp['user'] = grp['user'].cumsum()


plt.plot(grp['days'], grp['conv'], color='key', label=key)
plt.legend()
plt.xlabel('days')
plt.ylabel('conv')
plt.show()

This is the error:

File "<ipython-input-6-7a4977f42a83>", line 10
    df_grp = df[(df['Segment'] == grp)]
         ^
IndentationError: expected an indented block

Thanks in advance!

  • You can groupby and then plot; check this: stackoverflow.com/questions/41494942/… Commented Aug 24, 2021 at 3:14
  • @Epsi95 But I have other operations as well, not just the plot. Commented Aug 24, 2021 at 3:16
  • groupby is iterable, as demonstrated in the second answer: for key, grp in df.groupby('Segment'): Then the loop body would be something like: plt.plot(grp['x'], grp['y'], label=key) Commented Aug 24, 2021 at 3:17
  • @HenryEcker Yes, I understood this, but it does not solve my problem, because I am doing some other operations too. Maybe I will have to edit the question and include those details as well. Commented Aug 24, 2021 at 3:27
  • Yes. The way your question currently reads, a loop over groupby addresses the issue. The loop produces a dataframe for each unique value in Segment, which is exactly the same as subsetting manually with an index selection. You can perform any and all dataframe operations needed on grp before plotting. Commented Aug 24, 2021 at 3:31
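
What the last comment describes could look roughly like the sketch below. It is only an illustration under assumptions: the sample data is made up, the column names user, first and last are taken from the snippets in the question, and the helper names cumulative_users and plot_curves are hypothetical. The IndentationError in the question comes from the loop body not being indented under the for line. Note also that grp inside the loop is already the subset for that segment, so no extra df[df['Segment'] == ...] filter is needed. After the inner groupby the result only has the columns days and user, so the sketch plots user; it also leaves out color='key', which does not look like a valid matplotlib colour.

```python
import pandas as pd

# Hypothetical sample data: the 'user', 'first' and 'last' columns are
# assumed from the snippets in the question; the values are invented.
df = pd.DataFrame({
    "Segment": ["A", "A", "A", "B", "B", "C"],
    "user": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "first": pd.to_datetime(["2021-01-05", "2021-01-10", "2021-03-20",
                             "2021-01-03", "2021-01-03", "2021-01-08"]),
    "last": pd.to_datetime(["2021-01-01"] * 6),
})


def cumulative_users(grp):
    """Apply the per-segment steps from the question to one group."""
    grp = grp.copy()  # work on a copy so the original frame is untouched
    grp["days"] = (grp["first"] - grp["last"]).dt.days
    window = grp[grp["days"].between(0, 30)]  # 0 <= days <= 30
    out = window.groupby("days")["user"].nunique().reset_index()
    out["user"] = out["user"].cumsum()
    return out


# Grouping by the column name gives the segment label as the key and
# the matching rows as grp; one result frame is stored per segment.
curves = {key: cumulative_users(grp) for key, grp in df.groupby("Segment")}


def plot_curves(curves):
    """Draw one line per segment; matplotlib picks the colours."""
    import matplotlib.pyplot as plt  # local import: only needed for plotting
    for key, curve in curves.items():
        plt.plot(curve["days"], curve["user"], label=key)
    plt.legend()
    plt.xlabel("days")
    plt.ylabel("cumulative users")
    plt.show()
```

Calling plot_curves(curves) draws all segments on one figure. Keeping the per-segment logic in a single function means it is written once and works for any number of segments, and the dictionary keeps each intermediate result available by segment name.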
