Calculating means for multiple columns, in different rows in pandas

Question

I have a csv file like this:

-Species-    -Strain-       -A-       -B-       -C-       -D-
 Species1    Strain1.1         0.2       0.1       0.1       0.4
 Species1    Strain1.1         0.2       0.7       0.2       0.2
 Species1    Strain1.2         0.1       0.6       0.1       0.3
 Species1    Strain1.1         0.2       0.6       0.2       0.6
 Species2    Strain2.1         0.3       0.3       0.3       0.1
 Species2    Strain2.2         0.6       0.2       0.6       0.2
 Species2    Strain2.2         0.2       0.1       0.4       0.2

I would like to calculate the mean (average) of each of the columns A-D for each unique strain. How would I go about doing it?

I tried df.groupby(['Strain','Species']).mean().mean(1), but that still seems to give me multiple versions of the same strain in the resulting dataframe, rather than the mean of each column for each unique strain.

Essentially I would like a mean result for A,B,C & D per strain.

Apologies for being unclear, I'm struggling to get my head around this, and I'm very new to programming!

It's still not very clear. Do you care about the grouping of strain to species? If not, you can do df.groupby(['Strain']).mean(), which will give you the mean of A, B, C, D per strain.
– gyx-hh, Commented Apr 10, 2018 at 15:55
If that's not the case, can you edit your question to include the expected results, please?
– gyx-hh, Commented Apr 10, 2018 at 15:56
Yes, this works when both 'Strain' and 'Species' are used, thanks!
– Biomage, Commented Apr 11, 2018 at 8:45

sacuL · Accepted Answer · 2018-04-10 20:17:19Z

IIUC, you simply need to call

df.groupby(['Species', 'Strain']).mean()

                       A         B         C    D
Species   Strain
Species1  Strain1.1  0.2  0.466667  0.166667  0.4
          Strain1.2  0.1  0.600000  0.100000  0.3
Species2  Strain2.1  0.3  0.300000  0.300000  0.1
          Strain2.2  0.4  0.150000  0.500000  0.2

What you were doing when you called df.groupby(['Strain','Species']).mean().mean(1) was taking the mean of the four per-column means (A, B, C, and D) within each group. mean(1) means take the mean along axis 1, i.e. across the columns, which collapses each row to a single value.
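To make the difference concrete, here is a minimal sketch using the sample rows from the question. It assumes the file is whitespace-separated with plain header names (Species, Strain, A-D); the variable names raw, per_strain and row_means are just for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question; assumed whitespace-separated,
# with the dashes dropped from the header names.
raw = """Species Strain A B C D
Species1 Strain1.1 0.2 0.1 0.1 0.4
Species1 Strain1.1 0.2 0.7 0.2 0.2
Species1 Strain1.2 0.1 0.6 0.1 0.3
Species1 Strain1.1 0.2 0.6 0.2 0.6
Species2 Strain2.1 0.3 0.3 0.3 0.1
Species2 Strain2.2 0.6 0.2 0.6 0.2
Species2 Strain2.2 0.2 0.1 0.4 0.2
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One row per (Species, Strain) pair, one mean per column A-D.
per_strain = df.groupby(["Species", "Strain"]).mean()
print(per_strain)

# Chaining .mean(axis=1) averages A-D within each row of per_strain,
# so each strain ends up with a single number instead of four.
row_means = per_strain.mean(axis=1)
print(row_means)
```

per_strain is the table the question asks for; row_means is what the original chained call produced.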

edited Apr 10, 2018 at 20:17

answered Apr 10, 2018 at 15:59


2 Comments

Biomage Over a year ago

Thanks, although when I do this I still get multiple results for the same strain. For example, strain 1.1 repeats multiple times when I only want the average across all the copies of strain 1.1.

Biomage Over a year ago

Never mind! I wasn't saving to a new dataframe. Using df2 = df.groupby(['Species', 'Strain']).mean() worked a treat, thanks!
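The point in that last comment is that groupby(...).mean() returns a new dataframe and leaves df unchanged. A small sketch, using only the Species2 rows and columns A and B from the question to keep it short (df2 follows the comment; flat is a name made up here):

```python
import pandas as pd

# Subset of the question's data: Species2 rows, columns A and B only.
df = pd.DataFrame({
    "Species": ["Species2", "Species2", "Species2"],
    "Strain": ["Strain2.1", "Strain2.2", "Strain2.2"],
    "A": [0.3, 0.6, 0.2],
    "B": [0.3, 0.2, 0.1],
})

# Calling groupby without assigning the result changes nothing in df.
df.groupby(["Species", "Strain"]).mean()
print(len(df))  # still the original number of rows

# Assign the result to keep the per-strain means.
df2 = df.groupby(["Species", "Strain"]).mean()

# Optional: turn the Species/Strain index back into ordinary columns,
# which can be handier when writing the result out to CSV.
flat = df2.reset_index()
print(flat)
```

If df still shows repeated strains after the call, check that the grouped result was actually assigned to a variable.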
