I have a csv file like this:

-Species-    -Strain-       -A-       -B-       -C-       -D-
 Species1    Strain1.1         0.2       0.1       0.1       0.4
 Species1    Strain1.1         0.2       0.7       0.2       0.2
 Species1    Strain1.2         0.1       0.6       0.1       0.3
 Species1    Strain1.1         0.2       0.6       0.2       0.6
 Species2    Strain2.1         0.3       0.3       0.3       0.1
 Species2    Strain2.2         0.6       0.2       0.6       0.2
 Species2    Strain2.2         0.2       0.1       0.4       0.2

I would like to calculate the mean (average) of each column (A-D) for each unique strain. How would I go about doing it?

I tried df.groupby(['Strain','Species']).mean().mean(1), but that still seems to give me multiple rows for the same strain in the resulting dataframe, rather than the mean of each column for each unique strain.

Essentially I would like a mean result for A,B,C & D per strain.

Apologies for being unclear, I'm struggling to get my head around this, and I'm very new to programming!

  • It's still not very clear - do you care about the grouping of the strain to species? If not then you can do df.groupby(['Strain']).mean() this will give you the mean of A,B,C,D per strain. Commented Apr 10, 2018 at 15:55
  • if that's not the case can you edit your question to include the expected results please. Commented Apr 10, 2018 at 15:56
  • Yes this works when both 'Strain' and 'Species' are used, thanks! Commented Apr 11, 2018 at 8:45

1 Answer

IIUC, you simply need to call

df.groupby(['Species', 'Strain']).mean()

                      A         B         C    D 
Species   Strain                               
Species1  Strain1.1  0.2  0.466667  0.166667  0.4
          Strain1.2  0.1  0.600000  0.100000  0.3
Species2  Strain2.1  0.3  0.300000  0.300000  0.1
          Strain2.2  0.4  0.150000  0.500000  0.2

What you were doing when you called df.groupby(['Strain','Species']).mean().mean(1) was taking the mean of the 4 means in A, B, C, and D. mean(1) is shorthand for mean(axis=1): it averages across the columns within each row, so the four per-column means collapse into a single number per group.
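As a rough, self-contained sketch of the difference, the snippet below rebuilds the sample data from the question and runs both calls. The whitespace-separated layout, the plain column names (without the dashes), and the variable names per_strain, row_means and flat are assumptions made for illustration; they are not from the original post.

```python
import io

import pandas as pd

# Sample data copied from the question. Reading it as whitespace-separated
# text with plain column names is an assumption about the real CSV file.
raw = """Species Strain A B C D
Species1 Strain1.1 0.2 0.1 0.1 0.4
Species1 Strain1.1 0.2 0.7 0.2 0.2
Species1 Strain1.2 0.1 0.6 0.1 0.3
Species1 Strain1.1 0.2 0.6 0.2 0.6
Species2 Strain2.1 0.3 0.3 0.3 0.1
Species2 Strain2.2 0.6 0.2 0.6 0.2
Species2 Strain2.2 0.2 0.1 0.4 0.2
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One row per (Species, Strain) pair, one mean per column A-D.
# groupby(...).mean() returns a new DataFrame; df itself is left unchanged.
per_strain = df.groupby(["Species", "Strain"]).mean()

# Adding .mean(axis=1) averages across A-D within each row,
# leaving a single number per group instead of four column means.
row_means = per_strain.mean(axis=1)

# Optional: turn the group keys back into ordinary columns.
flat = per_strain.reset_index()

print(per_strain)
print(row_means)
```

Because the grouped result is a new object, it has to be assigned to a name (here per_strain) to be used later; printing df afterwards still shows the original seven rows.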

2 Comments

Thanks, although when I do this I still get multiple results for the same strain, so I'll get, for example, strain 1.1 repeating multiple times when I only want the average for all the copies of strain 1.1
Never mind! I wasn't saving the result to a new dataframe. Using df2 = df.groupby(['Species', 'Strain']).mean() worked a treat, thanks!
