I have a data.frame that looks like this (but with many more columns and rows):

    Gene      Cell1    Cell2    Cell3     
1      A          2        7        8 
2      A          5        2        9 
3      B          2        7        8
4      C          1        4        3

I want to sum the rows that have the same value in Gene, in order to get something like this:

    Gene      Cell1    Cell2    Cell3     
1      A          7        9       17  
2      B          2        7        8
3      C          1        4        3

Based on the answers to previous questions, I tried aggregate, but I couldn't work out how to get the result above. This is what I've tried:

aggregate(df[,-1], list(df[,1]), FUN = sum)

Does anyone have an idea of what I'm doing wrong?

  • What's wrong with the result you got from aggregate? Commented May 28, 2017 at 17:56

2 Answers

aggregate(df[,-1], list(Gene=df[,1]), FUN = sum)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3

will give you the output you are looking for.
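
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The data.frame is rebuilt by hand from the values shown in the question, and the formula interface of aggregate is used as an alternative spelling of the same grouping, not as the answer's own method. The names df and res are just for illustration.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Formula interface: "." stands for every column other than Gene,
# so the columns to sum do not have to be listed by hand.
res <- aggregate(. ~ Gene, data = df, FUN = sum)
print(res)
```

One thing to keep in mind: the formula interface drops rows containing NA by default (na.action = na.omit), so it may behave differently from the list-based call on data with missing values.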

4 Comments

Running the above gives an error: Error in aggregate.data.frame(df[, -1], list(Gene = df[, 1]), FUN = sum) : arguments must have same length
@ManojKumar Please add the output of str(df) to your post.
Sure @lukeA, here it is:
Classes ‘data.table’ and 'data.frame': 4 obs. of 4 variables:
 $ Gene : chr "A" "A" "B" "C"
 $ Cell1: int 2 5 2 1
 $ Cell2: int 7 2 7 4
 $ Cell3: int 8 9 8 3
 - attr(*, ".internal.selfref")=<externalptr>
@ManojKumar Thanks. You have a data.table object, and indexing works a bit differently there. So you could do e.g. aggregate(df[,-1], list(Gene=df[[1]]), FUN = sum). But since you have a data.table anyway, you may want to use its own strengths for aggregating: df[, lapply(.SD, sum), by=Gene].
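
The difference the comments are pointing at can be sketched in base R alone. This assumes that, in recent data.table versions, df[, 1] returns a one-column table rather than a vector; drop = FALSE is used below only as a stand-in to mimic that on a plain data.frame, and the variable names are made up for the example.

```r
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  stringsAsFactors = FALSE
)

as_vector <- df[[1]]                # plain character vector, one element per row
as_table  <- df[, 1, drop = FALSE]  # still a one-column data.frame

length(as_vector)  # number of rows
length(as_table)   # number of columns, not rows

# Grouping by the vector form gives aggregate one group label per row.
res <- aggregate(df[, -1, drop = FALSE], list(Gene = as_vector), FUN = sum)
print(res)
```

Because [[1]] returns a vector for both data.frame and data.table, it is the safer way to build the grouping list when the class of the input is not known in advance.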

Or with dplyr:

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarise_all(sum) %>%
  data.frame() -> newdf # so that newdf can further be used, if needed

1 Comment

The other methods work, but this one is more robust as well as intuitive. I like that one does not need to declare which columns to sum.
