I have some sample data:

SampleID  a      b     d     f       ca      k     l    cb
1         0.1    2     1     2       7       1     4    3
2         0.2    3     2     3       4       2     5    5
3         0.5    4     3     6       1       3     9    2

I need to find the row-wise sum of columns whose names have something in common, e.g. row-wise sum(a, ca) or row-wise sum(b, cb). The problem is that I have a large data.frame, so ideally I would specify what the column headers have in common and the code would pick only those columns to sum.

Thank you in advance for any assistance.

2 Answers

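We can use grep to find the columns whose names contain 'a', subset those columns, and apply rowSums. The same works for the 'b' columns. The first column is excluded from the search because 'SampleID' also contains an 'a'.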

 rowSums(df1[grep('a', names(df1)[-1])+1])
 rowSums(df1[grep('b', names(df1)[-1])+1])
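
The same idea can be wrapped in a small helper that returns a data.frame and does not rely on a fixed +1 offset. This is only a minimal sketch: `sum_by_pattern` and its `id_cols` argument are hypothetical names (not part of base R), and `df1` is rebuilt here from the sample data in the question.

```r
# Sample data from the question
df1 <- data.frame(SampleID = 1:3,
                  a = c(0.1, 0.2, 0.5), b = c(2, 3, 4), d = c(1, 2, 3),
                  f = c(2, 3, 6), ca = c(7, 4, 1), k = c(1, 2, 3),
                  l = c(4, 5, 9), cb = c(3, 5, 2))

# Hypothetical helper: for each pattern, sum the columns whose names match it,
# skipping the identifier columns named in `id_cols`.
sum_by_pattern <- function(df, patterns, id_cols = "SampleID") {
  candidates <- setdiff(names(df), id_cols)      # columns eligible for matching
  sums <- lapply(patterns, function(p) {
    cols <- grep(p, candidates, value = TRUE)    # matching column names
    rowSums(df[cols])
  })
  names(sums) <- paste0("sum_", patterns)
  cbind(df[id_cols], as.data.frame(sums))
}

result <- sum_by_pattern(df1, c("a", "b"))
result  # sum_a is 7.1, 4.2, 1.5 and sum_b is 5, 8, 6
```

Matching on column names (`value = TRUE`) instead of positions means the identifier column can sit anywhere in the data.frame. The patterns are regular expressions, so something like `"^c"` would pick up only the columns starting with 'c'.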

Comments

Could it be modified so that it returns a matrix or data.frame, and so that the column position is not always +1? In other words, could the code be a bit more general?
I clicked! Thank you! I don't have much reputation yet, so it may not show up, but I really did!
Thank you. But could you please explain a bit more? How can I modify your code to sum rows of columns that are, for example, 7 columns away from each other?
@OlgaAnufrieva The grep gives the column indices of the columns that share the pattern in their names. I added 1 to the grep output because I was grepping on the subset of the dataset that doesn't include the first column. So, if I understand your comment, it should work.
Thanks a lot for the help!

If you want the output as a data frame, try using dplyr.

# Recreating your sample data
df <- data.frame(SampleID = c(1, 2, 3),
             a = c(0.1, 0.2, 0.5),
             b = c(2, 3, 4),
             d = c(1, 2, 3),
             f = c(2, 3, 6),
             ca = c(7, 4, 1),
             k = c(1, 2, 3),
             l = c(4, 5, 9),
             cb = c(3, 5, 2)) 

Process the data

# load dplyr
library(dplyr)

# Sum across columns 'a' and 'ca' (sum(a, ca))
df2 <- df %>%
    select(contains('a'), -SampleID) %>% # 'select' function to choose the columns you want 
    mutate(row_sum = rowSums(.)) # 'mutate' function to create a new column 'row_sum' with the sum of the selected columns. You can drop the selected columns by using 'transmute' instead.

df2 # have a look

    a ca row_sum
1 0.1  7     7.1
2 0.2  4     4.2
3 0.5  1     1.5
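
If both sums are wanted side by side with SampleID kept, one possible variation is sketched below. It assumes the same `df` as above (rebuilt here so the snippet runs on its own); the names `df_sums`, `sum_a` and `sum_b` are made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(SampleID = c(1, 2, 3),
                 a = c(0.1, 0.2, 0.5), b = c(2, 3, 4), d = c(1, 2, 3),
                 f = c(2, 3, 6), ca = c(7, 4, 1), k = c(1, 2, 3),
                 l = c(4, 5, 9), cb = c(3, 5, 2))

df_sums <- df %>%
    mutate(sum_a = rowSums(select(df, contains('a'), -SampleID)), # columns a and ca
           sum_b = rowSums(select(df, contains('b')))) %>%        # columns b and cb
    select(SampleID, sum_a, sum_b)                                # keep the ID and the two sums

df_sums # sum_a is 7.1, 4.2, 1.5 and sum_b is 5, 8, 6
```

`-SampleID` is only needed for the 'a' pattern, because 'SampleID' itself contains an 'a'.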
