Sum of multiple columns based on column name

Question

I have a data frame that looks like this:

ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI      12      13      19      30      19      30      11      14      11      14      23      25
AZ      19      30      11      14      23      25      12      13      19      30      19      30
NY      11      14      19      30      19      30      11      14      23      25      12      13

The actual data has 700 rows and 250 columns in this same pattern.

I want to sum columns based on year (2009, 2010, 2011) and type (here the types are "a" and "p").

For example: xa_2009 + ya_2009 + za_2009, xa_2011 + ya_2011, xp_2009 + yp_2009, and so on.

So for this example, the final data frame should look like:

ST a2009 a2010 a2011 p2009 p2010 p2011
MI    48    27    30    44    42    30
AZ    61    43    30    44    42    25
NY    35    28    42    55    31    30

I am using RStudio, so I would prefer the code to be written in R. Thank you in advance!

Why not create a sub-dataset and then use apply()? You can build a vector containing all the names you need with a simple for loop. If you need more specific answers, please use dput() to share a reproducible example.
– D.J, Commented Nov 11, 2020 at 8:05
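
The suggestion in the comment above could look roughly like the following. This is only a minimal sketch of one reading of it: the names `keys`, `sub_df` and `out` are made up for illustration, and it assumes every column name has the form one prefix character, then type, underscore and year.

```r
# Sample data from the question
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI 12 13 19 30 19 30 11 14 11 14 23 25
AZ 19 30 11 14 23 25 12 13 19 30 19 30
NY 11 14 19 30 19 30 11 14 23 25 12 13
", header = TRUE)

# Build the vector of type_year keys (e.g. "a_2009") with a simple for loop
keys <- character(0)
for (nm in names(df)[-1]) {
  key <- substring(nm, 2)                    # drop the leading x/y/z
  if (!key %in% keys) keys <- c(keys, key)   # keep each key once
}

# For each key, take the matching sub-dataset and sum every row with apply()
out <- df["ST"]
for (key in keys) {
  sub_df <- df[, substring(names(df), 2) == key, drop = FALSE]
  out[[sub("_", "", key)]] <- apply(sub_df, 1, sum)
}

out
```

The loops are easy to follow, but on 250 columns the vectorised answers below are likely to be shorter and faster.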

user2974951 · Accepted Answer · 2020-11-11 08:09:08Z

2

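df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13
", header = TRUE)

# Drop the first character (x, y, z) to get the unique type_year keys, e.g. "a_2009"
abc <- unique(substring(colnames(df)[-1], 2))

# For each key, sum across all columns whose name contains that key
res <- sapply(seq_along(abc), function(x) {
  rowSums(df[, grep(abc[x], colnames(df)), drop = FALSE])
})

# Remove the underscore so the names match the requested output (a2009, ...)
colnames(res) <- gsub("_", "", abc)

res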

     a2009 a2010 a2011 p2009 p2010 p2011
[1,]    48    27    30    44    42    30
[2,]    61    43    30    44    42    25
[3,]    35    28    42    55    31    30
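
A possible variation on the above, offered as a sketch rather than as part of the original approach: the result is a plain matrix without the ST column, and `grep()` matches the key anywhere in a column name. With 250 real columns it may be safer to anchor the pattern and to bind ST back on. The names `keys`, `sums` and `res_df` are illustrative, and the pattern assumes exactly one prefix character before the key.

```r
# Sample data from the question
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI 12 13 19 30 19 30 11 14 11 14 23 25
AZ 19 30 11 14 23 25 12 13 19 30 19 30
NY 11 14 19 30 19 30 11 14 23 25 12 13
", header = TRUE)

keys <- unique(substring(colnames(df)[-1], 2))

# "^.a_2009$" matches one prefix character followed by exactly the key,
# so a longer name that merely contains the key is not picked up
sums <- sapply(keys, function(k) {
  rowSums(df[, grepl(paste0("^.", k, "$"), colnames(df)), drop = FALSE])
})
colnames(sums) <- gsub("_", "", keys)

# Put the identifier column back in front of the sums
res_df <- data.frame(ST = df$ST, sums)
res_df
```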


Karthik S · Accepted Answer · 2020-11-11 08:27:02Z

1

Does this work?

library(tidyr)
library(dplyr)
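df %>%
  # Split each name into the prefix (x/y/z) and the type_year part; the
  # type_year part becomes the new column names via the ".value" sentinel
  pivot_longer(!ST, names_to = c('x', '.value'), names_pattern = '(.)(._\\d{4})') %>%
  group_by(ST) %>%
  summarise(across(a_2009:p_2011, ~ sum(.x, na.rm = TRUE)))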
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 7
  ST    a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 AZ        61     43     30     44     42     25
2 MI        48     27     30     44     42     30
3 NY        35     28     42     55     31     30


Onyambu · Accepted Answer · 2020-11-11 16:50:36Z

0

Another base R approach is:

cbind(df[1],sapply(split.default(df[-1], sub(".",'',names(df)[-1])), rowSums))

  ST a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
1 MI     48     27     30     44     42     30
2 AZ     61     43     30     44     42     25
3 NY     35     28     42     55     31     30
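
To see why the one-liner above works, it may help to look at the intermediate pieces. The sketch below uses the question's sample data; `key` and `groups` are names introduced here only for illustration.

```r
# Sample data from the question
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI 12 13 19 30 19 30 11 14 11 14 23 25
AZ 19 30 11 14 23 25 12 13 19 30 19 30
NY 11 14 19 30 19 30 11 14 23 25 12 13
", header = TRUE)

# sub() replaces only the first match, and the regex "." matches any
# single character, so this strips the leading x/y/z from each name
key <- sub(".", "", names(df)[-1])

# split.default() treats the data frame as a list of columns and groups
# those columns by key, giving one small data frame per type_year
groups <- split.default(df[-1], key)

names(groups)         # the six type_year keys, in sorted order
sapply(groups, ncol)  # how many source columns feed each sum
```

Each element of `groups` is then passed to `rowSums()`, and `cbind()` attaches the ST column.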

Another approach:

xtabs(values~., transform(cbind(df[1],stack(df[-1])), ind = sub('.','',ind)))
    ind
ST   a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
  AZ     61     43     30     44     42     25
  MI     48     27     30     44     42     30
  NY     35     28     42     55     31     30
