
I have a data frame that looks like this:

    
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13      


The actual data has 700 rows and 250 columns in this same pattern.

I want to sum the columns that share the same type (the second character of the column name, here "a" or "p") and the same year (2009, 2010, 2011).

For example: xa_2009 + ya_2009 + za_2009, xa_2011 + ya_2011, xp_2009 + yp_2009, and so on.

For this example, the final data frame should look like:

    
ST   a2009  a2010   a2011   p2009   p2010  p2011   
MI    48     27       30     44      42     30       
AZ    61     43       30     44      42     25       
NY    35     28       42     55      31     30            
    

I am using RStudio, so I would prefer a solution written in R. Thank you in advance!

  • Why not create a subset of the data and then use apply? You can build a vector containing all the column names you need with a simple for loop. If you need a more specific answer, please use dput() to post a reproducible example. Commented Nov 11, 2020 at 8:05
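
A minimal sketch of the loop-based idea from the comment above, assuming every column name follows the pattern of one prefix letter, one type letter, an underscore, and a four-digit year. The names `types`, `years`, and `out` are illustrative and do not come from the original post.

```r
# Example data copied from the question.
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13
", header = TRUE)

types <- c("a", "p")   # assumed set of type letters
years <- 2009:2011     # assumed set of years

out <- df["ST"]        # start from the identifier column
for (ty in types) {
  for (yr in years) {
    suffix <- paste0(ty, "_", yr)
    # Columns made of exactly one prefix letter followed by this suffix.
    cols <- grep(paste0("^.", suffix, "$"), names(df), value = TRUE)
    out[[paste0(ty, yr)]] <- rowSums(df[, cols, drop = FALSE])
  }
}

out
```

For the real data, `types` and `years` would need to list whatever letters and years actually occur in the column names.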

3 Answers

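df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13", header = TRUE)

# Grouping keys: drop "ST", then drop the first letter of each name ("a_2009", "p_2009", ...)
abc <- unique(substring(colnames(df)[-1], 2))

# For each key, sum across all columns whose name contains it
res <- sapply(seq_along(abc), function(x) {
  rowSums(df[, grep(abc[x], colnames(df)), drop = FALSE])
})

# Label the result columns without the underscore ("a2009", ...)
colnames(res) <- gsub("_", "", abc)

res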

     a2009 a2010 a2011 p2009 p2010 p2011
[1,]    48    27    30    44    42    30
[2,]    61    43    30    44    42    25
[3,]    35    28    42    55    31    30
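
Note that the result above is a plain matrix, so the ST column is not carried along. Here is a sketch of one way to keep it, assuming exact suffix matching with endsWith() (base R 3.3.0 or later) is an acceptable substitute for grep(). The names `keys`, `sums`, and `result` are illustrative.

```r
# Example data copied from the question.
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13
", header = TRUE)

# Type/year keys such as "a_2009", taken from every column except ST.
keys <- unique(substring(names(df)[-1], 2))

# One row sum per key, using only the columns whose name ends with that key.
sums <- sapply(keys, function(k) {
  rowSums(df[, endsWith(names(df), k), drop = FALSE])
})
colnames(sums) <- sub("_", "", keys)

# Put the identifier column back in front of the sums.
result <- data.frame(ST = df$ST, sums)
result
```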

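Does this work?

library(tidyr)
library(dplyr)

# Split each name into its first letter (kept in a helper column 'x') and the
# type_year part, which becomes the new column names via '.value';
# then sum within each ST.
df %>%
  pivot_longer(!ST, names_to = c('x', '.value'), names_pattern = '(.)(._\\d{4})') %>%
  group_by(ST) %>%
  summarise(across(a_2009:p_2011, ~ sum(.x, na.rm = TRUE)))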
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 7
  ST    a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 AZ        61     43     30     44     42     25
2 MI        48     27     30     44     42     30
3 NY        35     28     42     55     31     30

Another base R approach:

cbind(df[1],sapply(split.default(df[-1], sub(".",'',names(df)[-1])), rowSums))

  ST a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
1 MI     48     27     30     44     42     30
2 AZ     61     43     30     44     42     25
3 NY     35     28     42     55     31     30
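
For readers who find the one-liner dense, the same idea can be unpacked step by step. This is only a sketch of what the call above appears to do; the intermediate names `key`, `groups`, `sums`, and `result` are illustrative.

```r
# Example data copied from the question.
df <- read.table(text = "
ST xa_2009 xa_2010 xa_2011 xp_2009 xp_2010 xp_2011 ya_2009 ya_2010 ya_2011 yp_2009 yp_2010 za_2009
MI    12     13       19     30      19     30       11     14       11     14       23     25
AZ    19     30       11     14      23     25       12     13       19     30       19     30
NY    11     14       19     30      19     30       11     14       23     25       12     13
", header = TRUE)

# 1. Strip the first character of each column name to get the grouping key.
key <- sub("^.", "", names(df)[-1])      # "a_2009", "a_2010", ...

# 2. Split the columns (not the rows) into one data frame per key.
groups <- split.default(df[-1], key)

# 3. Sum across the columns of each group, row by row.
sums <- sapply(groups, rowSums)

# 4. Put the ST column back in front.
result <- cbind(df["ST"], sums)
result
```

split.default() is used instead of split() because split() on a data frame divides rows, whereas here the columns need to be grouped.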

Another approach:

xtabs(values~., transform(cbind(df[1],stack(df[-1])), ind = sub('.','',ind)))
    ind
ST   a_2009 a_2010 a_2011 p_2009 p_2010 p_2011
  AZ     61     43     30     44     42     25
  MI     48     27     30     44     42     30
  NY     35     28     42     55     31     30
