Is there anyway to make this R code more efficient?

Question

I'm applying a fertility model and to execute I need to save matrices for each childbearing intensity, which I called mati, according to its order. In this case, i=(1, 2, 3,...n) The data frame bellow is an example of how my data is displayed. My real dataframe have 525 rows and 10 columns ("AGE" "year" "mat1" "mat2" "mat3" "mat4" "mat5" "mat6" "mat7" "mat8").

year <- c(rep(1998:2001, 4))
Age <- c(rep(15:18, 4))
mat1 <- c(rep(0.01, 16))
mat2 <- c(rep(0.012, 16))
mat3 <- c(rep(0.015, 16))
mat <- data.frame(year, Age, mat1, mat2, mat3)

mat

   year Age mat1  mat2  mat3
1  1998  15 0.01 0.012 0.015
2  1999  16 0.01 0.012 0.015
3  2000  17 0.01 0.012 0.015
4  2001  18 0.01 0.012 0.015
5  1998  15 0.01 0.012 0.015
6  1999  16 0.01 0.012 0.015
7  2000  17 0.01 0.012 0.015
8  2001  18 0.01 0.012 0.015
9  1998  15 0.01 0.012 0.015
10 1999  16 0.01 0.012 0.015
11 2000  17 0.01 0.012 0.015
12 2001  18 0.01 0.012 0.015
13 1998  15 0.01 0.012 0.015
14 1999  16 0.01 0.012 0.015
15 2000  17 0.01 0.012 0.015
16 2001  18 0.01 0.012 0.015

To execute get my final numeric matrices I already executed the code below, but it takes a long time.

##mat1###

library(dlyr)
library(tidyr)

mat1 <- #selecting just intensities of order 1 and creating matrices
  select(mat, Age, year, mat1) %>% 
  spread(year, mat1) 

names(mat1)[c(2:6)] <- paste0("year ", names(mat1[2:6])) #alter colnames
mat1[ ,1] <- paste0("age ", mat1[,1]) #alter the row from column "age"

mat_oe1 <- data.matrix(mat1[2:6])
dimnames(mat_oe1) <- list(c(mat1[,1]),
                          c(names(mat1[2:6])))
#Saving as txt to read i the model
write.table(mat_oe2, file = "mat_oe1.txt", sep = "\t",
            row.names = T, col.names = T)

##mat2
mat2 <- #selecting just intensities of order 1 and creating matrices
  select(mat, Age, year, mat2) %>% 
  spread(year, mat2) 

names(mat2)[c(2:6)] <- paste0("year ", names(mat2[2:6])) #alter colnames
mat2[ ,1] <- paste0("age ", mat2[,1]) #alter the row from column "age"

mat_oe2 <- data.matrix(mat2[2:6])
dimnames(mat_oe2) <- list(c(mat1[,1]),
                          c(names(mat1[2:6])))
#Saving as txt to read i the model
write.table(mat_oe2, file = "mat_oe2.txt", sep = "\t",
            row.names = T, col.names = T)

##mat3
mat3 <- #selecting just intensities of order 1 and creating matrices
  select(mat, Age, year, mat3) %>% 
  spread(year, mat3) 

names(mat3)[c(2:6)] <- paste0("year ", names(mat3[2:6])) #alter colnames
mat3[ ,1] <- paste0("age ", mat3[,1]) #alter the row from column "age"

mat_oe3 <- data.matrix(mat3[2:6])
dimnames(mat_oe3) <- list(c(mat3[,1]),
                          c(names(mat3[2:6])))
#Saving as txt to read i the model
write.table(mat_oe3, file = "mat_oe3.txt", sep = "\t",
            row.names = T, col.names = T)

I'm using spread because I need data with the format below:

mat1 

     1998        1999       2000       2001
15   0.01        0.01       0.01       0.01
16   0.01        0.01       0.01       0.01
17   0.01        0.01       0.01       0.01
18   0.01        0.01       0.01       0.01

I also started to write a loop, but I'm stuck in the first line already.

mat_list <- list()
for(i in names(mat[,3:7])) {
  mat_list[[i]] <- data.frame(
                      spread(
                        select(mat, AGE, year, mat[[paste0("mat",i)]]), year, mat[[paste0("mat", i)]]))

After applying the code above I've achieved the results below:

view(mat1)
        year 1998  year 1999  year 2000  year 2001
age 15   0.01        0.01       0.01       0.01
age 16   0.01        0.01       0.01       0.01
age 17   0.01        0.01       0.01       0.01
age 18   0.01        0.01       0.01       0.01


view(mat2)
        year 1998  year 1999    year 2000    year 2001
age 15   0.012        0.012       0.012       0.012
age 16   0.012        0.012       0.012       0.012
age 17   0.012        0.012       0.012       0.012
age 18   0.012        0.012       0.012       0.012


view(mat3)
        year 1998  year 1999    year 2000    year 2001
age 15   0.015        0.015       0.015       0.015
age 16   0.015        0.015       0.015       0.015
age 17   0.015        0.015       0.015       0.015
age 18   0.015        0.015       0.015       0.015

You could put mat1, mat2, mat3 in a list. ... but that is only my opinion. — jogo
– jogo, Commented Aug 3, 2019 at 20:12
I get an error when I copy and paste your code. Can you verify the mat1 <- #selectin just intensities... line? — Cole
– Cole, Commented Aug 3, 2019 at 21:49
@jogo I've tried to put in a list and apply a loop. But I couldn't. — Tereza Bernardes
– Tereza Bernardes, Commented Aug 3, 2019 at 22:06
@Cole there is a mistake in the spread function. I did earlier and know it isn't working anymore — Tereza Bernardes
– Tereza Bernardes, Commented Aug 3, 2019 at 22:08

rpolicastro · Accepted Answer · 2019-08-04 00:12:04Z

To expand on Cole's answer.

mat %>%
    gather("mat", "val", -year, -Age) %>%
    mutate(Age=paste("age",Age), year=paste("year",year)) %>%
    group_by(mat) %>%
    group_map(~spread(., year, val))

purrr::group_map applies a function to each group, and returns a list where each list element is the result of the function applied to each group.

# A tibble: 4 x 5
  Age    `year 1998` `year 1999` `year 2000` `year 2001`
  <chr>        <dbl>       <dbl>       <dbl>       <dbl>
1 age 15        0.01        0.01        0.01        0.01
2 age 16        0.01        0.01        0.01        0.01
3 age 17        0.01        0.01        0.01        0.01
4 age 18        0.01        0.01        0.01        0.01

[[2]]
# A tibble: 4 x 5
  Age    `year 1998` `year 1999` `year 2000` `year 2001`
  <chr>        <dbl>       <dbl>       <dbl>       <dbl>
1 age 15       0.012       0.012       0.012       0.012
2 age 16       0.012       0.012       0.012       0.012
3 age 17       0.012       0.012       0.012       0.012
4 age 18       0.012       0.012       0.012       0.012

[[3]]
# A tibble: 4 x 5
  Age    `year 1998` `year 1999` `year 2000` `year 2001`
  <chr>        <dbl>       <dbl>       <dbl>       <dbl>
1 age 15       0.015       0.015       0.015       0.015
2 age 16       0.015       0.015       0.015       0.015
3 age 17       0.015       0.015       0.015       0.015
4 age 18       0.015       0.015       0.015       0.015

This was using the data slightly modified by Cole.

year <- rep(1998:2001, each = 4) #each was the change.
Age <- rep(15:18, 4)
mat1 <- rep(0.01, 16)
mat2 <- rep(0.012, 16)
mat3 <- rep(0.015, 16)
mat <- data.frame(year, Age, mat1, mat2, mat3)

I've edited my answer. Also look into group_split() which is about twice as fast although admittedly, I didn't rename the columns to "year 1998".

Cole · Accepted Answer · 2019-08-04 01:09:04Z

1

I believe you want to gather and then spread the data. This allows you to do everything in two steps.

library(dplyr)
library(tidyr)

mat %>%
  gather(key, value, -year, -Age)%>%
  spread(year, value)%>%
  group_split(key)

[[1]]
# A tibble: 4 x 6
    Age key   `1998` `1999` `2000` `2001`
  <int> <chr>        <dbl>        <dbl>        <dbl>        <dbl>
1    15 mat1          0.01         0.01         0.01         0.01
2    16 mat1          0.01         0.01         0.01         0.01
3    17 mat1          0.01         0.01         0.01         0.01
4    18 mat1          0.01         0.01         0.01         0.01

[[2]]
# A tibble: 4 x 6
    Age key   `1998` `1999` `2000` `2001`
  <int> <chr>        <dbl>        <dbl>        <dbl>        <dbl>
1    15 mat2         0.012        0.012        0.012        0.012
2    16 mat2         0.012        0.012        0.012        0.012
3    17 mat2         0.012        0.012        0.012        0.012
4    18 mat2         0.012        0.012        0.012        0.012

[[3]]
# A tibble: 4 x 6
    Age key   `1998` `1999` `2000` `2001`
  <int> <chr>        <dbl>        <dbl>        <dbl>        <dbl>
1    15 mat3         0.015        0.015        0.015        0.015
2    16 mat3         0.015        0.015        0.015        0.015
3    17 mat3         0.015        0.015        0.015        0.015
4    18 mat3         0.015        0.015        0.015        0.015

Or you can do it in base:

mats <- reshape(data = data.frame(year = mat$year,Age = mat$Age,  stack(mat, select = c('mat1', 'mat2', 'mat3')))
        , idvar = c('Age', 'ind'), timevar = c('year'), direction = 'wide')

mat_list <- split(mats, mats$ind)

mat_list

$mat1
  Age  ind values.1998 values.1999 values.2000 values.2001
1  15 mat1        0.01        0.01        0.01        0.01
2  16 mat1        0.01        0.01        0.01        0.01
3  17 mat1        0.01        0.01        0.01        0.01
4  18 mat1        0.01        0.01        0.01        0.01

$mat2
   Age  ind values.1998 values.1999 values.2000 values.2001
17  15 mat2       0.012       0.012       0.012       0.012
18  16 mat2       0.012       0.012       0.012       0.012
19  17 mat2       0.012       0.012       0.012       0.012
20  18 mat2       0.012       0.012       0.012       0.012

$mat3
   Age  ind values.1998 values.1999 values.2000 values.2001
33  15 mat3       0.015       0.015       0.015       0.015
34  16 mat3       0.015       0.015       0.015       0.015
35  17 mat3       0.015       0.015       0.015       0.015
36  18 mat3       0.015       0.015       0.015       0.015

data I changed your data slightly so that each ID combination was unique.

year <- rep(1998:2001, each = 4) #each was the change.
Age <- rep(15:18, 4)
mat1 <- rep(0.01, 16)
mat2 <- rep(0.012, 16)
mat3 <- rep(0.015, 16)
mat <- data.frame(year, Age, mat1, mat2, mat3)

edited Aug 4, 2019 at 1:09

answered Aug 3, 2019 at 22:04

Cole

11.3k2 gold badges11 silver badges25 bronze badges

4 Comments

Tereza Bernardes Over a year ago

Thanks for your help, but I really need a separate matrix for each one of the mat variables. Unfortunately, they must have the format that I posted at the end of my question

Cole Over a year ago

Just subset it. mat1 <- result[result$ind == 'mat1'] or I'd say go with having them all together.

Tereza Bernardes Over a year ago

I also tried to melt and then use tapply. But I could'n achieve my objetive

Cole Over a year ago

Ok, I added an edit. It's now a list of tibbles which is preferable to having 10 separate variables in the global environment. Still, my gut is saying that there is no need to split the variables.

zhang jing · Accepted Answer · 2019-08-04 12:37:08Z

First reshape to long

#add unique id to your data
mat$id=1:nrow(mat)
#reshape to long by mat
long1 = reshape_toLong(data = mat,id = "id",j = "all123",value.var.prefix = "mat")
#delet id column
long2=long1[,-1]

Second reshape to wide

#reshape wide by year
wide=reshape_toWide(data = long2,id = "all123",j = "year",value.var.prefix = "mat")

Last get data

mat1

wide[wide$all123==1,]
   Age all123 mat1998 mat1999 mat2000 mat2001
1   15      1    0.01    0.01    0.01    0.01
4   16      1    0.01    0.01    0.01    0.01
8   17      1    0.01    0.01    0.01    0.01
12  18      1    0.01    0.01    0.01    0.01

mat2

wide[wide$all123==2,]
   Age all123 mat1998 mat1999 mat2000 mat2001
3   15      2   0.012   0.012   0.012   0.012
5   16      2   0.012   0.012   0.012   0.012
7   17      2   0.012   0.012   0.012   0.012
11  18      2   0.012   0.012   0.012   0.012

mat3

wide[wide$all123==3,]
   Age all123 mat1998 mat1999 mat2000 mat2001
2   15      3   0.015   0.015   0.015   0.015
6   16      3   0.015   0.015   0.015   0.015
9   17      3   0.015   0.015   0.015   0.015
10  18      3   0.015   0.015   0.015   0.015

before use reshape_toLong and reshape_toWide functions, you need to install onetree package from my github yikeshu0611 using command below

devtools::install_github("yikeshu0611/onetree")
library(onetree)

Attention: The data you provided has problem, so I use data changed by Cole

year <- rep(1998:2001, each = 4) #each was the change.
Age <- rep(15:18, 4)
mat1 <- rep(0.01, 16)
mat2 <- rep(0.012, 16)
mat3 <- rep(0.015, 16)
mat <- data.frame(year, Age, mat1, mat2, mat3)

Collectives™ on Stack Overflow

Is there anyway to make this R code more efficient?

3 Answers 3

1 Comment

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related