Reshaping in data.table

Question

EDIT: I have edited my question slightly because the suggested solution was a bit problematic for my dataset. The original post is included below.

I have a dataset df in which prop is the number of observations for a country in a given year as a fraction of that country's total observations. For example, for the Netherlands (NLD), 60% of observations are from 2005. For Bulgaria (BLG) this is 50%.

   row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2
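
To make the definition of prop concrete, here is a minimal sketch of how such a column could be computed with data.table. It assumes prop is the per-country share of observations in each year, which matches the numbers above; the helper column n_country is made up for illustration.

```r
library(data.table)

# Sample data from the question, without the prop column
df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L)
)

# Total number of observations per country (hypothetical helper column)
df[, n_country := .N, by = country]

# Share of each country's observations that fall in each year
df[, prop := .N / n_country, by = .(country, year)]

# Drop the helper column again
df[, n_country := NULL]
df
```

For NLD, three of the five observations are from 2005, which gives 3 / 5 = 0.6.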

What I want is to get the following:

   row country prop2005 prop2006 prop2007 prop2008
1:   1     NLD      0.6      0.0      0.2      0.2
2:   2     NLD      0.6      0.0      0.2      0.2
3:   3     NLD      0.6      0.0      0.2      0.2
4:   4     BLG      0.5      0.5      0.0      0.0
5:   5     BLG      0.5      0.5      0.0      0.0
6:   6     BLG      0.5      0.5      0.0      0.0
7:   7     GER      1.0      0.0      0.0      0.0
8:   8     GER      1.0      0.0      0.0      0.0
9:   9     GER      1.0      0.0      0.0      0.0

ORIGINAL POST:

I have a dataset df in which prop is the number of observations for a country in a given year as a fraction of that country's total observations. For example, for the Netherlands (NLD), 60% of observations are from 2005. For Bulgaria (BLG) this is 50%.

   row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2

I would like to connect these values to a different dataset, df2, which has questions related to those years and looks as follows:

   row country q05 q06 q07 q08
1:   1     NLD   1   2   1   3
2:   2     NLD   2   1   2   3
3:   3     NLD   1   2   2   4
4:   4     BLG   5   5   2   4
5:   5     BLG   1   2   1   1
6:   6     BLG   2   2   5   1
7:   7     GER   3   5   4   4
8:   8     GER   2   5   3   4
9:   9     GER   1   2   3   5

What I want is to get the following:

   row country q05 q06 q07 q08 prop2005 prop2006 prop2007 prop2008
1:   1     NLD   1   2   1   3      0.6      0.0      0.2      0.2
2:   2     NLD   2   1   2   3      0.6      0.0      0.2      0.2
3:   3     NLD   1   2   2   4      0.6      0.0      0.2      0.2
4:   4     BLG   5   5   2   4      0.5      0.5      0.0      0.0
5:   5     BLG   1   2   1   1      0.5      0.5      0.0      0.0
6:   6     BLG   2   2   5   1      0.5      0.5      0.0      0.0
7:   7     GER   3   5   4   4      1.0      0.0      0.0      0.0
8:   8     GER   2   5   3   4      1.0      0.0      0.0      0.0
9:   9     GER   1   2   3   5      1.0      0.0      0.0      0.0

In other words, for every observation, I want the proportions connected to that country added to the observation (as they function like a weight).

I am reasonably familiar with merging in data.table:

df1 <- merge(df1, df2,  by= "country", all.x = TRUE, allow.cartesian=FALSE)

However, I don't really know how I can reshape the data.table to correctly merge it.

Any suggestions?

CURRENT "SOLUTION":

df1 <- dcast(df1, country ~ year, value.var = "prop")
df1 <- merge(df1, df2,  by= "country", all.x = TRUE, allow.cartesian=FALSE)
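
One likely reason this attempt is problematic: df has several rows per country/year combination, so dcast() has to aggregate them, and without an explicit fun.aggregate it counts rows rather than keeping prop (as far as I know). Below is a minimal sketch of one way around this, assuming prop is constant within each country/year combination. The names props, wide and res are illustrative, and the data is the sample from this question.

```r
library(data.table)

df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)
df2 <- data.table(
  row     = 1:9,
  country = rep(c("NLD", "BLG", "GER"), each = 3),
  q05 = c(1, 2, 1, 5, 1, 2, 3, 2, 1),
  q06 = c(2, 1, 2, 5, 2, 2, 5, 5, 2),
  q07 = c(1, 2, 2, 2, 1, 5, 4, 3, 3),
  q08 = c(3, 3, 4, 4, 1, 1, 4, 4, 5)
)

# One row per country/year, so dcast() does not need to aggregate
props <- unique(df[, .(country, year, prop)])

# Wide table: one column per year, missing combinations filled with 0
wide <- dcast(props, country ~ year, value.var = "prop", fill = 0)

# Rename the year columns from "2005" to "prop2005", and so on
year_cols <- setdiff(names(wide), "country")
setnames(wide, old = year_cols, new = paste0("prop", year_cols))

# Attach the country-level proportions to every observation in df2
res <- merge(df2, wide, by = "country", all.x = TRUE, sort = FALSE)
res
```

Because the merge is on country only, every row of df2 receives the same proportions as the other rows of its country.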

Hey Henrik, they correspond to individual observations that merely have the values shown in common. The actual data is much larger, so they are not actually duplicates.
– Tom, Commented Sep 25, 2018 at 6:24

Jaap · Accepted Answer · 2018-09-25 06:31:59Z

4

A possible solution:

melt(df2, id = 1:2, value.name = 'q'
     )[, year := as.integer(paste0('20',sub('\\D+','',variable)))
       ][df, on = .(country, year), prop := i.prop
         ][is.na(prop), prop := 0
           ][, dcast(.SD, row + country ~ year, value.var = c('q','prop'), sep = '')]

which gives:

   row country q2005 q2006 q2007 q2008 prop2005 prop2006 prop2007 prop2008
1:   1     NLD     1     2     1     3      0.6      0.0      0.2      0.2
2:   2     NLD     2     1     2     3      0.6      0.0      0.2      0.2
3:   3     NLD     1     2     2     4      0.6      0.0      0.2      0.2
4:   4     BLG     5     5     2     4      0.5      0.5      0.0      0.0
5:   5     BLG     1     2     1     1      0.5      0.5      0.0      0.0
6:   6     BLG     2     2     5     1      0.5      0.5      0.0      0.0
7:   7     GER     3     5     4     4      1.0      0.0      0.0      0.0
8:   8     GER     2     5     3     4      1.0      0.0      0.0      0.0
9:   9     GER     1     2     3     5      1.0      0.0      0.0      0.0

To see how this works, you can split the code in several steps as follows:

df3 <- melt(df2, id = 1:2, value.name = 'q')[, year := as.integer(paste0('20',sub('\\D+','',variable)))]

df3[df, on = .(country, year), prop := i.prop][]
df3[is.na(prop), prop := 0][]
df3[, dcast(.SD, row + country ~ year, value.var = c('q','prop'), sep = '')]
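
The same steps, written out as a self-contained sketch on the sample data from the question, with a comment on what each step does. The intermediate names long and res are only for illustration.

```r
library(data.table)

df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)
df2 <- data.table(
  row     = 1:9,
  country = rep(c("NLD", "BLG", "GER"), each = 3),
  q05 = c(1, 2, 1, 5, 1, 2, 3, 2, 1),
  q06 = c(2, 1, 2, 5, 2, 2, 5, 5, 2),
  q07 = c(1, 2, 2, 2, 1, 5, 4, 3, 3),
  q08 = c(3, 3, 4, 4, 1, 1, 4, 4, 5)
)

# 1. Wide -> long: one row per observation and question column
long <- melt(df2, id.vars = c("row", "country"), value.name = "q")

# 2. Derive the year from the column name: "q05" -> "05" -> 2005
long[, year := as.integer(paste0("20", sub("\\D+", "", variable)))]

# 3. Update join: look up prop in df by country and year
long[df, on = .(country, year), prop := i.prop]

# 4. Country/year combinations that do not occur in df get a proportion of 0
long[is.na(prop), prop := 0]

# 5. Long -> wide again, spreading both q and prop over the years
res <- dcast(long, row + country ~ year, value.var = c("q", "prop"), sep = "")
res
```

Printing long after each step shows which column is added or changed at that point.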

edited Sep 25, 2018 at 6:31

answered Sep 24, 2018 at 15:55


4 Comments

Tom Over a year ago

Hey Jaap, thank you so much for your answer. Could you help me out a little with the inner workings of your answer? I have to rewrite it in order to apply it to quite a big database, but I am having a bit of trouble figuring out what does what exactly.

Jaap Over a year ago

@TomKisters I've split up the code into several steps so that you can see what the different steps do. I will try to add some explanatory text later today (have to run now for a series of meetings).

Tom Over a year ago

Thank you for taking the time Jaap. I really appreciate it. I've been looking at your solution a bit in the meantime and I was wondering if it would not be easier (well, for me at least) to first reshape the data in the first df and then to merge it by country?

Tom Over a year ago

I have edited the original post to include my previous comment.

Alan Gómez · Accepted Answer · 2023-07-12 14:25:28Z

A base R solution:

Sample data:

df <- read.table(header = TRUE, text = "
row country year prop
1     NLD 2005  0.6
2     NLD 2005  0.6
3     BLG 2006  0.5
4     BLG 2005  0.5
5     GER 2005  1.0
6     NLD 2007  0.2
7     NLD 2005  0.6
8     NLD 2008  0.2
") 


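# Drop the row id; it is not needed for reshaping
df$row <- NULL
# Long -> wide; for duplicate country/year rows the first one is taken
df2 <- reshape(df, direction = "wide", idvar = "country", timevar = "year")
df2                                # first output below: missing years are NA
df2[is.na(df2)] <- 0               # replace the missing proportions with 0
df2[rep(1:nrow(df2), each = 3), ]  # second output: each country repeated 3 times

Output (df2 right after reshape(), then the final result):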

  country prop.2005 prop.2006 prop.2007 prop.2008
1     NLD       0.6        NA       0.2       0.2
3     BLG       0.5       0.5        NA        NA
5     GER       1.0        NA        NA        NA

    country prop.2005 prop.2006 prop.2007 prop.2008
1       NLD       0.6       0.0       0.2       0.2
1.1     NLD       0.6       0.0       0.2       0.2
1.2     NLD       0.6       0.0       0.2       0.2
3       BLG       0.5       0.5       0.0       0.0
3.1     BLG       0.5       0.5       0.0       0.0
3.2     BLG       0.5       0.5       0.0       0.0
5       GER       1.0       0.0       0.0       0.0
5.1     GER       1.0       0.0       0.0       0.0
5.2     GER       1.0       0.0       0.0       0.0
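
The each = 3 above relies on every country having exactly three rows in the target table. A hedged alternative sketch, still in base R, merges the wide proportions onto a table of observations by country instead. Here obs is a hypothetical stand-in for the question's second dataset, and wide and res are illustrative names.

```r
# Proportions per country/year, as in the sample data above
df <- read.table(header = TRUE, text = "
row country year prop
1     NLD 2005  0.6
2     NLD 2005  0.6
3     BLG 2006  0.5
4     BLG 2005  0.5
5     GER 2005  1.0
6     NLD 2007  0.2
7     NLD 2005  0.6
8     NLD 2008  0.2
")

# Hypothetical target table: one row per observation
obs <- data.frame(row = 1:9, country = rep(c("NLD", "BLG", "GER"), each = 3))

# Deduplicate first, then reshape to one row per country
wide <- reshape(unique(df[, c("country", "year", "prop")]),
                direction = "wide", idvar = "country", timevar = "year")
wide[is.na(wide)] <- 0

# Merge by country, so the number of rows per country no longer matters
res <- merge(obs, wide, by = "country", all.x = TRUE)
res <- res[order(res$row), ]  # merge() sorts by country; restore the row order
res
```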
