EDIT: I have edited my question slightly because the suggested solution was a bit problematic for my dataset. The OP is written below.

I have a dataset df in which prop is the number of observations in that year as a fraction of the country's total observations. For example, for the Netherlands (NLD) 60% of observations are from 2005; for Bulgaria (BLG) this is 50%.

    row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2

What I want is to get the following:

    row country prop2005 prop2006 prop2007 prop2008
1:   1     NLD  0.6      0.0      0.2      0.2
2:   2     NLD  0.6      0.0      0.2      0.2
3:   3     NLD  0.6      0.0      0.2      0.2
4:   4     BLG  0.5      0.5      0.0      0.0
5:   5     BLG  0.5      0.5      0.0      0.0
6:   6     BLG  0.5      0.5      0.0      0.0
7:   7     GER  1.0      0.0      0.0      0.0
8:   8     GER  1.0      0.0      0.0      0.0
9:   9     GER  1.0      0.0      0.0      0.0

ORIGINAL POST:

I have a dataset df in which prop is the number of observations in that year as a fraction of the country's total observations. For example, for the Netherlands (NLD) 60% of observations are from 2005; for Bulgaria (BLG) this is 50%.

    row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2

I would like to connect these values to a different dataset (df2 which has questions related to those years) and looks as follows:

    row country q05 q06 q07 q08 
1:   1     NLD  1   2   1   3   
2:   2     NLD  2   1   2   3   
3:   3     NLD  1   2   2   4   
4:   4     BLG  5   5   2   4   
5:   5     BLG  1   2   1   1   
6:   6     BLG  2   2   5   1   
7:   7     GER  3   5   4   4   
8:   8     GER  2   5   3   4   
9:   9     GER  1   2   3   5  

What I want is to get the following:

    row country q05 q06 q07 q08 prop2005 prop2006 prop2007 prop2008
1:   1     NLD  1   2   1   3   0.6      0.0      0.2      0.2
2:   2     NLD  2   1   2   3   0.6      0.0      0.2      0.2
3:   3     NLD  1   2   2   4   0.6      0.0      0.2      0.2
4:   4     BLG  5   5   2   4   0.5      0.5      0.0      0.0
5:   5     BLG  1   2   1   1   0.5      0.5      0.0      0.0
6:   6     BLG  2   2   5   1   0.5      0.5      0.0      0.0
7:   7     GER  3   5   4   4   1.0      0.0      0.0      0.0
8:   8     GER  2   5   3   4   1.0      0.0      0.0      0.0
9:   9     GER  1   2   3   5   1.0      0.0      0.0      0.0

In other words, for every observation, I want the proportions connected to that country added to the observation (as they function like a weight).

I am reasonably familiar with merging in data.table;

df1 <- merge(df1, df2,  by= "country", all.x = TRUE, allow.cartesian=FALSE)

However, I don't really know how I can reshape the data.table to correctly merge it.

Any suggestions?

CURRENT "SOLUTION":

df1 <- dcast(unique(df1, by = c("country", "year")), country ~ year, value.var = "prop", fill = 0)
df1 <- merge(df1, df2,  by= "country", all.x = TRUE, allow.cartesian=FALSE)
  • Hey Henrik, they correspond to individual observations that merely have the shown values in common. The actual data is much larger, so they are not actually duplicates. Commented Sep 25, 2018 at 6:24

2 Answers

4

A possible solution:

melt(df2, id = 1:2, value.name = 'q'
     )[, year := as.integer(paste0('20',sub('\\D+','',variable)))
       ][df, on = .(country, year), prop := i.prop
         ][is.na(prop), prop := 0
           ][, dcast(.SD, row + country ~ year, value.var = c('q','prop'), sep = '')]

which gives:

   row country q2005 q2006 q2007 q2008 prop2005 prop2006 prop2007 prop2008
1:   1     NLD     1     2     1     3      0.6      0.0      0.2      0.2
2:   2     NLD     2     1     2     3      0.6      0.0      0.2      0.2
3:   3     NLD     1     2     2     4      0.6      0.0      0.2      0.2
4:   4     BLG     5     5     2     4      0.5      0.5      0.0      0.0
5:   5     BLG     1     2     1     1      0.5      0.5      0.0      0.0
6:   6     BLG     2     2     5     1      0.5      0.5      0.0      0.0
7:   7     GER     3     5     4     4      1.0      0.0      0.0      0.0
8:   8     GER     2     5     3     4      1.0      0.0      0.0      0.0
9:   9     GER     1     2     3     5      1.0      0.0      0.0      0.0

To see how this works, you can split the code in several steps as follows:

df3 <- melt(df2, id = 1:2, value.name = 'q')[, year := as.integer(paste0('20',sub('\\D+','',variable)))]

df3[df, on = .(country, year), prop := i.prop][]
df3[is.na(prop), prop := 0][]
df3[, dcast(.SD, row + country ~ year, value.var = c('q','prop'), sep = '')]
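
As a rough guide to what each step does, the same pipeline can be written out against the sample data from the question, with one comment per step. This is only a sketch: the two tables are typed in by hand from the question, `res` is an illustrative name, and it assumes the question columns always look like `q` followed by a two-digit year.

```r
library(data.table)

# Sample data copied from the question
df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)
df2 <- data.table(
  row     = 1:9,
  country = rep(c("NLD", "BLG", "GER"), each = 3),
  q05     = c(1, 2, 1, 5, 1, 2, 3, 2, 1),
  q06     = c(2, 1, 2, 5, 2, 2, 5, 5, 2),
  q07     = c(1, 2, 2, 2, 1, 5, 4, 3, 3),
  q08     = c(3, 3, 4, 4, 1, 1, 4, 4, 5)
)

# Step 1: wide -> long, one row per (row, country, question column)
df3 <- melt(df2, id.vars = c("row", "country"), value.name = "q")

# Step 2: turn the column name ("q05") into a full year (2005L)
df3[, year := as.integer(paste0("20", sub("\\D+", "", variable)))]

# Step 3: update join, copying prop from df wherever country and year match
df3[df, on = .(country, year), prop := i.prop]

# Step 4: country/year combinations that never occur in df get proportion 0
df3[is.na(prop), prop := 0]

# Step 5: long -> wide again, spreading both q and prop over the years
res <- dcast(df3, row + country ~ year, value.var = c("q", "prop"), sep = "")
res
```

Because step 3 is an update join, duplicated country/year rows in `df` (such as the three NLD 2005 rows) do not multiply the rows of `df3`; they just write the same value again.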

4 Comments

Hey Jaap, thank you so much for your answer. Could you help me out a little bit with the inner workings of your answer? I have to rewrite it in order to apply it to quite a big database, but I am having a bit of trouble figuring out what does what exactly.
@TomKisters I've split up the code in several steps so that you can see what the different steps do. I will try to add some explanatory text later today (have to run now for a series of meetings)
Thank you for taking the time Jaap. I really appreciate it. I've been looking at your solution a bit in the meantime and I was wondering if it would not be easier (well, for me at least) to first reshape the data in the first df and then to merge it by country?
I have edited the original post to include my previous comment.
0

A base R solution:

Sample data:

df <- read.table(header = TRUE, text = "
row country year prop
1     NLD 2005  0.6
2     NLD 2005  0.6
3     BLG 2006  0.5
4     BLG 2005  0.5
5     GER 2005  1.0
6     NLD 2007  0.2
7     NLD 2005  0.6
8     NLD 2008  0.2
") 


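df$row <- NULL  # the row id is not needed for reshaping
# one row per country, one prop.<year> column per year
df2 <- reshape(df, direction = "wide", idvar = "country", timevar = "year")
df2[is.na(df2)] <- 0  # years without observations get proportion 0
# repeat each country row 3 times (3 observations per country in the example)
df2[rep(seq_len(nrow(df2)), each = 3), ]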

Output of the reshape() step (before the NAs are replaced by 0), followed by the final result:

  country prop.2005 prop.2006 prop.2007 prop.2008
1     NLD       0.6        NA       0.2       0.2
3     BLG       0.5       0.5        NA        NA
5     GER       1.0        NA        NA        NA

    country prop.2005 prop.2006 prop.2007 prop.2008
1       NLD       0.6       0.0       0.2       0.2
1.1     NLD       0.6       0.0       0.2       0.2
1.2     NLD       0.6       0.0       0.2       0.2
3       BLG       0.5       0.5       0.0       0.0
3.1     BLG       0.5       0.5       0.0       0.0
3.2     BLG       0.5       0.5       0.0       0.0
5       GER       1.0       0.0       0.0       0.0
5.1     GER       1.0       0.0       0.0       0.0
5.2     GER       1.0       0.0       0.0       0.0
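
The `each = 3` above only works because every country has exactly three rows in the example. If the number of observations per country varies, one option is to merge the reshaped proportions onto the question data by country instead. The following is a sketch under that assumption; `questions`, `props`, `wide` and `merged` are illustrative names (the question calls the second table df2, a name this answer reuses for the reshaped data).

```r
# Proportions per observation, copied from the question
df <- data.frame(
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005, 2005, 2006, 2005, 2005, 2007, 2005, 2008),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)

# Question data, copied from the question (hypothetical name: questions)
questions <- data.frame(
  row     = 1:9,
  country = rep(c("NLD", "BLG", "GER"), each = 3),
  q05     = c(1, 2, 1, 5, 1, 2, 3, 2, 1),
  q06     = c(2, 1, 2, 5, 2, 2, 5, 5, 2),
  q07     = c(1, 2, 2, 2, 1, 5, 4, 3, 3),
  q08     = c(3, 3, 4, 4, 1, 1, 4, 4, 5)
)

# Keep one row per country/year so each proportion appears once
props <- unique(df[, c("country", "year", "prop")])

# One row per country, one prop.<year> column per year
wide <- reshape(props, direction = "wide", idvar = "country", timevar = "year")
wide[is.na(wide)] <- 0  # years without observations get proportion 0

# Attach the country-level proportions to every observation
merged <- merge(questions, wide, by = "country", all.x = TRUE)
merged <- merged[order(merged$row), ]  # restore the original row order
merged
```

With `all.x = TRUE`, a country that appears in `questions` but not in `df` keeps its rows and gets NA in the prop columns, so those would still need to be handled separately.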
