I have a data frame with the following structure; let's call it df0:

year category a b c d
1989 1 0.3 0.7 0.43 321
1989 1 0.3 0.7 0.43 321
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 3 0.6 0.2 3.0 224
1990 1 0.6 0.2 3.0 93
1990 1 0.6 0.2 3.0 93
1990 2 0.3 0.7 4.0 293
1990 3 0.9 0.6 2.0 13

What I need to turn this into is the following. Basically, for each year, I want to add one column per category holding that category's c value, like this:

year category a b c d c1 c2 c3
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 3 0.6 0.2 3.0 224 0.43 0.5 3.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 2 0.3 0.7 4.0 293 3.0 4.0 2.0
1990 3 0.9 0.6 2.0 13 3.0 4.0 2.0

I cannot figure out how to compute this. My first idea was to create a sub-data frame for each year and then build a vector of the c values from each one, but this seems very tedious and I cannot get it to work.

Does anyone have input or a solution on this?

KR

2 Answers

Using tidyr and dplyr, pivot then join:

library(tidyr)
library(dplyr)

cvals <- df0 %>%
  distinct(year, category, c) %>%
  pivot_wider(
    names_from = category,
    names_prefix = "c",
    values_from = c
  )

left_join(df0, cvals, join_by(year))

Result:

# A tibble: 10 × 9
    year category     a     b     c     d    c1    c2    c3
   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
 2  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
 3  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
 4  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
 5  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
 6  1989        3   0.6   0.2  3      224  0.43   0.5     3
 7  1990        1   0.6   0.2  3       93  3      4       2
 8  1990        1   0.6   0.2  3       93  3      4       2
 9  1990        2   0.3   0.7  4      293  3      4       2
10  1990        3   0.9   0.6  2       13  3      4       2
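
The pivot above relies on each year/category pair having exactly one c value, which holds for the sample data. If a pair had several different c values, pivot_wider would warn and return list-columns instead of plain numbers. A minimal sketch of a check for that, using a hypothetical data frame df_check (not from the question) that deliberately contains one conflict:

```r
library(dplyr)

# Hypothetical data (an assumption for illustration):
# category 2 in 1989 has two different c values
df_check <- data.frame(
  year     = c(1989, 1989, 1989),
  category = c(1, 2, 2),
  c        = c(0.43, 0.5, 0.6)
)

# Year/category pairs that carry more than one distinct c value
conflicts <- df_check %>%
  distinct(year, category, c) %>%
  count(year, category, name = "n_values") %>%
  filter(n_values > 1)

conflicts
```

If conflicts has zero rows for your real data, the pivot-then-join approach should give one plain value per new column.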

One way to do this is to collect the unique c values within each year and reshape them to wide format using pivot_wider. Joining the reshaped data back to the original data by year gives the expected data frame. Note that the new columns are numbered by order of appearance within each year, not by the category value itself.

library(dplyr)
library(tidyr)

df %>%
  reframe(unique_c = unique(c), .by = year) %>%
  mutate(row = row_number(), .by = year) %>%
  pivot_wider(names_from = row, names_prefix = "c", values_from = unique_c) %>%
  left_join(df, by = join_by(year)) %>%
  # Optional, only to get data in same order as expected output
  relocate(year, category, a:d)

#   year category     a     b     c     d    c1    c2    c3
#   <int>    <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 2  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 3  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 4  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 5  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 6  1989        3   0.6   0.2  3      224  0.43   0.5     3
# 7  1990        1   0.6   0.2  3       93  3      4       2
# 8  1990        1   0.6   0.2  3       93  3      4       2
# 9  1990        2   0.3   0.7  4      293  3      4       2
#10  1990        3   0.9   0.6  2       13  3      4       2

data

df <- structure(list(year = c(1989L, 1989L, 1989L, 1989L, 1989L, 1989L, 
1990L, 1990L, 1990L, 1990L), category = c(1L, 1L, 2L, 2L, 2L, 
3L, 1L, 1L, 2L, 3L), a = c(0.3, 0.3, 0.2, 0.2, 0.2, 0.6, 0.6, 
0.6, 0.3, 0.9), b = c(0.7, 0.7, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 
0.7, 0.6), c = c(0.43, 0.43, 0.5, 0.5, 0.5, 3, 3, 3, 4, 2), d = c(321L, 
321L, 174L, 174L, 174L, 224L, 93L, 93L, 293L, 13L)), row.names = c(NA, 
-10L), class = "data.frame")

2 Comments

This works perfectly apart from one issue: if there are gaps in the category values (say there is a category 5, no 6, but a 7), it only adds columns for 1 up to and including 6 and not for the rest, for example when there are also categories 7-9. Any idea how to fix this? I really appreciate your support!
In that case, the solution from @zephryl would be more suitable, I guess.
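
To illustrate the point made in the comments: naming the new columns from category itself (as in the pivot-then-join answer) keeps the labels tied to the real category values even when some are missing, whereas numbering by row position would relabel them 1, 2, 3. A small sketch, assuming a hypothetical data frame df_gap with categories 1, 5 and 7 only:

```r
library(dplyr)
library(tidyr)

# Hypothetical data (an assumption for illustration):
# one year, categories 1, 5 and 7, nothing in between
df_gap <- data.frame(
  year     = c(1989, 1989, 1989, 1989),
  category = c(1, 5, 5, 7),
  c        = c(0.43, 0.5, 0.5, 3.0)
)

# One row per year, one column per category value actually present
cvals_gap <- df_gap %>%
  distinct(year, category, c) %>%
  arrange(category) %>%            # so the new columns come out in category order
  pivot_wider(names_from = category, names_prefix = "c", values_from = c)

result_gap <- left_join(df_gap, cvals_gap, by = join_by(year))
names(result_gap)
# "year" "category" "c" "c1" "c5" "c7"
```

If a category is missing in some years but present in others, pivot_wider fills the corresponding cell with NA by default.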
