1

I have a csv dataset which is similar to the following:

Age    Food_1_1 Food_1_2 Food_1_3  Amount_1_1 Amount_1_2 Amount_1_3
6        a        b          a      2          3           4
6        b        b          c      1          2           3
6                 c          a                 4           1
9        c        c          b      1          3           1
9        c        a          b      1          2           1

Using R, I want to get the following data set which contains new set of columns a, b and c by adding the corresponding values:

Age    Food_1_1 Food_1_2 Food_1_3  Amount_1_1 Amount_1_2 Amount_1_3   a   b   c    
6            a        b        a      2          3           4        6   3   0
6            b        b        c      1          2           3        0   3   3
6                     c        a                 4           1        1   0   4
9            c        c        b      1          3           1        0   1   4
9            c        a        b      1          2           1        2   1   1

Note: My data also contains missing values. The variables Food_1_1:Food_1_3 are factors and the variables Value1:Value3 are numeric. For more clearity: 1st row of column "a" contains the addition of all amount through Amount_1_1 to Amount_1_3 related to a (say for example 2+4 =6).

2
  • In your table, there are not variables "Monday:Wednesday". Which of the columns are factors then? Commented May 22, 2020 at 11:03
  • @Fabrizio. Thanks for catching that. That's a typo. Commented May 22, 2020 at 19:07

3 Answers 3

2

Stack Food and Amount to 2 columns and count observations by group.

library(tidyverse)

df %>%
  rowid_to_column("id") %>% 
  pivot_longer(-(1:2), names_to = c(".value", "set"), names_sep = "_1_", values_drop_na = T) %>% 
  group_by(id, Food) %>%
  tally(Amount) %>%
  ungroup() %>% 
  pivot_wider(names_from = Food, values_from = n, values_fill = list(n = 0))

More concisely, use the argument values_fn:

df %>%
  rowid_to_column("id") %>% 
  pivot_longer(-(1:2), names_to = c(".value", "set"), names_sep = "_1_", values_drop_na = T) %>% 
  pivot_wider(id_cols = id, names_from = Food, values_from = Amount,
              values_fill = list(Amount = 0), values_fn = list(Amount = sum))

# # A tibble: 5 x 4
#      id     a     b     c
#   <int> <int> <int> <int>
# 1     1     6     3     0
# 2     2     0     3     3
# 3     3     1     0     4
# 4     4     0     1     4
# 5     5     2     1     1

Data

df <- structure(list(Age = c(6L, 6L, 6L, 9L, 9L), Food_1_1 = structure(c(1L, 
2L, NA, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
Food_1_2 = structure(c(2L, 2L, 3L, 3L, 1L), .Label = c("a", 
"b", "c"), class = "factor"), Food_1_3 = structure(c(1L, 
3L, 1L, 2L, 2L), .Label = c("a", "b", "c"), class = "factor"), 
Amount_1_1 = c(2L, 1L, NA, 1L, 1L), Amount_1_2 = c(3L, 2L, 
4L, 3L, 2L), Amount_1_3 = c(4L, 3L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L))
Sign up to request clarification or add additional context in comments.

Comments

1

Using Darren Tsai's Example Data and in Base-R

we can get the values with

sapply(c("a","b","c"), function(x) apply(df,1, function(z) sum(as.numeric(z[grep(x,z)+3]))))

     a b c
[1,] 6 3 0
[2,] 0 3 3
[3,] 1 0 4
[4,] 0 1 4
[5,] 2 1 1

If you want to append them to your data frame then we can

cbind(df, The-above-code-goes-here )


  Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3 a b c
1   6        a        b        a          2          3          4 6 3 0
2   6        b        b        c          1          2          3 0 3 3
3   6     <NA>        c        a         NA          4          1 1 0 4
4   9        c        c        b          1          3          1 0 1 4
5   9        c        a        b          1          2          1 2 1 1

Edit: A breakdown of the code.

sapply(c("a","b","c"), function(x) ... ) This sapply function takes the arguments from the list provided a, b, & c, and runs the function in ... one at a time but renames the arguments to x so that they can be referenced.

The next layer is apply(df,1, function(z) ... ) This apply function takes the argument df - your data frame, and runs the function inside of it ... row-by-row (specified with 1).

grep(x,z) is now finding which column the letter's x = a,b,c are found in that specific row of the dataframe df = z. Now that we've located the correct column(s) that contain the letters, the actual numbers are we want to sum are three columns to the the right, so we add grep(x,z)+3 to access those numbers.

lastly sum(as.numeric(z[grep(x,z)+3])) We are summing z[ 'columns' ] where if you recal z is the row of your dataframe in every iteration of the apply 'loop'.

2 Comments

@ Thank you Daniel. Your code worked for my given data. However, I have some questions. What do these function(x) and function (z) for? In apply function, should it be apply (df, 2) instead of 1 as I am summing the elements across the rows? Also, I was wondering what does grep(x,z)+3 means?
I've added some detailed explanation for you. If this helped please consider accepting my answer.
0

A possible solution would be the following: First, consider I put your data in a txt file formatted as follows:

Age,Food_1_1,Food_1_2,Food_1_3,Amount_1_1,Amount_1_2,Amount_1_3
6,a,b,a,2,3,4
6,b,b,c,1,2,3
6,,c,a,,4,1
9,c,c,b,1,3,1
9,c,a,b,1,2,1

The code to read the file and compute the sums is the following:

data=read.csv("test_table.txt", stringsAsFactors = FALSE)
lwords=c("a","b","c")
for(aw in lwords){
  data[,ncol(data)+1]=apply(data, 1, function(x, aw){
    return(sum(as.numeric(x[5:7][aw==x[2:4]])))
  }, aw=aw)
  colnames(data)[ncol(data)]=aw
}

The idea is very simple. In the for-loop you check if a word (imagine "a") appears in columns 2 to 4. This generates a list of true/false. You then use this list of true/false to sum the content of the columns 5 to 7.

The result:

> data
  Age Food_1_1 Food_1_2 Food_1_3 Amount_1_1 Amount_1_2 Amount_1_3 a b c
1   6        a        b        a          2          3          4 6 3 0
2   6        b        b        c          1          2          3 0 3 3
3   6                 c        a         NA          4          1 1 0 4
4   9        c        c        b          1          3          1 0 1 4
5   9        c        a        b          1          2          1 2 1 1

2 Comments

Thank you. However, I am getting an entire 3rd row with NAs for columns a, b ,c. Is there any way, I can fix it for the missing value in that row. I am assuming this is happening because of the missing Food_1_1 in that row.
I can't really help unless I see your data frame. Have you tried to format it as I wrote? As a comma-separated file?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.