
Say I have the following data frame (the real one has 10 labelx columns):

id <- c(1,2,3,4,5,6,7,8)
label1 <- c("apple","shoe","banana","hat","dog","radio","tree","pie")
label2 <- c("apple","sneaker","fruit","beanie","pet","ipod","doug fir","pie")
df <- data.frame(id,label1,label2)

I would like to replace every item in the label columns with a word that categorizes it:

food <- c("apple","banana","pie","fruit")
clothing <- c("shoe","hat","beanie")
entertainment <- c("radio","ipod","mp3 player","phone")
forest <- c("tree","doug fir","redwood","forest")

I've tried something like the following:

column_list <- c("label1","label2")
new_df <- df

for(i in 1:2) {
  new_df <- new_df %>%
  mutate(parse(text=column_list[i-1]) = replace(parse(text=column_list[i-1]),
                      (parse(text=column_list[i-1]) %in% food),
                      "food"))
}

I don't have to do it this way; something easier is fine too. Tidyverse preferred. How do I replace multiple values across multiple columns in an R data frame?

3 Answers


One possibility could be using mutate_at() and then a nested ifelse():

library(dplyr)

df %>%
  mutate_at(vars(contains("label")),
            # funs() is deprecated in current dplyr; a formula lambda does the same job
            ~ ifelse(. %in% food, "food",
                     ifelse(. %in% clothing, "clothing",
                            ifelse(. %in% entertainment, "entertainment",
                                   ifelse(. %in% forest, "forest", NA_character_)))))


  id        label1        label2
1  1          food          food
2  2      clothing          <NA>
3  3          food          food
4  4      clothing      clothing
5  5          <NA>          <NA>
6  6 entertainment entertainment
7  7        forest        forest
8  8          food          food

mutate_at() selects the variables that have "label" in their names and applies the nested ifelse() to each of them. Items that match none of the categories become NA.
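The nested ifelse() can get hard to read as the number of categories grows. As a possible alternative (not part of the original answer), the same conditions can be written flat with dplyr's case_when(). This is a minimal sketch that assumes the data and category vectors from the question; categorize() is a hypothetical helper name introduced here for illustration.

```r
library(dplyr)

# Same data and category vectors as in the question
df <- data.frame(
  id = 1:8,
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie"),
  stringsAsFactors = FALSE
)
food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# categorize() is a hypothetical helper, not a dplyr function
categorize <- function(x) {
  case_when(
    x %in% food ~ "food",
    x %in% clothing ~ "clothing",
    x %in% entertainment ~ "entertainment",
    x %in% forest ~ "forest",
    TRUE ~ NA_character_  # anything unmatched becomes NA
  )
}

result <- df %>%
  mutate_at(vars(contains("label")), categorize)
result
```

The conditions are checked in order, so an item that appeared in two category vectors would get the first matching category, just as with the nested ifelse().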



Here's an approach using base R. The idea is to create a named vector where the names are individual things (apple, shoe, etc.) and the values are the categories (food, clothing, etc.). Then it's a matter of extracting categories directly using the names.

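obj = c("food", "clothing", "entertainment", "forest")
# Collect the category vectors defined in the question into a named list
mylist = mget(obj)
# For each category, build a vector whose names are the items and whose values are the category
mylist = lapply(obj, function(x){
    temp = mylist[[x]]
    setNames(rep(x, length(temp)), temp)
})
# Flatten into a single named lookup vector
mylist = unlist(mylist)

# Look up every column except id by item name
df[-1] = lapply(df[-1], function(x) as.vector(mylist[as.character(x)]))
df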
df
#  id        label1        label2
#1  1          food          food
#2  2      clothing          <NA>
#3  3          food          food
#4  4      clothing      clothing
#5  5          <NA>          <NA>
#6  6 entertainment entertainment
#7  7        forest        forest
#8  8          food          food
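To see the named-vector lookup in isolation, here is a minimal sketch with a hand-written three-item lookup. The names lookup, items, and categories are made up for this illustration and do not appear in the answer.

```r
# A tiny lookup vector: names are items, values are categories
lookup <- c(apple = "food", shoe = "clothing", radio = "entertainment")

items <- c("shoe", "apple", "dog")

# Indexing by name returns the matching category; unknown names give NA
categories <- unname(lookup[items])
categories
```

The as.character() call in the answer matters when a column is a factor, as data.frame() produced by default before R 4.0: indexing with a factor uses its integer codes rather than its labels, which would return the wrong entries.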



The tidyverse has evolved, and this can now be solved more elegantly with across() and a named lookup vector.

library("tidyverse")

df <- data.frame(
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
)

labels <- list(
  food = c("apple", "banana", "pie", "fruit"),
  clothing = c("shoe", "hat", "beanie"),
  entertainment = c("radio", "ipod", "mp3 player", "phone"),
  forest = c("tree", "doug fir", "redwood", "forest")
)

item_to_label <- labels %>%
  stack() %>%
  deframe()

df %>%
  mutate(
    across(
      c(label1, label2),
      ~ item_to_label[.]
    )
  )
#>          label1        label2
#> 1          food          food
#> 2      clothing          <NA>
#> 3          food          food
#> 4      clothing      clothing
#> 5          <NA>          <NA>
#> 6 entertainment entertainment
#> 7        forest        forest
#> 8          food          food

Created on 2022-03-16 by the reprex package (v2.0.1)
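Since the question mentions ten label columns plus an id column, the same lookup might be applied with a tidyselect helper rather than by naming each column. The following is a sketch under a few assumptions: every column to recode starts with "label", the id column should be left untouched, and plain character columns are wanted in the result. stack() returns the category column as a factor, so the sketch converts it to character before building the lookup.

```r
library(dplyr)
library(tibble)  # for deframe()

labels <- list(
  food = c("apple", "banana", "pie", "fruit"),
  clothing = c("shoe", "hat", "beanie"),
  entertainment = c("radio", "ipod", "mp3 player", "phone"),
  forest = c("tree", "doug fir", "redwood", "forest")
)

# Named character vector: names are items, values are category names
item_to_label <- labels %>%
  stack() %>%
  mutate(ind = as.character(ind)) %>%
  deframe()

df <- data.frame(
  id = 1:8,
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
)

# Recode every column whose name starts with "label"; id is left as is
result <- df %>%
  mutate(across(starts_with("label"),
                ~ unname(item_to_label[as.character(.x)])))
result
```

unname() drops the item names that would otherwise be carried along on the new columns, and as.character() keeps the lookup by name even if a column happens to be a factor.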
