I want to recode values as follows: < 4 becomes -1, 4 becomes 0, and > 4 becomes 1. The recoding should apply only to the variables listed in core.vars, and the rest of the variables should stay in the data frame.

library(tidyverse)

other.vars <- c('hp', 'drat', 'wt')
core.vars <- c('mpg', 'cyl', 'disp')
# Move the row names into a column before converting, since as_tibble() drops row names
temp.df <- rownames_to_column(mtcars, var = "cars_id")
temp.df <- as_tibble(temp.df)
temp.df <- temp.df %>% mutate_if(is.integer, as.numeric)

I have tried several ways to implement this using case_when, mutate, and recode, but with no luck. recode requires a vector, so my thought was to create a vector using case_when or mutate for each variable of interest and then recode the values. All of these attempts failed.

temp.df <- temp.df %>% 
           mutate_at(.vars %in% (core.vars)), '< 4' = "-1", '4' = "0", '> 4' = "1")

Error: unexpected ',' in "temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)),"

temp.df <- temp.df %>% 
           mutate_at(vars(one_of(core.vars)), '< 4' = "-1", '4' = "0", '> 4' = "1")

Error in inherits(x, "fun_list") : argument ".funs" is missing, with no default

 temp.df <- temp.df %>% 
            mutate (temp.df, case_when (vars(one_of(core.vars)), recode ('< 4' = "-1", '4' = "0", '> 4' = "1")))

Error in mutate_impl(.data, dots) : Column temp.df is of unsupported class data.frame

 temp.df <- temp.df %>% 
            case_when (vars(one_of(core.vars)), recode ('< 4' = "-1", '4' = "0", '> 4' = "1"))

Error in recode.character(< 4 = "-1", 4 = "0", > 4 = "1") : argument ".x" is missing, with no default

temp.df <- temp.df %>% rowwise() %>% mutate_at(vars (core.vars),
                                            funs (case_when (
                                                recode(., '< 4' = -1, '0' = 0, '>4' = 1)
                                            ))) %>%
 ungroup()

Error in mutate_impl(.data, dots) : Evaluation error: Case 1 (recode(mpg,< 4= -1,0= 0,>4= 1)) must be a two-sided formula, not a double. In addition: Warning message: In recode.numeric(mpg, < 4 = -1, 0 = 0, >4 = 1) : NAs introduced by coercion

Previous questions on the forum cover how to do this for individual variables. However, I have 100 variables and 300 samples, so entering them individually, line by line, is not an option.

Ideally, I would like to avoid creating a separate data frame and joining it back, or creating multiple separate variables as mutate would.

I am sure there is a for loop and/or ifelse method for this, but I was trying to use the tidyverse to achieve it. Any suggestions would be helpful.

1 Answer

temp.df %>%
  mutate_at(vars(one_of(core.vars)), 
            function(x) case_when(
              x < 4 ~ -1,
              x == 4 ~ 0,
              x > 4 ~ 1
            ))

Output

# A tibble: 32 x 12
   cars_id             mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Mazda RX4             1     1     1   110  3.9   2.62  16.5     0     1     4     4
 2 Mazda RX4 Wag         1     1     1   110  3.9   2.88  17.0     0     1     4     4
 3 Datsun 710            1     0     1    93  3.85  2.32  18.6     1     1     4     1
 4 Hornet 4 Drive        1     1     1   110  3.08  3.22  19.4     1     0     3     1
 5 Hornet Sportabout     1     1     1   175  3.15  3.44  17.0     0     0     3     2
 6 Valiant               1     1     1   105  2.76  3.46  20.2     1     0     3     1
 7 Duster 360            1     1     1   245  3.21  3.57  15.8     0     0     3     4
 8 Merc 240D             1     0     1    62  3.69  3.19  20       1     0     4     2
 9 Merc 230              1     0     1    95  3.92  3.15  22.9     1     0     4     2
10 Merc 280              1     1     1   123  3.92  3.44  18.3     1     0     4     4
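
For readers on a newer dplyr, the same recoding can probably be written with across(), since the scoped verbs such as mutate_at() are superseded there. The following is only a sketch, assuming dplyr 1.0.0 or later and the same core.vars vector as in the question; recoded.df is a name introduced here for illustration.

```r
library(dplyr)
library(tibble)

core.vars <- c("mpg", "cyl", "disp")

# Keep the car names as a column, then convert to a tibble
temp.df <- mtcars %>%
  rownames_to_column(var = "cars_id") %>%
  as_tibble()

# Recode only the core.vars columns; every other column passes through unchanged.
# recoded.df is an illustrative name, not part of the original answer.
recoded.df <- temp.df %>%
  mutate(across(all_of(core.vars), ~ case_when(
    .x < 4  ~ -1,
    .x == 4 ~ 0,
    .x > 4  ~ 1
  )))
```

Because the result is assigned to a new name, temp.df itself stays as it was; assign back to temp.df instead if the recoded values should replace the originals.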

4 Comments

Amazing, dude. Thanks. Exactly what I was looking for.
Any suggestions on how to apply a function such as this one to that dataset? The numeric-data error is for the cars_id column, but I would like to keep that column.
dichotomize.dataset <- function(x) { return( as.numeric( x > median(x, na.rm = TRUE) ) ); }
temp1.df <- temp.df %>% mutate_at(vars(one_of(other.vars)), dichotomize.dataset())
Error in median.default(x, na.rm = TRUE) : need numeric data
In addition: Warning message: Error in median.default(x, na.rm = TRUE) : need numeric data
Try it without the () after the function name in mutate_at. You don't want to execute the function there; you are just telling mutate_at which function to run on each of your columns.
Great tip. Appreciate your help.
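
The tip from the comments can be sketched as follows, passing the function object itself rather than calling it. This assumes the same setup as in the question and keeps the commenter's dichotomize.dataset and other.vars names; the setup lines are repeated only to make the snippet self-contained.

```r
library(dplyr)
library(tibble)

other.vars <- c("hp", "drat", "wt")

temp.df <- mtcars %>%
  rownames_to_column(var = "cars_id") %>%
  as_tibble()

# Returns 1 where a value is above the column median, 0 otherwise
dichotomize.dataset <- function(x) {
  as.numeric(x > median(x, na.rm = TRUE))
}

# No parentheses after dichotomize.dataset: mutate_at() calls it once per selected column
temp1.df <- temp.df %>%
  mutate_at(vars(one_of(other.vars)), dichotomize.dataset)
```

Since only the columns named in other.vars are selected, cars_id is never passed to median() and stays in the result as a character column.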