
Can someone please explain how as.numeric(levels(x))[x] works exactly? Here x is a factor variable (for example, x <- as.factor(sample(1:5, 20, replace = TRUE))). As far as I understand, we first get the levels of x (which are character) and then convert them to numeric. I can't follow what happens after that. I know this expression gives the same result as as.numeric(as.character(x)).

  • Have you read the first answer here? Commented Nov 13, 2018 at 18:57
  • ...then it's just using the values of x (its integer codes) as positions to get the corresponding levels, in numeric form. For example, as.numeric(levels(x))[c(1,1,2)] means "give me the 1st, the 1st (again) and the 2nd level". If you ask for a level that doesn't exist, you get NA, as in as.numeric(levels(x))[c(1,1,2,6)]. Commented Nov 13, 2018 at 18:58
  • @DeNovo Yes, I saw that post, but I think it was about how to perform the conversion, not about how it actually works. Commented Nov 13, 2018 at 19:38
  • @AntoniosK got it. Thank you. Commented Nov 13, 2018 at 20:08
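The comment's explanation can be checked directly. This is a minimal sketch using a small hand-built factor (the values are made up for illustration, not taken from the question's random sample): explicit positions pick levels, an out-of-range position gives NA, and indexing by the factor itself uses its codes as positions.

```r
# A small factor; its levels are the sorted unique values as strings: "1" "2" "3" "4" "5"
x <- factor(c(4, 2, 3, 4, 5, 1))

lv_num <- as.numeric(levels(x))   # the five level strings converted to numbers

# Explicit positions: 1st level, 1st level again, 2nd level
picked <- lv_num[c(1, 1, 2)]      # 1 1 2

# Position 6 does not exist (only 5 levels), so it comes back as NA
with_na <- lv_num[c(1, 1, 2, 6)]  # 1 1 2 NA

# Indexing by the factor itself uses its integer codes as the positions
from_factor <- lv_num[x]          # 4 2 3 4 5 1
print(from_factor)
```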

2 Answers


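R factors are integer vectors whose values serve as indices into the character vector of levels. In as.numeric(levels(x))[x], the inner part, levels(x), returns the distinct values as a character vector ("1", "2", ... "5"), and as.numeric() converts those few strings into numbers. The outer [x] then uses the integer codes stored in x as positions into that numeric vector, so each element of x picks out its own level: "5", "2", "4", ... end up as 5, 2, 4, ....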
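The expression can be split into its steps. In the question's example the codes and the values often coincide (levels "1" to "5" get codes 1 to 5 when all five values appear), which hides the difference, so this sketch uses a hypothetical factor built from 10, 20 and 30 instead.

```r
f <- factor(c(10, 20, 10, 30))

lv    <- levels(f)        # character levels: "10" "20" "30"
lv_n  <- as.numeric(lv)   # converted once per level: 10 20 30
codes <- as.integer(f)    # the integer codes stored in f: 1 2 1 3

step_by_step <- lv_n[codes]                 # 10 20 10 30
one_liner    <- as.numeric(levels(f))[f]    # same thing in one expression
via_char     <- as.numeric(as.character(f)) # the equivalent the question mentions

stopifnot(identical(step_by_step, one_liner),
          identical(one_liner, via_char))

# Pitfall: as.numeric() applied to the factor itself returns the codes, not the values
print(as.numeric(f))      # 1 2 1 3
```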

> x<-as.factor(sample(1:5,20,replace=TRUE)) 

The storage class of factor objects is integer:

> dput (x)
structure(c(4L, 2L, 3L, 4L, 5L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 1L, 
5L, 5L, 4L, 1L, 5L, 1L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")
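The integer storage can also be inspected without dput(). In this small sketch the value 1 is deliberately missing, so the codes no longer match the values (the factor is a made-up example).

```r
y <- factor(c(4, 2, 3, 4, 5))

typeof(y)        # "integer": the underlying storage type
class(y)         # "factor"
levels(y)        # "2" "3" "4" "5": no "1", because 1 never occurs
as.integer(y)    # 3 1 2 3 4: positions into levels(y), not the original values
unclass(y)       # the same codes, printed with the levels attribute attached
```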

The levels() function returns the factor's levels attribute (written as .Label in the dput() output above), and when a factor is used as an index, it is treated as its integer codes rather than its labels:

> levels(x)[x]
 [1] "4" "2" "3" "4" "5" "2" "2" "2" "1" "2" "4" "2" "1" "5" "5" "4" "1" "5" "1" "5"
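R's documentation for [ states that indexing by a factor is equivalent to indexing by its integer codes. A quick check with a hypothetical character-valued factor:

```r
g  <- factor(c("b", "a", "c", "a"))
lv <- levels(g)                    # "a" "b" "c"

by_factor <- lv[g]                 # factor index: treated as its codes 2 1 3 1
by_codes  <- lv[as.integer(g)]     # the same codes passed explicitly

stopifnot(identical(by_factor, by_codes),
          identical(by_factor, as.character(g)))
print(by_factor)                   # "b" "a" "c" "a"
```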

This method of conversion is slightly more efficient than as.numeric(as.character(x)), because the string-to-number conversion runs once per level rather than once per element. As you have experienced, though, it can seem a bit cryptic if you haven't worked through what is happening "under the hood" (or "bonnet", if that's what it's called in your part of the English-speaking world).
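One way to see where the difference comes from is a rough timing sketch on a large factor with only 100 levels. The sizes are arbitrary, timings depend on the machine and R version, and no particular result is claimed here; the sketch only checks that both routes return identical values.

```r
set.seed(1)
big <- factor(sample(1:100, 1e6, replace = TRUE))

# as.numeric() parses only the 100 level strings; [big] then expands them to 1e6 values
t_levels <- system.time(a <- as.numeric(levels(big))[big])

# as.character() builds a length-1e6 character vector; as.numeric() parses every element
t_char <- system.time(b <- as.numeric(as.character(big)))

stopifnot(identical(a, b))
print(t_levels)
print(t_char)
```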


I've always been confused by R's factors. Instead, I usually use the function ufactor from the Rfast package (Rfast::ufactor), which represents a factor using the original type of its values.

Here is an example:

x <- rnorm(10)
fx <- Rfast::ufactor(x)  # requires the Rfast package
fx$levels                # you can get the levels like this
fx$values                # you can get the values like this

Fast and simple. Rfast::ufactor is much faster than R's own factors, but I won't post a benchmark because it doesn't fit the question.
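If Rfast is not available, the same idea (levels kept in their original numeric type, plus integer codes pointing into them) can be sketched in base R with unique() and match(). The helper name numeric_factor and its fields are hypothetical; this is not how Rfast implements ufactor, and the object it returns may be structured differently.

```r
# Hypothetical helper: sorted unique values kept in their original type, plus integer codes
numeric_factor <- function(x) {
  lv <- sort(unique(x))
  list(levels = lv, codes = match(x, lv))
}

v  <- c(2.5, -1, 2.5, 7)
nf <- numeric_factor(v)

nf$levels             # -1.0 2.5 7.0: still numeric, no strings involved
nf$codes              # 2 1 2 3
nf$levels[nf$codes]   # recovers v, the same indexing idea as levels(x)[x]
```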
