Converting factor variable to numeric, and from numeric back to factor

Question

I have a massive dataset (9,000,000 entries) with two factor columns (409 levels). It represents flights between airports over a certain period. The dataset below is shown after conversion, meaning that "ORIGIN" and "DEST" are already in their numeric form.

  ORIGIN DEST weight        alpha
      1   24   1195 1.512274e-04
      1   78    844 2.557285e-03
    100    2   1615 3.176266e-17
    100    3   4196 9.111249e-09
    100    7   1221 6.471515e-10
    100   12    725 2.129114e-04

A second dataset has all the IATA codes, with their latitude and longitude.

           City IATA  Latitude Longitude
         Goroka  GKA -6.081690   145.392
         Madang  MAG -5.207080   145.789
    Mount Hagen  HGU -5.826790   144.296
         Nadzab  LAE -6.569803   146.726
   Port Moresby  POM -9.443380   147.220
          Wewak  WWK -3.583830   143.669

The current workflow is:

1. Convert the two columns to numeric (I need them in that form later).
2. Convert the data set into an igraph graph.
3. Apply the filtering algorithm (this is why the columns are numeric).
4. Convert the result back to a data set.

My problem is that I now want to convert the numbers back to the original factor levels, because I need the latitude and longitude from the second dataset.

Any ideas? I've tried pretty much everything I can think of.

as.numeric(as.character(factor(c(1,100,23,47)))). Calling as.numeric() directly on a factor gives you its numeric level codes, not the labels, so convert to character first and then to numeric. In your case that would be as.numeric(as.character(df$ORIGIN)), where df is your data.frame.
– infominer, Commented Feb 15, 2017 at 21:26
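A small R sketch of the difference the comment describes, using the comment's own example vector (the variable names are illustrative). It applies to factors whose labels are themselves numbers; the last line is the equivalent idiom given in R FAQ 7.10.

```r
# Factor built from numbers: levels are the sorted values, stored as strings
f <- factor(c(1, 100, 23, 47))
levels(f)                                 # "1" "23" "47" "100"

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)                    # 1 4 2 3

# Going through character recovers the original values
values <- as.numeric(as.character(f))     # 1 100 23 47

# R FAQ 7.10 idiom: convert the levels once, then index them by the codes
values2 <- as.numeric(levels(f))[f]       # 1 100 23 47
```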

SolingerStuebchen · Accepted Answer · 2020-03-12 08:49:01Z

2

I would store the factor levels before converting the column with as.numeric(), and then reapply them as labels when restoring the factor class.
Here is an example to clarify what I mean:

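data(iris)
# Store the levels before converting
l <- levels(iris$Species)

# Convert to numeric (the internal level codes 1, 2, 3)
iris$Species <- as.numeric(iris$Species)
head(iris$Species)
class(iris$Species)

# Convert back to factor; levels = seq_along(l) pins code i to label l[i],
# even if some codes no longer occur after filtering
iris$Species <- factor(iris$Species, levels = seq_along(l), labels = l)
head(iris$Species)
class(iris$Species)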
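One caveat for the question's workflow: if filtering removes every row with a given code, factor(x, labels = l) builds its levels only from the codes that remain, so the number of labels no longer matches and factor() stops with an "invalid 'labels'" error. A minimal sketch on iris, where the filtering step is simulated (hypothetically) by dropping all setosa rows, shows how passing the levels explicitly avoids this:

```r
data(iris)
l <- levels(iris$Species)               # "setosa" "versicolor" "virginica"
codes <- as.numeric(iris$Species)

# Simulated filtering step: drop every code-1 ("setosa") row
filtered <- codes[codes != 1]

# Without explicit levels, factor() sees only codes 2 and 3,
# so three labels no longer fit and it raises an error
attempt <- try(factor(filtered, labels = l), silent = TRUE)

# Fixing the levels to 1..length(l) keeps each code paired with its own label
restored <- factor(filtered, levels = seq_along(l), labels = l)
table(restored)                         # setosa 0, versicolor 50, virginica 50
```

Keeping the unused level (setosa with a count of 0) is harmless here; droplevels(restored) removes it if needed.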

edited Mar 12, 2020 at 8:49

SolingerStuebchen

answered Feb 15, 2017 at 21:33

GGamba

effel · Accepted Answer · 2017-02-15 21:35:58Z

0

Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.

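library(dplyr)
data(warpbreaks)
original <- warpbreaks  # keep a copy to check the round trip

# Lookup table: each distinct label combination with its numeric codes
value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()

# Coerce the factors to numeric (stand-in for the numeric-only workflow)
warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))

# Merge the labels back on the numeric codes; the label columns
# come back suffixed as wool.y and tension.y
warpbreaks <- left_join(warpbreaks, value_label_map,
  by = c("wool" = "wool_num", "tension" = "tension_num"))

# Compare the restored labels with the originals
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)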
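A base-R sketch of the same lookup idea applied to the question's setup, so it needs no extra packages. The flights rows are toy data (the airport/weight pairings are made up; the coordinates come from the question's second table), the weight filter is a hypothetical stand-in for the igraph step, and using one shared level set for ORIGIN and DEST is an assumption: converting each column on its own could give the same number to different airports in the two columns.

```r
# Toy flights table (hypothetical pairings), IATA codes stored as factors
flights <- data.frame(
  ORIGIN = factor(c("GKA", "MAG", "POM", "HGU")),
  DEST   = factor(c("MAG", "POM", "GKA", "WWK")),
  weight = c(1195, 844, 1615, 4196)
)
# Coordinates taken from the question's second dataset
airports <- data.frame(
  IATA      = c("GKA", "MAG", "HGU", "LAE", "POM", "WWK"),
  Latitude  = c(-6.081690, -5.207080, -5.826790, -6.569803, -9.443380, -3.583830),
  Longitude = c(145.392, 145.789, 144.296, 146.726, 147.220, 143.669)
)

# One shared lookup vector, so an airport gets the same number in both columns
codes <- sort(union(levels(flights$ORIGIN), levels(flights$DEST)))
flights$ORIGIN <- as.integer(factor(flights$ORIGIN, levels = codes))
flights$DEST   <- as.integer(factor(flights$DEST,   levels = codes))

# Hypothetical stand-in for the numeric-only filtering step
filtered <- flights[flights$weight > 1000, ]

# Map numbers back to IATA codes by indexing the stored lookup vector
filtered$ORIGIN <- codes[filtered$ORIGIN]
filtered$DEST   <- codes[filtered$DEST]

# Attach coordinates for both ends of each flight with match()
o <- match(filtered$ORIGIN, airports$IATA)
d <- match(filtered$DEST, airports$IATA)
filtered$orig_lat <- airports$Latitude[o]
filtered$orig_lon <- airports$Longitude[o]
filtered$dest_lat <- airports$Latitude[d]
filtered$dest_lon <- airports$Longitude[d]
filtered
```

Because the lookup vector is stored once, rows removed by filtering do not affect the mapping of the rows that remain.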

answered Feb 15, 2017 at 21:35

effel

2 Comments

FilipeTeixeira Over a year ago

Thank you. This solved my issue. I was trying to find a way of matching the two data sets, given that in the end (because of the filtering algorithm) I always end up with fewer columns. Your approach solved it perfectly :). Thanks a lot, this saved me from a massive headache :D.

effel Over a year ago

Glad to hear it! Cheers.