I have a massive dataset (9,000,000 entries) with two columns that are factors (409 levels). It represents flights between airports over a certain period. The dataset below is shown after conversion, meaning that "ORIGIN" and "DEST" are in their numeric form.

  ORIGIN DEST weight        alpha
      1   24   1195 1.512274e-04
      1   78    844 2.557285e-03
    100    2   1615 3.176266e-17
    100    3   4196 9.111249e-09
    100    7   1221 6.471515e-10
    100   12    725 2.129114e-04

A second dataset has all the IATA codes, with their latitude and longitude.

           City IATA  Latitude Longitude
         Goroka  GKA -6.081690   145.392
         Madang  MAG -5.207080   145.789
    Mount Hagen  HGU -5.826790   144.296
         Nadzab  LAE -6.569803   146.726
   Port Moresby  POM -9.443380   147.220
          Wewak  WWK -3.583830   143.669

The current flow is the following:

  1. Convert the 2 columns to numeric (as I need them in that form later)
  2. Convert the dataset into an igraph object
  3. Apply the filtering algorithm (that's why the columns are numeric)
  4. Convert back to a dataset.

My problem is that I now want to convert the numbers I have back to the original factor labels, as I'll need the latitude and longitude from the second dataset.

Any ideas? I've tried pretty much everything I can think of.

  • as.factor didn't work I take it? Commented Feb 15, 2017 at 21:26
  • as.numeric(as.character(factor(c(1,100,23,47)))). Calling as.numeric directly on a factor returns its integer level codes rather than the original values, so convert to character first and then to numeric. In your case: as.numeric(as.character(df$ORIGIN)), where df is your data.frame. Commented Feb 15, 2017 at 21:26

2 Answers

I would store the factor levels before converting with as.numeric, and then reapply them when restoring the factor class.
An example to clarify what I mean:

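data(iris)
# Store the levels before converting
l <- levels(iris$Species)

# Convert to numeric (the integer level codes 1, 2, 3)
iris$Species <- as.numeric(iris$Species)
head(iris$Species)
class(iris$Species)

# Convert back to factor. Passing levels explicitly keeps the mapping
# correct even if some codes no longer appear in the data (e.g. after filtering)
iris$Species <- factor(iris$Species, levels = seq_along(l), labels = l)
head(iris$Species)
class(iris$Species)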
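
Applied to the flights data from the question, the same idea might look like the sketch below. The `flights` data frame, its values, and the `weight > 1000` filter are made-up stand-ins (the real filtering happens in igraph); only the column names come from the question. Indexing the stored levels with the integer codes recovers the labels even when filtering has dropped some of them.

```r
# Hypothetical toy data standing in for the real flights table
flights <- data.frame(
  ORIGIN = factor(c("GKA", "GKA", "POM", "WWK")),
  DEST   = factor(c("MAG", "POM", "LAE", "GKA")),
  weight = c(1195, 844, 1615, 4196)
)

# Store the levels of each column before converting
origin_levels <- levels(flights$ORIGIN)
dest_levels   <- levels(flights$DEST)

# Convert to the integer level codes
flights$ORIGIN <- as.numeric(flights$ORIGIN)
flights$DEST   <- as.numeric(flights$DEST)

# Stand-in for the filtering step (assumption: it only drops rows)
filtered <- flights[flights$weight > 1000, ]

# Recover the labels by indexing the stored levels with the codes
filtered$ORIGIN_IATA <- origin_levels[filtered$ORIGIN]
filtered$DEST_IATA   <- dest_levels[filtered$DEST]
filtered
```

Because the lookup is by position in the stored level vector, it does not matter that the filtered data contains fewer distinct codes than the original.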

Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.

library(dplyr)
data(warpbreaks)
original <- warpbreaks

# Lookup table: one row per combination of factor labels and numeric codes
value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()

# Coerce the factors to numeric, as the workflow requires
warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))

# Merge the labels back in; the label columns arrive with a ".y" suffix
warpbreaks <- left_join(warpbreaks, value_label_map,
  by = c("wool" = "wool_num", "tension" = "tension_num"))

# Both checks should return TRUE
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)
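
For the airport case in the question, the lookup table can then be chained with the coordinates table. The following is only a sketch in base R (using match() instead of left_join, which would work equally well): the `airports` values are the ones shown in the question, while `iata_levels`, `code_map`, and the small `filtered` data frame are hypothetical and assume both ORIGIN and DEST share one set of levels.

```r
# Airport coordinates as shown in the question
airports <- data.frame(
  IATA      = c("GKA", "MAG", "HGU", "LAE", "POM", "WWK"),
  Latitude  = c(-6.081690, -5.207080, -5.826790, -6.569803, -9.443380, -3.583830),
  Longitude = c(145.392, 145.789, 144.296, 146.726, 147.220, 143.669),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table built before the numeric conversion,
# e.g. from levels(flights$ORIGIN)
iata_levels <- c("GKA", "LAE", "MAG", "POM")
code_map <- data.frame(code = seq_along(iata_levels), IATA = iata_levels,
                       stringsAsFactors = FALSE)

# Hypothetical result of the filtering step, still holding numeric codes
filtered <- data.frame(ORIGIN = c(1, 4), DEST = c(3, 2), weight = c(1195, 1615))

# Numeric code -> IATA label; match() preserves the row order of filtered
filtered$ORIGIN_IATA <- code_map$IATA[match(filtered$ORIGIN, code_map$code)]
filtered$DEST_IATA   <- code_map$IATA[match(filtered$DEST, code_map$code)]

# IATA label -> coordinates
o <- match(filtered$ORIGIN_IATA, airports$IATA)
d <- match(filtered$DEST_IATA, airports$IATA)
filtered$origin_lat <- airports$Latitude[o]
filtered$origin_lon <- airports$Longitude[o]
filtered$dest_lat   <- airports$Latitude[d]
filtered$dest_lon   <- airports$Longitude[d]
filtered
```

Any code or IATA label missing from a lookup table yields NA rather than an error, so it is worth checking the result with anyNA() on real data.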

2 Comments

Thank you. Indeed, this solved my issue. The problem was that I was trying to find a way of matching the two datasets, given that in the end (due to the filtering algorithm) I always end up with fewer columns. But your way solved it perfectly :). Thanks a lot, really :D. This saved me from a massive headache.
Glad to hear it! Cheers.
