
I am working with a data.table that has 1900 columns and roughly 280,000 rows.

Currently, the data is entirely "integer", but I want the columns to be explicitly "numeric" so I can pass the table to a bigcor() function later. Apparently, bigcor() can only handle "numeric" and not "integer".

I have tried:

full.bind <- full.bind[,sapply(full.bind, as.numeric), with=FALSE]

Unfortunately, I get the error:

Error in `[.data.table`(full.bind, , sapply(full.bind, as.numeric), with = FALSE) : 
  j out of bounds

So, I tried using the data.table set() function, but I get the error:

Error in set(full.bind, value = as.numeric(full.bind)) : 
  (list) object cannot be coerced to type 'double'

I have created a simple reproducible example. Keep in mind, the actual columns are NOT "a", "b", or "c"; they have extremely complicated names, so referencing columns individually is not a possibility.

dt <- data.table(a=1:10, b=1:10, c=1:10)

So, my final questions are:

1) Why does my sapply technique not work? (What is the "j out of bounds" error?)
2) Why does the set() technique not work? (Why can't the data.table be coerced to numeric?)
3) Does the bigcor() function require a numeric object, or is there another problem?

  • 9
    One does not delete a question after receiving an answer. You got help for free so try to be grateful instead. Commented Apr 22, 2015 at 7:37
  • 1
    I actually tried to delete this immediately upon posting as I had found the answer somewhere else. Sorry about that! Commented Apr 23, 2015 at 8:23
  • 2
    not sure about differences between data.frame and data.table (so maybe this is irrelevant, sorry!), but I found dplyr to be helpful here: mutate_if(df, is.integer, as.numeric) converted all integer columns to numeric: clean, concise, quick. Commented Jun 30, 2017 at 17:14

1 Answer


Use .SD and assignment by reference:

library(data.table)
dt <- data.table(a=1:10, b=1:10, c=1:10)
sapply(dt, class)
#        a         b         c 
#"integer" "integer" "integer"

dt[, names(dt) := lapply(.SD, as.numeric)]
sapply(dt, class)
#        a         b         c 
#"numeric" "numeric" "numeric"
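As background for the first two questions, here is a minimal sketch of what the original attempts hand to data.table. It assumes the toy table from the question. With with=FALSE, j is treated as column positions or names, so the cell values returned by sapply() end up being used as column indices, which is the likely source of "j out of bounds". The names m and err are illustrative only.

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = 1:10)

# sapply() simplifies the converted columns into a numeric matrix of cell values.
# Passing that matrix as j with with = FALSE asks for columns 1, 2, ..., 10,
# but the table only has 3 columns.
m <- sapply(dt, as.numeric)
is.matrix(m)
dim(m)

# as.numeric() on the whole table tries to coerce a list of columns in one call,
# which R refuses to do. Capture the error instead of stopping.
err <- tryCatch(as.numeric(dt), error = function(e) e)
inherits(err, "error")
```

In both cases the conversion has to happen column by column, which is what lapply(.SD, as.numeric) does.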

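set() fails the way you called it because value = as.numeric(full.bind) tries to coerce the whole data.table, which is a list, in a single call; that is where the "(list) object cannot be coerced to type 'double'" error comes from. Each replacement column has to be generated separately, and j has to be supplied (note the documentation, which doesn't say that j is optional). The simplest way to use set() is therefore to loop over the columns, e.g. with a for loop. This can be preferable because it needs less memory: the additional memory corresponds to one column at a time, whereas the first approach needs additional memory for the whole converted data.table.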

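# dt is already numeric after the first example, so convert to character here
# to make the change visible; use as.numeric for the original problem.
for (k in seq_along(dt)) set(dt, j = k, value = as.character(dt[[k]]))
sapply(dt, class)
#         a           b           c 
#"character" "character" "character"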
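Applied to the original problem, the same loop might look like the sketch below. It assumes a fresh toy table (dt2 and int_cols are hypothetical names, and the character column is added only to show that non-integer columns are left alone) and converts just the integer columns, by position, so the column names never have to be typed.

```r
library(data.table)

# Toy table: two integer columns and one character column.
dt2 <- data.table(a = 1:10, b = 1:10, c = letters[1:10])

# Positions of the integer columns; no column names are needed.
int_cols <- which(sapply(dt2, is.integer))

# Convert one column at a time, by reference, so only one extra
# double column has to exist in memory at any moment.
for (k in int_cols) set(dt2, j = k, value = as.numeric(dt2[[k]]))

sapply(dt2, class)
```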

However, bigcor (from package propagate) requires a matrix as input, and a data.table isn't a matrix. So your problem is not the column type; you need to pass as.matrix(dt) instead.
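A minimal sketch of that conversion, assuming the toy table from the question (dt3 and m are illustrative names). It stops short of calling bigcor, whose arguments are not shown here. Setting storage.mode is an optional extra step, not something the answer requires: it makes the matrix double without converting the table's columns first.

```r
library(data.table)

dt3 <- data.table(a = 1:10, b = 1:10, c = 1:10)

# as.matrix() keeps the column names as the matrix's column names.
m <- as.matrix(dt3)
typeof(m)

# Optional: switch the matrix to double storage in one step.
storage.mode(m) <- "double"
typeof(m)
```

The resulting m is what would be handed to bigcor in place of the data.table.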


4 Comments

set() can work with more than one column simultaneously, but value has to be generated for each column anyway. Still, the set() approach is preferable because it requires memory worth only one extra double column. The := with lapply() has to convert every column to numeric before replacing, which requires memory for the entire data set in double.
@Arun Thanks. I tried to add this to the answer. However, I prefer the first approach for its nicer syntax. I rarely am in a situation where I wouldn't have enough memory available for it.
How do you do it if you only want to change some columns? For example, I want to exclude the 3rd:
dt[, names(.SD) := lapply(.SD, as.numeric), .SDcols=!c(3)]
I get this error:
Error in `[.data.table`(dt, , `:=`(names(.SD), lapply(.SD, as.integer)), : LHS of := isn't column names ('character') or positions ('integer' or 'numeric')
And dt[, names(dt) := lapply(.SD, as.numeric), .SDcols=!c(3)] gives another error:
In `[.data.table`(dt, , `:=`(names(dt), lapply(.SD, as.integer)), : Supplied 3 columns to be assigned a list (length 2) of values (recycled leaving remainder of 1 items).
dt[, names(dt)[-3] := lapply(.SD, as.numeric), .SDcols=!c(3)] seems too long.
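One common way to handle the "only some columns" case is to store the target column names in a variable and use it on both sides. This is a sketch on the toy table rather than a reply from the answer's author; dt4 and cols are hypothetical names. Wrapping the variable in parentheses on the left of := tells data.table to use its contents as the column names.

```r
library(data.table)

dt4 <- data.table(a = 1:10, b = 1:10, c = 1:10)

# Every column except the third.
cols <- names(dt4)[-3]

# (cols) on the left-hand side and .SDcols = cols on the right keep the
# assigned columns and the converted columns in step.
dt4[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

sapply(dt4, class)
```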
