36

I have the following data.table:

x = structure(list(f1 = 1:3, f2 = 3:5), .Names = c("f1", "f2"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))

I would like to apply a function to each row of the data.table. The function func.text takes the arguments f1 and f2, does something with them, and returns a computed value. Assume (as an example)

func.text <- function(arg1,arg2){ return(arg1 + exp(arg2))}

but my real function is more complex and does loops and all, but returns a computed value. What would be the best way to accomplish this?

4 Answers

55

The best way is to write a vectorized function, but if you can't, then perhaps this will do:

x[, func.text(f1, f2), by = seq_len(nrow(x))]
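To illustrate when the by-row form is actually needed, here is a minimal sketch. The function slow_fun is hypothetical (it is not from the question) and stands in for a function that only works on scalars because it uses a loop and an if on its arguments:

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)

# Hypothetical scalar-only function: `seq_len()` and `if` both expect
# length-one inputs, so it cannot be called on whole columns at once.
slow_fun <- function(arg1, arg2) {
  total <- 0
  for (k in seq_len(arg1)) total <- total + exp(arg2)
  if (total > 100) total else -total
}

# Grouping by the row index makes every group a single row,
# so slow_fun() receives one f1 and one f2 per call.
res <- x[, .(value = slow_fun(f1, f2)), by = seq_len(nrow(x))]
print(res)
```

The trade-off is speed: the function is called once per row, so a vectorized rewrite is still preferable on large tables when it is feasible.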

11 Comments

Ah, didn't think of using the by = 1:nrow(x) trick. Nice one.
Not sure why not just use .I, e.g., something like x[, func.text(f1, f2), by = .I]
@DavidArenburg I have no idea what by=.I is doing. It's somehow not quite the same as by=1:nrow(x), as you can check by comparing e.g. x[, 1, by = .I] and x[, 1, by = 1:nrow(x)].
would be great though if that worked as you'd expect it to work (also by=1:.N)
Yeah, you're probably right, but in this case it doesn't even look like the OP needs a by statement, as his function already operates over the whole data set by row, so even x[, func.text(f1, f2)] will give the desired result. The problem is that the result will lose the data.table class and become a numeric vector. Adding by = .I will keep the class, but I'm not sure why or how (I'll probably get some angry comment from @Arun pointing out my lack of understanding of data.table soon).
31

The most elegant way I've found is with mapply:

x[, value := mapply(func.text, f1, f2)]
x
#    f1 f2     value
# 1:  1  3  21.08554
# 2:  2  4  56.59815
# 3:  3  5 151.41316

Or with the purrr package:

x[, value := purrr::pmap_dbl(.(f1, f2), func.text)]

If your situation allows for it, another approach is to make the function's argument names match the column names:

library("purrr")

# arguments match the names of the columns, dots collect other 
# columns existing in the data.table
func.text <- function(f1, f2, ...) { return(f1 + exp(f2)) }

# use `set` to modify the data.table by reference
purrr::pmap_dbl(x, func.text) %>%
  data.table::set(x, i = NULL, j = "value", value = .)

print(x)
##    f1 f2     value
## 1:  1  3  21.08554
## 2:  2  4  56.59815
## 3:  3  5 151.41316


9

We can identify rows with the special symbol .I.

dt_iris <- data.table(iris)
dt_iris[, ..I := .I]

## Let's define some function
some_fun <- function(dtX) {
    print('hello')
    return(dtX[, Sepal.Length / Sepal.Width])
}

## by row
dt_iris[, some_fun(.SD), by = ..I] # or simply: dt_iris[, some_fun(.SD), by = .I]

## vectorized calculation
some_fun(dt_iris) 

8 Comments

I am under the impression that at some point it was possible to use by = .I directly in the third argument. No?
@StéphaneLaurent Sure, it is just there to make explicit which data the by is applied on. I have updated the post to remove any doubt ;)
Sorry CronAcronis, maybe my comment was not clear. I mean it was possible to directly do dt[, y:=somefun(x), by=I] in the past. But it is not possible now. Or maybe my memory is wrong.
@StéphaneLaurent I think you meant .I, so you can do dt_iris[, some_fun(.SD), by = .I], with dot.
Note that .I is meant to be used as a j argument in data.table, and not in the by clause. In DT >1.12.4 it doesn't seem to work either. @CronMerdek, can you re-evaluate your answer?
0

This is a pretty compact syntax:

x[, c := .(Map(func.text, f1, f2))]
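One thing to be aware of with this form is that Map returns a list, so the new column is a list column rather than a numeric one. A minimal sketch, assuming the function returns a single number per row; the column name c_num is made up for illustration:

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) arg1 + exp(arg2)

# Map() returns a list with one element per row; wrapping it in .()
# stores that list as a single list column named `c`.
x[, c := .(Map(func.text, f1, f2))]
print(class(x$c))

# When every element is a length-one numeric, unlist() flattens the
# list column into an ordinary numeric column.
x[, c_num := unlist(c)]
print(x)
```

A list column is handy when the function returns vectors of varying length per row; for scalar results, a flat numeric column is usually easier to work with.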
