R data.table apply function to rows using columns as arguments

Question

I have the following data.table

x = structure(list(f1 = 1:3, f2 = 3:5), .Names = c("f1", "f2"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))

I would like to apply a function to each row of the data.table. The function func.text takes the arguments f1 and f2, does something with them, and returns a computed value. Assume (as an example)

func.text <- function(arg1,arg2){ return(arg1 + exp(arg2))}

but my real function is more complex and does loops and all, but returns a computed value. What would be the best way to accomplish this?

eddi · Accepted Answer · 2018-08-17 18:00:42Z

55

The best way is to write a vectorized function, but if you can't, then perhaps this will do:

x[, func.text(f1, f2), by = seq_len(nrow(x))]
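As a minimal sketch of the difference between the two approaches, assuming the example data and function from the question: the vectorized call runs once on the whole columns and returns a plain numeric vector, while the grouped call runs once per row and returns a data.table. The names vec and by_row are only illustrative.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) arg1 + exp(arg2)

# Vectorized: a single call on whole columns, result is a numeric vector
vec <- x[, func.text(f1, f2)]

# Row by row: one call per group of size 1, result is a data.table
# holding the grouping column plus the result column V1
by_row <- x[, func.text(f1, f2), by = seq_len(nrow(x))]

# Both approaches compute the same values
print(all.equal(vec, by_row$V1))
```

The grouped form pays the cost of one function call per row, so it is mostly worth it when the function really cannot accept vectors.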

edited Aug 17, 2018 at 18:00

answered Aug 21, 2014 at 17:03

eddi


11 Comments

broccoli Over a year ago

Ah, didn't think of using the by = 1:nrow(x) trick. Nice one.

David Arenburg Over a year ago

Not sure why not just use .I, e.g., something like x[, func.text(f1, f2), by = .I]

eddi Over a year ago

@DavidArenburg I have no idea what by=.I is doing. It's somehow not quite the same as by=1:nrow(x), as you can check by comparing e.g. x[, 1, by = .I] and x[, 1, by = 1:nrow(x)].

eddi Over a year ago

would be great though if that worked as you'd expect it to work (also by=1:.N)

David Arenburg Over a year ago

Yeah, you're probably right, but in this case it doesn't even look like the OP needs a by statement, as his function already operates over the whole data set by row, so even x[, func.text(f1, f2)] will give the desired result. The problem is that it will lose the data.table class and become a numeric vector. Adding by = .I will keep the class, but I'm not sure why or how (I'll probably get some angry comment from @Arun pointing out my lack of understanding of data.table soon).


mlegge · Accepted Answer · 2023-02-10 15:40:30Z

31

The most elegant way I've found is with mapply:

x[, value := mapply(func.text, f1, f2)]
x
#    f1 f2    value
# 1:  1  3 21.08554
# 2:  2  4 56.59815
# 3:  3  5 151.4132

Or with the purrr package:

x[, value := purrr::pmap_dbl(.(f1, f2), func.text)]

If your situation allows for it, another approach is to make the function's argument names match the column names:

library("purrr")

# arguments match the names of the columns, dots collect other 
# columns existing in the data.table
func.text <- function(f1, f2, ...) { return(f1 + exp(f2)) }

# use `set` to modify the data.table by reference
purrr::pmap_dbl(x, func.text) %>%
  data.table::set(x, i = NULL, j = "value", value = .)

print(x)
##    f1 f2     value
## 1:  1  3  21.08554
## 2:  2  4  56.59815
## 3:  3  5 151.41316
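The per-row approach matters most when the function cannot take whole columns. A minimal sketch, assuming a hypothetical scalar-only function func.scalar (not from the question) whose `if` needs a length-one condition:

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)

# Hypothetical scalar-only function: `if` expects a single TRUE/FALSE,
# so it cannot be called on whole columns directly
func.scalar <- function(arg1, arg2) {
  if (arg1 > 1) arg1 + exp(arg2) else 0
}

# mapply calls func.scalar once per row, pairing f1[i] with f2[i]
x[, value := mapply(func.scalar, f1, f2)]

# Vectorize() wraps the same mapply logic in a reusable function
func.vec <- Vectorize(func.scalar)
x[, value2 := func.vec(f1, f2)]

print(x)
```

Both columns hold the same numbers; Vectorize() is mainly a convenience when the function is called in several places.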

edited Feb 10, 2023 at 15:40

answered Apr 13, 2017 at 17:59

mlegge


Cron Merdek · Accepted Answer · 2016-02-05 10:20:16Z

9

We can identify rows with the special symbol .I, which holds the row numbers.

dt_iris <- data.table(iris)
dt_iris[, ..I := .I]

## Let's define some function
some_fun <- function(dtX) {
    print('hello')
    return(dtX[, Sepal.Length / Sepal.Width])
}

## by row
dt_iris[, some_fun(.SD), by = ..I] # or simply: dt_iris[, some_fun(.SD), by = .I]

## vectorized calculation
some_fun(dt_iris)
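Since the comments report that by = .I does not behave the same way in every data.table version, here is a minimal sketch that avoids it by grouping on an ordinary column. The column name row_id and the helper ratio_fun are hypothetical names chosen for illustration.

```r
library(data.table)

dt_iris <- data.table(iris)

# Hypothetical helper column holding the row number
dt_iris[, row_id := .I]

ratio_fun <- function(dtX) dtX[, Sepal.Length / Sepal.Width]

# One group per row: .SD is a one-row data.table inside each call
by_row <- dt_iris[, .(ratio = ratio_fun(.SD)), by = row_id]

# Same numbers as the single vectorized call
print(all.equal(by_row$ratio, ratio_fun(dt_iris)))
```

Grouping on a real column makes it explicit what each call receives, at the cost of one extra column in the table.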

edited Feb 5, 2016 at 10:20

answered Sep 24, 2015 at 11:33

Cron Merdek


8 Comments

Stéphane Laurent Over a year ago

I am under the impression that it was once possible to use by=.I directly as the third argument. No?

Cron Merdek Over a year ago

@StéphaneLaurent Sure, it is just there so the user can see the data that by is applied to. I have updated the post to remove any doubt ;)

Stéphane Laurent Over a year ago

Sorry CronAcronis, maybe my comment was not clear. I mean it was possible to directly do dt[, y:=somefun(x), by=I] in the past, but it is not possible now. Or maybe my memory is wrong.

Cron Merdek Over a year ago

@StéphaneLaurent I think you meant .I, so you can do dt_iris[, some_fun(.SD), by = .I], with dot.

Davor Josipovic Over a year ago

Note that .I is meant to be used as a j argument in data.table, and not in the by clause. In DT >1.12.4 it doesn't seem to work either. @CronMerdek, can you re-evaluate your answer?


teemoleen · Accepted Answer · 2023-05-19 00:05:26Z

0

This is a pretty compact syntax:

x[, c := .(Map(func.text, f1, f2))]
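One thing to keep in mind with this form: Map returns a list, so the new column is a list column rather than a numeric one. A minimal sketch, assuming the example data from the question and that every call returns a single number, showing how to flatten it (the column name value is only illustrative):

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) arg1 + exp(arg2)

# Map returns a list, so `c` becomes a list column
# (list() is used here in place of the .() alias)
x[, c := list(Map(func.text, f1, f2))]
print(is.list(x$c))

# Flatten to a plain numeric column when each call returns one number
x[, value := unlist(Map(func.text, f1, f2))]
print(is.numeric(x$value))
```

A list column is handy when the function returns objects of varying length; for one number per row, the flattened column is easier to work with.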

answered May 19, 2023 at 0:05

teemoleen
