I'm trying to run a bilinear terra::extract of > 2 700 000 point coordinates on a SpatRaster stack with many layers (2880, although each with 100 cells only) and getting "Error: std::bad_alloc", probably because I run out of memory.
I've I thought about processing this sequentially using a lapply on a list version of my stack (something like as.list(stk)) but it will take ages to process, I think (based on a benchmarking exercise with the first two layers), so I was hoping for another solution.
I can probably split the SpatRaster into sections, but I thought terra would do that automagically, but processing by chunks
Here's a reprex:
library(terra)
library(data.table)
r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25), crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- eval(parse(text = paste0("rast(list(", paste(rep("r", 2880), collapse = ", "), "))")))
xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))
tester <- extract(stk, xyz, method = "bilinear")
# Error: std::bad_alloc
I've trying setting terraOptions(memmax = 0.1, memmin = 0.1) to see if that increased the number of processing chunks (or triggered it), but it didn't and I get the same error.
> mem_info(stk)
------------------------
Memory (GB)
------------------------
check threshold : 1 (memmin)
available : 23.98
allowed (60%) : 14.39
needed (n=1) : 0
------------------------
proc in memory : TRUE
nr chunks : 1
------------------------
> terraOptions(memmax = 0.1, memmin = 0.1)
> mem_info(stk)
------------------------
Memory (GB)
------------------------
check threshold : 0.1 (memmin)
available : 0.1 (memmax)
allowed (60%) : 0.06
needed (n=1) : 0
------------------------
proc in memory : TRUE
nr chunks : 1
------------------------
This issue is potentially related to operation on a very large raster in terra causes std::bad_alloc but I couldn't find a solution there
UPDATE:
Following @robert-hijmans 's second suggestion here's a solution that uses extract(..., bilinear = TRUE) and avoids extracting the same cell value more than once:
library(terra)
library(data.table)
r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25), crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- eval(parse(text = paste0("rast(list(", paste(rep("r", 2880), collapse = ", "), "))")))
xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))
## get raster cell coordinates for the selected points:
point2CellCoords <- data.table(cell = cellFromXY(stk, xyz), xyz, ID = 1:nrow(xyz))
## now make a vector of unique cell coordinates
uniqueCoords <- data.table(cell = point2CellCoords$cell, xyFromCell(stk, point2CellCoords$cell)) |>
na.omit() |>
unique()
## extract for unique locations and bind back to all points.
tester <- extract(stk, uniqueCoords[, .(x, y)], method = "bilinear") |>
as.data.table()
tester$cell <- uniqueCoords$cell
result <- tester[point2CellCoords, on = "cell"]
SECOND UPDATE: it seems the suggested solution above does not work as I expected. Doing the extraction for a single layer shows that the bilinear extraction of the unique raster cell values is not the same as doing it for all point coordinates:
## extract for unique locations and bind back to all points.
tester <- extract(stk[[1]], uniqueCoords[, .(x, y)], method = "bilinear") |>
as.data.table()
tester$cell <- uniqueCoords$cell
tester[, ID := NULL]
result <- tester[point2CellCoords[, .(ID, cell)], on = "cell"]
gc(reset = TRUE)
result1 <- extract(stk[[1]], xyz, method = "bilinear") |>
as.data.table()
result1[result, on = "ID"]
ID lyr.1 i.lyr.1 cell
<int> <num> <num> <num>
1: 1 0.6067441 0.46629368 53
2: 2 0.3960616 0.41555004 89
3: 3 0.6194866 0.46629368 53
4: 4 0.5465888 0.43061609 71
5: 5 0.6221420 0.73741642 23
---
2785514: 2785514 0.5929927 0.58885241 5
2785515: 2785515 0.4087715 0.09021229 74
2785516: 2785516 0.2594263 0.14229051 103
2785517: 2785517 0.8452475 0.99541315 3
2785518: 2785518 0.5008527 0.79593787 73
Perhaps having different number points intersecting each raster cell weights the final interpolated values in some way?
THIRD UPDATE:
@robert-hijman's first solution works as expected, although in my case I still run out of memory when rbinding the tables. Not a terra issue though.