I'm trying to run a bilinear terra::extract() of > 2,700,000 point coordinates on a SpatRaster stack with many layers (2880, each with only 100 cells) and I'm getting "Error: std::bad_alloc", probably because I run out of memory.

I thought about processing this sequentially using lapply on a list version of my stack (something like as.list(stk)), but I think it would take ages (based on a benchmarking exercise with the first two layers), so I was hoping for another solution.

I could probably split the SpatRaster into sections myself, but I thought terra would do that automagically by processing in chunks.

Here's a reprex:

library(terra)
library(data.table)

r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25),  crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- rast(replicate(2880, r, simplify = FALSE))

xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
                  y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))

tester <- extract(stk, xyz, method = "bilinear") 
# Error: std::bad_alloc

I tried setting terraOptions(memmax = 0.1, memmin = 0.1) to see if that increased the number of processing chunks (or triggered chunked processing), but it didn't, and I get the same error.

> mem_info(stk)

------------------------
Memory (GB) 
------------------------
check threshold : 1 (memmin)
available       : 23.98
allowed (60%)   : 14.39
needed (n=1)    : 0
------------------------
proc in memory  : TRUE
nr chunks       : 1
------------------------
> terraOptions(memmax = 0.1, memmin = 0.1)
> mem_info(stk)

------------------------
Memory (GB) 
------------------------
check threshold : 0.1 (memmin)
available       : 0.1 (memmax)
allowed (60%)   : 0.06
needed (n=1)    : 0
------------------------
proc in memory  : TRUE
nr chunks       : 1
------------------------

This issue is potentially related to "operation on a very large raster in terra causes std::bad_alloc", but I couldn't find a solution there.

UPDATE: Following @robert-hijmans's second suggestion, here's a solution that uses extract(..., method = "bilinear") and avoids extracting the same cell value more than once:

library(terra)
library(data.table)

r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25),  crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- rast(replicate(2880, r, simplify = FALSE))

xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
                  y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))

## get the raster cell number for each point:
point2CellCoords <- data.table(cell = cellFromXY(stk, xyz), xyz, ID = 1:nrow(xyz))

## now make a table of unique cells and their centre coordinates (from xyFromCell)
uniqueCoords <- data.table(cell = point2CellCoords$cell, xyFromCell(stk, point2CellCoords$cell)) |>
  na.omit() |>
  unique()

## extract for unique locations and bind back to all points.
tester <- extract(stk, uniqueCoords[, .(x, y)], method = "bilinear") |>
  as.data.table()
tester$cell <- uniqueCoords$cell

result <- tester[point2CellCoords, on = "cell"]

SECOND UPDATE: it seems the suggested solution above does not work as I expected. Doing the extraction for a single layer shows that bilinear extraction at the unique cell coordinates is not the same as doing it at all of the point coordinates:

## extract for unique locations and bind back to all points.
tester <- extract(stk[[1]], uniqueCoords[, .(x, y)], method = "bilinear") |>
  as.data.table()
tester$cell <- uniqueCoords$cell
tester[, ID := NULL]

result <- tester[point2CellCoords[, .(ID, cell)], on = "cell"]
gc(reset = TRUE)

result1 <- extract(stk[[1]], xyz, method = "bilinear") |>
  as.data.table()

result1[result, on = "ID"]
              ID     lyr.1    i.lyr.1  cell
           <int>     <num>      <num> <num>
      1:       1 0.6067441 0.46629368    53
      2:       2 0.3960616 0.41555004    89
      3:       3 0.6194866 0.46629368    53
      4:       4 0.5465888 0.43061609    71
      5:       5 0.6221420 0.73741642    23
     ---                                   
2785514: 2785514 0.5929927 0.58885241     5
2785515: 2785515 0.4087715 0.09021229    74
2785516: 2785516 0.2594263 0.14229051   103
2785517: 2785517 0.8452475 0.99541315     3
2785518: 2785518 0.5008527 0.79593787    73

Perhaps having a different number of points intersecting each raster cell weights the final interpolated values in some way?
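
In hindsight the cause is simpler: uniqueCoords holds cell-centre coordinates (from xyFromCell), and bilinear interpolation exactly at a cell centre puts all the weight on that cell, so it returns the plain cell value, whereas the original points sit off-centre and get a genuine four-cell blend. A quick sketch with the reprex's single layer r illustrates this (cell 53 is taken from the output above):

ctr <- xyFromCell(r, 53)                            ## centre of cell 53
extract(r, ctr, method = "bilinear")                ## all weight on cell 53: the plain cell value
extract(r, ctr + c(0.5, 0.3), method = "bilinear")  ## off-centre point: a four-cell blend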

THIRD UPDATE: @robert-hijmans's first solution works as expected, although in my case I still run out of memory when rbinding the tables. Not a terra issue, though.

1 Answer

You can do the extraction in chunks. That is, do not split the rasters, but rather do the extraction for subsets of the points (rows of xyz).

For example

i <- ceiling(1:nrow(xyz) / 500000)
u <- unique(i)
tester <- lapply(u, function(k) extract(stk, xyz[i == k, ], method = "bilinear"))
tester <- do.call(rbind, tester)
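
If the combined table fits in memory, data.table::rbindlist(tester) is a drop-in (and typically faster) replacement for do.call(rbind, tester); if it does not, each chunk can be written to disk instead (see the comments below).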

Since you have 2,700,000 points and only 100 cells, you could index the values matrix directly to avoid memory limitations. With raster r and points p:

cells <- cellFromXY(r, p)  ## cell number for each point
v <- values(r)             ## ncell x nlyr matrix of all cell values
result <- v[cells, ]       ## one row per point

But since you do extract with method = "bilinear", that may not apply. In that case it could still be useful to extract the values only for the unique coordinates, and then merge the results with the original points.

6 Comments

I hadn't thought of that -- I'll give it a try ASAP. One question, though: why isn't terra increasing the number of processing chunks? Did I completely misunderstand what those are and how they are adjusted?
Processing chunks normally refer to raster data, and reading raster data in chunks does not apply here: the raster data is only read at the point locations. The issue is that the size of the output data (2700000 * 2800 = 7.56e+09 values, roughly 60 GB as doubles) is too large for a contiguous C++ object on your computer. Chunking the point data might help.
The second suggestion worked really well in this case, thanks! I've updated the post with a proposed implementation of it
I was too eager to accept your proposed solution. It seems that extracting values for unique coordinates and merging back to the original points yields different results than direct extraction using all points (see the second update on the original post). Perhaps I'm doing something wrong, but if not I wonder if this is related to some raster cell values having more "weight" if they intersect with more points (which happens in the reprex above)?
you can write the output to a csv file, using append=TRUE
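
A minimal sketch of that last suggestion (the file name extracted.csv and the idx helper are illustrative, not from the comment): extract chunk by chunk and append each result to disk, so the full table never has to sit in memory:

idx <- ceiling(seq_len(nrow(xyz)) / 500000)   ## chunk id for each point
for (k in unique(idx)) {
  chunk <- extract(stk, xyz[idx == k, ], method = "bilinear")
  ## fwrite writes the header with the first chunk and appends without it afterwards
  data.table::fwrite(chunk, "extracted.csv", append = (k > 1))
}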