Using terra in R to calculate & map (dis)similarity and uniqueness across cells of a large GDM-based raster

Question

I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of predicted dissimilarities, established from a generalized dissimilarity model (GDM). The primary data are GDM transformed predictor rasters (outputs of GDM model fitting), assembled in a single multi-layer raster object. Spatial analyses involve using values from the transformed rasters to predict dissimilarities among pairs of grid cells and applying those outputs for answering various spatial questions. The project raster is large (requires 102 gigs to process; via mem_info).

My specific question: what terra-based code is suitable for applying GDM predictions to assess spatial patterns of dissimilarity and uniqueness (i.e., mean similarity between each grid cell and a random sample of locations across the entire study region). For outputs, I am looking for 1) a raster of predicted dissimilarity and 2) a raster of predicted uniqueness.

This question is quite comparable to Employing `terra::` to avoid std::bad_alloc error when extracting values from large SpatRaster stack. I have sought to use relevant components of that helpful answer here, but haven't successfully developed a workable solution.

I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.

Create multilayer sample raster

library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
          nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA

Initial (partial) attempt at new workflow.

Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.

# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5

# Return matrix of sampled values  
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE)) 

# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)

# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`

# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)

for(i in 1:(n.sub-1)) {
    for(j in (i+1):n.sub) {
        ecol.dist <- sum(abs(x[i, ] - x[j, ]))
        sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
    }
}

At this point, I surmise two functions are needed for calculating: 1) dissimilarity and 2) uniqueness (dis/similarity to the region as a whole) and applying those predictions across the study extent.

UPDATE: following @robert-hijmans helpful solution, I was able to produce a raster of predicted uniqueness across my study extent. The code scaled very well with my project data.

However, when I run the first code chunk (re: dissimilarity across 10 cells) with sample data, the output is a multi-layer raster. I need a single layer raster of dissimilarity (comparable to the single-layer raster of uniqueness produced through the other chunk). I also need to know the best way to extend that across all cells in my study extent (not just the first 10, 100, 1000 etc.) Replacing that '10' with the total # of raster cells kicks an error.

Please edit your code so that you provide a single minimal example that reproduces your issue. I already pointed out that you are using a poorly implemented algorithm that does not scale to a large data set (stackoverflow.com/questions/79343223/…). I gave you a very elaborate answer showing how you can reimplement it; but you seem to ignore that here. — Robert Hijmans
– Robert Hijmans, Commented Feb 17 at 3:21
This is only my 3rd question on stackoverflow, so I'm not accustomed to all conventions. I will look to incorporate the front end of the solution you provided previously. That answer was very helpful and I ran numerous clustering trials, culminating in a successful outcome, over the past couple weeks. Thank you — Sean Basquill
– Sean Basquill, Commented Feb 17 at 13:50
"rasterize" does not work in your code --- but it is an odd thing to do anyhow. You are still using the loop over Trans.env.table. Why? That is a bad idea and I have shown how to avoid that in your previous question. — Robert Hijmans
– Robert Hijmans, Commented Feb 17 at 18:28
rasterize() was carried over from the old code; I checked for a terra alternative and saw that the same function name was used. My question now includes the front-end of the code you provided earlier; I've added comments to show my level of understanding. Still learning as I go. Thank you kindly — Sean Basquill
– Sean Basquill, Commented Feb 17 at 20:38
It is currently not clear what your question is, can you clarify? And can you remove the "legacy code" as it is not clear why that is relevant anymore. It is fundamentally flawed, the problem is not about replacing raster functions with terra functions. That is why I redesigned it for you in the previous question. — Robert Hijmans
– Robert Hijmans, Commented Feb 18 at 1:30

Robert Hijmans · Accepted Answer · 2025-02-19 01:27:18Z

1

You ask for "pairwise-similarities". If you mean pairs of cells, those are in 1-sub.diss.m, for the sample. Do you actually want a raster for that (with ncell(r) layers)?

For the first 10 cells:

dfun <- function(cell) {
  if (is.na(r[cell][1])) return(NULL)
  edist <- sum(r - unlist(r[cell]))
  out <- 1 - exp(-1 * (gdmRastMod$intercept + edist))
  # with large datasets 
  # out <- writeRaster(out, paste0(tempfile(), ".tif"))
  out
}
d <- lapply(1:10, dfun)
names(d) <- 1:10

dr <- rast(d[!sapply(d, is.null)])
dr
#class       : SpatRaster 
#dimensions  : 100, 100, 9  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#names       :           1,         2,          3,         4,          5,          6, ... 
#min values  : -0.03850853, 0.1265724, -0.8860728, 0.4260013, -0.5692148, -0.5711362, ... 
#max values  :  0.98196854, 0.9848348,  0.9672524, 0.9900337,  0.9727540,  0.9727206, ...

You also ask for "similarity to the region as a whole", but you do not define what you mean by that (what measure to use). But you could do something like this:

m <- global(r, "mean", na.rm=T)
edist <- sum(r - unlist(m))
dissim <- 1 - exp(-1 * (gdmRastMod$intercept + edist))

dissim
#class       : SpatRaster 
#dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#name        :        sum 
#min value   : 0.02861172 
#max value   : 0.98313393

answered Feb 19 at 1:27

Robert Hijmans

49.5k4 gold badges62 silver badges77 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Sean Basquill Feb 19 at 20:09

That second chunk for producing the dissim raster is lightning fast (about 25 minutes on desktop with my big project data)!

Sean Basquill Jun 17 at 18:19

Back to this after completing separate thesis chapter. @RobertH -- the first code chunk in your answer produced a multi-layer raster, whereas I need a single layer of compositional dissimilarity. In the "update" to my question, I also inquire about an appropriate method for creating such a raster for my entire study extent. Thank you kindly

Sean Basquill Jul 24 at 13:39

When applying global(), the terra package documentation states that raster sampling may be required with custom functions (to avoid memory problems). That implies built-in functions like mean, are not computed with a sample. Is that correct? If there is no sampling, one would surmise that there is no prediction or extrapolation (across the entire raster from sample) with your solution? True? Thank you kindly @Robert. I am describing this routine with words for a thesis chapter.

Collectives™ on Stack Overflow

Using terra in R to calculate & map (dis)similarity and uniqueness across cells of a large GDM-based raster

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related