
I am running into issues extracting data properly with terra. It appears to be returning the mean, and I am not seeing an option to select the most frequent value, or the value with the largest coverage, for a selected area.

I have two raster stack datasets I am working with for a given area.

Dataset1 is irrelevant to the task at hand aside from the fact that it was used to create the extent that will be used to extract data from Dataset2. Dataset1 consists of 30x30 m pixels that were aggregated into groups of 9 to define a zone/extent along a grid. This grid is what I am working with to extract Dataset2 values.

Dataset2 also consists of 30x30 m pixels, but these pixels do not line up with the Dataset1 pixels. Everything is projected properly; this just seems to be how the data lines up.

For each grid cell I obtained the "mean" value of Dataset1. That part worked well. Dataset2, however, is land-cover data, so even though each class is represented by a number, the number is not mathematical; it just labels a specific category. For example, 42 (forest) and 71 (grassland) cannot be averaged to create a number that makes any sense. When extracting the data I noticed it gave me 9 "values" for each grid cell for each raster, and that despite the data pixels not lining up with the grid, it was still slicing them into nine 30x30 m sections along the grid lines. I didn't tell it to do this in the code. I used:

    LCValues <- extract(LC, Grid)

where "LC" is my spatraster stack and "Grid" is my vector polygon grid data layer.

I presumed it would give me each value in the grid cell and that I could then work out how much area each value covered, so I could select the value with the most weight. What happened instead with the 30x30 m subdivided areas was that it returned a whole number where it could (when the value was constant across the 30x30 m area), but sometimes the value for a 30x30 m section was a weighted average of that section. For example, if the section was 75% grassland and 25% forest, the value came out closer to the grassland code (low 60s), but if it was 75% forest and 25% grassland it came out in the upper 40s. I do not need the 30x30 m sub-grids at all, but at the very least I would like it to choose one of the values present in the grid cell rather than average them, because averaged codes are unusable to me. Ideally I would like to extract the value that is most prevalent (covers the greatest area) across each 90x90 m grid cell.

I have tried adding exact = TRUE and weights = TRUE, but this doesn't work because of the odd way it is automatically subdividing everything into multiple 30x30 m sub-grids.
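
For reference, this is roughly what I expected the exact = TRUE route to look like, continuing the sketch above (the grouping code is my own, not anything built into terra):

    # Per-cell values plus the fraction of each cell covered by the polygon
    LCw <- extract(LC, Grid, exact = TRUE)

    # For each polygon ID, sum the coverage fractions by land-cover code and
    # keep the code with the largest total share of the cell
    dominant <- do.call(rbind, lapply(split(LCw, LCw$ID), function(d) {
      tot <- tapply(d$fraction, d$LC, sum)
      data.frame(ID = d$ID[1], LC = as.numeric(names(which.max(tot))))
    }))
    dominant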

The simplest solution would be to make my grid follow the data, but this isn't possible because I need to be able to merge the result with Dataset1 eventually for further analysis.

I also tried using a spatial join (so I could use intersect with largest = TRUE), but that isn't possible with a SpatRaster. Is there another option I am missing? And does anyone know where these nine 30x30 m sub-grids within my grid cells are coming from?
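
In case a vector route is viable after all, this is the kind of raster-to-polygon intersection I had in mind, again continuing the sketch above (untested against my real data):

    # Dissolve the land-cover raster into one (multi)polygon per class
    LCpoly <- as.polygons(LC)

    # Intersect with the grid and measure each resulting piece
    pieces <- intersect(Grid, LCpoly)
    pieces$area <- expanse(pieces)

    # For each grid cell, keep the class covering the largest area
    df <- as.data.frame(pieces)
    largest <- do.call(rbind, lapply(split(df, df$GridID), function(d) {
      d[which.max(d$area), c("GridID", "LC")]
    }))
    largest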

  • It would be great if you could make this reproducible. Maybe you could even make use of some built-in datasets shipped with terra so that everyone would be able to comprehend what exactly you're trying to achieve. By the way, having a look at ?terra::extract, there does not seem to be a method for signature 'SpatRaster,SpatRaster' documented. Maybe you should also have a look at terra::aggregate() with an appropriate fact and fun = "modal" (see the sketch after these comments)? Commented Dec 15, 2024 at 18:27
  • I have been trying to make it reproducible by creating a SpatRaster using min/max coordinates, but I have gotten stuck trying to make a similar grid. The dataset I am extracting along is actually an sf multipolygon, because that is how I imported it from ArcGIS Pro. From your comment, I am realizing that maybe I need to turn it into a SpatVector? I will try that next and look into aggregate; I just need to make sure it aggregates along the appropriate cells as indicated in the Grid dataset somehow, rather than along its own values. Commented Dec 15, 2024 at 20:55
  • If y has been an sf object from the beginning and terra::extract() has been returning a result without errors, I assume there is some implicit conversion using terra::vect(). Commented Dec 15, 2024 at 21:26
  • Perhaps consider ?as.factor: with f <- as.factor('42'), as.numeric(levels(f))[f] returns 42, and the same works for '71'. That is, convert the findings to factors, or the factors to numeric. Which all probably sounded clear as mud, but land use is categorical, hence factors, and you want to find the predominant category by area. Commented Dec 16, 2024 at 1:02
  • I have imported the Grid dataset (it was originally created in ArcGIS) as a SpatVector instead of an sf object, and now I am getting 16 values for each GridID instead of the 9 I was getting as an sf object. But it is still returning the mean of some of the values. I have created a dataset that mimics the conditions in R by creating a grid from one raster and using it on a raster of another resolution, but it does not pose the same issues: it obtains each value without averaging them or creating the "sub-grids". This may somehow be an issue with the dataset itself. Commented Dec 16, 2024 at 1:26
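
Following up on the aggregate()/fun = "modal" suggestion: since aggregate() snaps to the raster's own grid rather than mine, a per-polygon mode through extract() may be closer to what I need. Here is a sketch with a hand-rolled mode function; note it counts whole cells rather than covered area, and as far as I can tell fun cannot be combined with exact = TRUE or weights = TRUE except for the mean:

    # Most frequent land-cover code among the extracted cells; the ... absorbs
    # any extra arguments extract() might pass along
    mode_fun <- function(v, ...) {
      v <- v[!is.na(v)]
      as.numeric(names(which.max(table(v))))
    }

    # One value per polygon: the most frequent code among its cells
    LCmode <- extract(LC, Grid, fun = mode_fun)
    LCmode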
