I have a geopandas dataframe that contains points at arbitrary lat/lon locations. I would like to create a raster dataset in which each cell of a regular grid holds the number of dataframe rows that fall within a circle of a certain radius around the cell. The difficulty is that the count should include only unique values with respect to a certain column.

bb is my dataframe, and radius is the distance from the cell center within which I want to count.

I am currently doing this using this code:

    import numpy

    # float array, so that the division by period is not truncated to an integer
    mappa = numpy.zeros((nrows, ncols), dtype=float)
    for col in range(ncols):
        for row in range(nrows):
            lat = row * cellsize + bbox[1]
            lon = col * cellsize + bbox[0]
            # rows falling within the limits (note: .cx selects a square box, not a circle)
            ff = bb.cx[lon - radius:lon + radius, lat - radius:lat + radius]
            # count only the rows that are unique with respect to the column flkey
            mappa[row, col] = len(ff['flkey'].unique()) / period

The dataframe bb contains a huge number of points, so this method is extremely slow.

I would like to speed this up by performing the check only where there are enough points, since most cells of the map contain no points at all. So I would like to generate a preliminary density map and then analyse only the lat/lon cells in which the density is significant.
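The preliminary density map described above could be sketched roughly as follows, using numpy.histogram2d. This is only an illustration under assumptions: the random lon/lat arrays stand in for the coordinates of bb, bbox is assumed to be ordered as (min_lon, min_lat, max_lon, max_lat), and min_points is a hypothetical threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the point coordinates of bb
lon = rng.uniform(0.0, 1.0, 1000)
lat = rng.uniform(0.0, 0.15, 1000)  # points occupy only the lowest band of the map

bbox = (0.0, 0.0, 1.0, 1.0)  # assumed order: (min_lon, min_lat, max_lon, max_lat)
cellsize = 0.1
ncols = int(round((bbox[2] - bbox[0]) / cellsize))
nrows = int(round((bbox[3] - bbox[1]) / cellsize))

# counts[row, col] = number of points falling inside that cell.
# lat is passed first so that the first axis of the result is the row axis.
counts, _, _ = np.histogram2d(
    lat, lon,
    bins=(nrows, ncols),
    range=[[bbox[1], bbox[3]], [bbox[0], bbox[2]]],
)

min_points = 1  # hypothetical threshold for "significant" density
rows, cols = np.nonzero(counts >= min_points)  # cells worth analysing in detail
```

Note that this counts points per cell, not per circle: if radius is larger than cellsize, a cell with no points of its own can still have points within radius of its center, so the mask would need to be widened before it is used as a filter.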

Is there a point density function similar to the ESRI function PointDensity_sa?

Thanks

  • Could you show what ff['flkey'] looks like? It's unclear what you mean by "The difficulty is that this number should contain only unique numbers respect to a certain column." Kernel density estimation is the general term for creating a density grid. en.wikipedia.org/wiki/Kernel_density_estimation . Scikit-learn has an implementation scikit-learn.org/stable/modules/generated/… which avoids your loop by using trees. Commented Oct 20, 2024 at 20:50
  • The ff is a dataframe containing millions of records, each one at a lat/lon and with a parameter flkey that is a string. I want to create a raster in which each cell holds the number of records within a certain radius of the cell center. The problem is that I want records that are unique with respect to flkey, not all the records in the circle. Normal density estimation cannot count unique records; it counts all the records around the cell. Commented Oct 23, 2024 at 7:36
  • Posted a solution. I suggest rewording the title to reflect the part that is actually difficult for you to code (a raster of the number of unique values). Commented Oct 23, 2024 at 18:18
  • What do you mean by huge number? Commented Nov 4, 2024 at 17:36
  • Millions of records Commented Nov 5, 2024 at 18:31

1 Answer

The following uses sklearn.neighbors.KDTree to run fast radius queries from the grid cell midpoints to the data points. For each grid cell, it yields the number of unique values found within a given radius of the cell midpoint.

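import numpy as np
from sklearn.neighbors import KDTree

n = 10_000
cell_size = 0.01
radius = 0.05

# Random sample data: point coordinates in the unit square and one value per point
coords = np.random.random_sample((n, 2))
values = np.random.randint(0, n // 10, n)

# Grid points at which the counts are evaluated
x_axis = np.linspace(0, 1, int(1 / cell_size))
y_axis = np.linspace(0, 1, int(1 / cell_size))
xv, yv = np.meshgrid(x_axis, y_axis)
tree = KDTree(coords)

# For every grid point, the indices of the data points within radius
results = tree.query_radius(np.column_stack((xv.ravel(), yv.ravel())), radius)
# Number of unique values among those points
unique_results = [len(np.unique(values[indices])) for indices in results]
# meshgrid output has shape (len(y_axis), len(x_axis))
grid_values = np.array(unique_results).reshape((len(y_axis), len(x_axis)))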

This yields a raster like this: [image]
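To connect this back to the question, here is a hedged sketch of how the same approach might handle string keys and skip empty cells. The arrays lon, lat and flkey are random stand-ins for the corresponding columns of bb (they are not the asker's data), and the grid extent, cellsize and radius are assumed values. The string keys are first mapped to integer codes with np.unique, and a cheap count_only query is used to find the cells that have any neighbours at all.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
n = 2_000
# Hypothetical stand-ins for the lon/lat coordinates and the flkey column of bb
lon = rng.uniform(0.0, 0.3, n)  # points cover only one corner of the grid
lat = rng.uniform(0.0, 0.3, n)
flkey = rng.integers(0, 200, n).astype(str)  # string keys, as in the question

cellsize, radius = 0.05, 0.1  # assumed values
ncols = nrows = 20
x_axis = np.arange(ncols) * cellsize
y_axis = np.arange(nrows) * cellsize
xv, yv = np.meshgrid(x_axis, y_axis)
centers = np.column_stack((xv.ravel(), yv.ravel()))

# Map each string key to an integer code; integers are cheaper to de-duplicate
_, codes = np.unique(flkey, return_inverse=True)
tree = KDTree(np.column_stack((lon, lat)))

# Cheap first pass: plain neighbour counts, used to skip empty cells
counts = tree.query_radius(centers, r=radius, count_only=True)
occupied = np.flatnonzero(counts > 0)

# Second pass: unique keys, computed only where there is at least one neighbour
unique_counts = np.zeros(len(centers), dtype=int)
neighbours = tree.query_radius(centers[occupied], r=radius)
unique_counts[occupied] = [len(np.unique(codes[idx])) for idx in neighbours]
grid = unique_counts.reshape((nrows, ncols))
```

The distances here are in degrees, as in the question's code. The result could then be divided by period, as in the original loop, to get the final raster.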
