Find nearest neighbors of a NumPy array in a list of NumPy arrays using Euclidean distance

Question

I have an n-dimensional vector and I want to find its k nearest neighbors in a list of n-dimensional vectors using Euclidean distance.

I wrote the following code (called with k=10), which works but runs too slowly, and I was wondering if there was a faster solution.

import numpy as np

def nearest_neighbors(value, array, nbr_neighbors=1):
    return np.argsort(np.array([np.linalg.norm(value-x) for x in array]))[:nbr_neighbors]
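The loop above calls np.linalg.norm once per vector from Python and then fully sorts all distances. A minimal sketch of a vectorized brute-force alternative (not from this thread), assuming the vectors all have the same length: stack them into one matrix, let broadcasting compute every distance in a single call, and use np.argpartition so only the k best candidates get sorted. It is still O(m·n) per query, so it speeds up the constant factor rather than changing the algorithm.

```python
import numpy as np

def nearest_neighbors(value, array, nbr_neighbors=1):
    """Indices of the nbr_neighbors vectors in `array` closest to `value` (Euclidean)."""
    array = np.asarray(array, dtype=float)   # stack the list of vectors into an (m, n) matrix
    value = np.asarray(value, dtype=float)   # the (n,) query vector
    # Broadcasting subtracts `value` from every row; norm over axis=1 gives all m distances at once.
    dists = np.linalg.norm(array - value, axis=1)
    k = min(nbr_neighbors, len(dists))
    # argpartition places the k smallest distances in the first k slots (unordered) in O(m)...
    idx = np.argpartition(dists, k - 1)[:k]
    # ...and only those k candidates are then sorted by distance.
    return idx[np.argsort(dists[idx])]

vectors = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([5.0, 5.0]), np.array([0.5, 0.0])]
print(nearest_neighbors(np.array([0.0, 0.1]), vectors, nbr_neighbors=2))  # [0 3]
```

The result is ordered nearest-first, like the original function's output.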

sascha · Accepted Answer · 2017-08-17 18:07:57Z

5

Use scipy's kd-tree.

A small example is available here.

Many people seem to complain about its performance, though, and recommend sklearn's implementation instead (see sklearn.neighbors, which also uses this data structure internally)!
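A minimal sketch of the kd-tree approach, assuming SciPy >= 1.6 (where, per the comments below, KDTree and cKDTree are the same implementation). The dataset and query sizes are made up for illustration. The tree is built once and can then answer many k-nearest-neighbor queries; KDTree.query uses the Euclidean norm by default (p=2).

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 8))   # hypothetical dataset: 10,000 vectors of dimension 8
queries = rng.normal(size=(3, 8))     # hypothetical query vectors

tree = KDTree(data)                   # build the tree once
# query() accepts a batch of points; k=10 returns, for each query, the 10 nearest
# data points sorted by increasing Euclidean distance.
dists, idxs = tree.query(queries, k=10)
print(idxs.shape)                     # (3, 10)
```

For a single vector, tree.query(value, k=10) returns two 1-D arrays of length 10. k-d trees tend to pay off most for low-dimensional data; as the dimension grows, their advantage over brute force shrinks.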

answered Aug 17, 2017 at 18:07



3 Comments

teekarna Over a year ago

SciPy has cKDTree, which is significantly faster than KDTree.

Matti Wens Over a year ago

Updated link: cKDTree

realityChemist Over a year ago

Using cKDTree instead of KDTree is no longer recommended by SciPy except for backward compatibility, as they are now identical. From the cKDTree documentation page linked above: "Prior to SciPy v1.6.0, cKDTree had better performance and slightly different functionality but now the two names exist only for backward-compatibility reasons. If compatibility with SciPy < 1.6 is not a concern, prefer KDTree."

Bastien Beurier · Accepted Answer · 2017-09-01 16:51:33Z

3

As sascha suggested, I ended up using a library, though sklearn's NearestNeighbors class (from sklearn.neighbors) rather than scipy's kd-tree, which brought the computation time down from 50 hours to 36 minutes. This is the kind of computation I should not have tried to reimplement myself, as dedicated libraries are much more optimized for it.

NearestNeighbors also lets you pass a whole array of query values to its kneighbors method, which returns the k nearest neighbors for each value.

Final code was:

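from sklearn.neighbors import NearestNeighbors

def nearest_neighbors(values, all_values, nbr_neighbors=10):
    # metric='cosine' as in the original; use metric='euclidean' for the question's Euclidean distance
    nn = NearestNeighbors(n_neighbors=nbr_neighbors, metric='cosine', algorithm='brute').fit(all_values)
    return nn.kneighbors(values)  # (distances, indices), one row per query value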
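The final code above uses metric='cosine', while the question asks for Euclidean distance. A minimal sketch of the same sklearn approach with metric='euclidean', assuming a recent scikit-learn (where constructor arguments such as n_neighbors must be passed by keyword); the data is randomly generated for illustration. algorithm='auto' lets sklearn choose between a ball tree, a kd-tree and brute force instead of forcing brute force.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
all_values = rng.normal(size=(1_000, 16))   # hypothetical dataset of 16-dimensional vectors
values = rng.normal(size=(5, 16))           # hypothetical batch of query vectors

# Fit once on the dataset, then query a whole batch in one call.
nn = NearestNeighbors(n_neighbors=10, metric='euclidean', algorithm='auto').fit(all_values)
dists, idxs = nn.kneighbors(values)         # both arrays have shape (5, 10), nearest first
```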

edited Sep 1, 2017 at 16:51

answered Aug 17, 2017 at 18:31


2 Comments

dawg Over a year ago

You should select HIS answer as the answer and add this to your post as an end edit.

FindOutIslamNow Over a year ago

This only works for 2D data; higher dimensions won't work.

Zhenlei Cai · Accepted Answer · 2021-10-09 11:45:55Z

-2

I would try using SciPy's pdist function to find the pairwise distances by brute force: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

It should be quite fast, as pdist is highly optimized. Then, for each element, pick the k nearest.
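A minimal sketch of that brute-force idea, assuming you want the k nearest neighbors of every vector within the same set (the function name all_knn is hypothetical). pdist returns only the condensed upper triangle of the distance matrix, so squareform expands it before picking the k smallest entries per row. Memory grows as O(m²), so this suits moderate m; to search queries against a separate dataset, scipy.spatial.distance.cdist(queries, data) is the analogous function.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def all_knn(points, k):
    """For every row of `points`, indices of its k nearest *other* rows (Euclidean)."""
    # pdist gives m*(m-1)/2 distances; squareform turns them into a full (m, m) matrix.
    d = squareform(pdist(points, metric='euclidean'))
    np.fill_diagonal(d, np.inf)                       # a point is not its own neighbor
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]    # k smallest per row, unordered
    order = np.argsort(np.take_along_axis(d, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)     # reorder so the nearest comes first

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
print(all_knn(pts, 2).tolist())  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```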

answered Oct 9, 2021 at 11:45

