4

I have an n-dimensional vector and I want to find its k nearest neighbors in a list of n-dimensional vectors using Euclidean distance.

I wrote the following code (with k=10), which works but runs too slowly, and I was wondering if there is a faster solution.

import numpy as np

def nearest_neighbors(value, array, nbr_neighbors=1):
    return np.argsort(np.array([np.linalg.norm(value-x) for x in array]))[:nbr_neighbors]

3 Answers

5

Use scipy's kd-tree.

A small example is available here.

Many people seem to complain about its performance, though, and recommend scikit-learn's implementation instead (sklearn.neighbors, which uses this data structure internally).
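For readers who cannot follow the example link, here is a minimal sketch of the kd-tree approach using scipy.spatial.KDTree, whose query method uses Euclidean distance by default. The function name nearest_neighbors_kdtree and the random test data are made up for illustration and are not from the linked example.

```python
import numpy as np
from scipy.spatial import KDTree

def nearest_neighbors_kdtree(value, array, nbr_neighbors=1):
    """Return indices of the nbr_neighbors rows of `array` closest to `value`."""
    tree = KDTree(np.asarray(array))              # build once, reuse for many queries
    _, idxs = tree.query(value, k=nbr_neighbors)  # Euclidean distance by default
    return np.atleast_1d(idxs)                    # k=1 gives a scalar index, so wrap it

# Hypothetical data: 1000 vectors in 5 dimensions and one query vector.
rng = np.random.default_rng(0)
points = rng.random((1000, 5))
query = rng.random(5)
print(nearest_neighbors_kdtree(query, points, nbr_neighbors=10))
```

If you have many query vectors, build the tree once and pass all of them to query in a single call rather than rebuilding it each time. Note that the SciPy documentation warns that kd-trees lose much of their advantage over brute force as the number of dimensions grows.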


3 Comments

SciPy has cKDTree which is significantly faster than the KDTree.
Updated link: cKDTree
Using cKDTree instead of KDTree is no longer recommended by SciPy except for backwards compatibility, as they are now identical. From the cKDTree documentation page linked above: "Prior to SciPy v1.6.0, cKDTree had better performance and slightly different functionality but now the two names exist only for backward-compatibility reasons. If compatibility with SciPy < 1.6 is not a concern, prefer KDTree."
3

As sascha said, I ended up using the scikit-learn library (its NearestNeighbors class), which brought the computation time down from 50 hours to 36 minutes. This is the kind of computation I should not have tried to reimplement myself, as dedicated libraries are much better optimized for it.

NearestNeighbors also lets you pass a list of values to its kneighbors method, which returns the k nearest neighbors for each value.

Final code was:

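from sklearn.neighbors import NearestNeighbors

def nearest_neighbors(values, all_values, nbr_neighbors=10):
    nn = NearestNeighbors(n_neighbors=nbr_neighbors, metric='cosine', algorithm='brute').fit(all_values)
    dists, idxs = nn.kneighbors(values)
    return dists, idxs  # distances and indices of the k nearest rows of all_values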

2 Comments

You should select HIS answer as the answer and add this to your post as an end edit.
This only works for 2D data. Higher dimensions won't work.
-2

I would try using SciPy's pdist function to find the pairwise distances by brute force: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

It should be quite fast as pdist is highly optimized. Then for each element pick the k nearest.
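A rough sketch of this brute-force idea follows, with one substitution stated plainly: pdist computes distances between the rows of a single array, so for query vectors against a separate list this sketch uses scipy.spatial.distance.cdist instead. The function name nearest_neighbors_brute and the random data are hypothetical, and nbr_neighbors is assumed to be no larger than the number of candidate vectors.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_neighbors_brute(values, all_values, nbr_neighbors=10):
    """For each row of `values`, return indices of its k nearest rows in `all_values`."""
    # Full distance matrix, shape (n_queries, n_points).
    dists = cdist(values, all_values, metric="euclidean")
    # argpartition puts the k smallest distances of each row first, in no particular order.
    part = np.argpartition(dists, nbr_neighbors - 1, axis=1)[:, :nbr_neighbors]
    # Sort just those k candidates by their actual distance.
    order = np.argsort(np.take_along_axis(dists, part, axis=1), axis=1)
    return np.take_along_axis(part, order, axis=1)

# Hypothetical data: 5 query vectors against 200 candidates in 8 dimensions.
rng = np.random.default_rng(1)
all_values = rng.random((200, 8))
values = rng.random((5, 8))
print(nearest_neighbors_brute(values, all_values, nbr_neighbors=10).shape)
```

The distance matrix holds one entry per (query, candidate) pair, so for large inputs you would probably need to process the queries in chunks to keep memory use in check.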
