Finding nearest element of an array in a particular column of another array

Question

I want to map one numpy array to another one. My first array has two columns and thousands of rows:

arr_1 = [[20,  0.5],
         [30, 0.75],
         [40,  1.0],
         [50, 1.25],
         [60,  1.5],
         [70, 1.75],
         ...]

The second array can have a different number of rows and columns:

arr_2 = [[1, 0.45],
         [2, 0.57],
         [4, 0.58],
         [1, 1.69],
         [1, 1.51],
         [1, 0.95],
         ...]

I want to compare the values of the second column of arr_2 with the second column of arr_1 to know which row of arr_2 is closer to which row of arr_1. Then I want to copy the first column of arr_1 into arr_2 from the row with the nearest second column.

For example, 0.45 in arr_2 is closest to 0.5, i.e. first row in arr_1. After finding that, I want to copy the first column of that row (which is 20) into arr_2. The final result would look something like:

arr_2_final = [[1, 0.45, 20],
               [2, 0.57, 20],
               [4, 0.58, 20],
               [1, 1.69, 70],
               [1, 1.51, 60],
               [1, 0.95, 40],
               ...]

If you use a numpy array, perhaps you should show arrays instead of lists in your example?
– Mad Physicist, Commented Sep 14, 2020 at 16:24

Mad Physicist · Accepted Answer · 2020-09-14 16:23:11Z

1

Looking up many items in an array is easiest when the array is sorted. You can delegate most of the work to np.searchsorted. Since we want to find elements in arr_1, it is the only array that needs to be sorted. I suspect that having a sorted arr_2 will speed things up by reducing the size of the search space for every successive element.

First, find the insertion points where arr_2 would end up in arr_1:

indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
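As a small illustration of what this call returns, here is a sketch using the six sample rows from the question (converted to numpy arrays, which is an assumption about the real data). Each index is the position in the sorted second column of arr_1 where the corresponding value of arr_2 would be inserted:

```python
import numpy as np

# Sample rows from the question, as numpy arrays
arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

# Insertion points of arr_2's second column in arr_1's (sorted) second column
indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
print(indices)  # [0 1 1 5 5 2]
```

Note that these are insertion points, not nearest neighbours yet: 0.57 gets index 1 (0.75) even though 0.5 at index 0 is closer.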

Now all you have to do is check for cases where the prior element is closer than the current one. There are two corner cases: when the index is 0, you have to accept it, and when it is arr_1.shape[0] (one past the last row), you have to take the prior element.

indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1

Doing it in this order saves you the trouble of messing with temporary arrays. The first line ensures that the index arr_1[indices, 1] is always valid. When an index is 0, indices - 1 is -1, which is still a valid index, so the second line succeeds as well; the indices != 0 mask discards that comparison.
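The two corner cases can be seen on a tiny example. This is only a sketch: the names keys, queries and take_prior and the values themselves are made up for illustration, with one query below the smallest key and one above the largest:

```python
import numpy as np

keys = np.array([0.5, 0.75, 1.0])          # stands in for the sorted arr_1[:, 1]
queries = np.array([0.1, 0.6, 0.9, 5.0])   # hypothetical values of arr_2[:, 1]

indices = np.searchsorted(keys, queries)
print(indices)  # [0 1 2 3] -> 3 == len(keys) would be out of bounds

# Corner case 1: clamp "one past the end" to the last valid index
indices[indices == len(keys)] = len(keys) - 1

# Corner case 2: index 0 has no prior element, so it is masked out.
# keys[indices - 1] reads keys[-1] there, which is valid but ignored.
take_prior = (indices != 0) & (keys[indices] - queries > queries - keys[indices - 1])
indices[take_prior] -= 1
print(indices)  # [0 0 2 2]
```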

The final result is then

np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)

If arr_1 is not already sorted, you can do the following:

arr_1 = arr_1[np.argsort(arr_1[:, 1]), :]
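Putting the steps together, one possible way to package them is a single helper. This is a sketch, not part of the original answer: the function name append_nearest is made up, it always sorts a copy of arr_1 (harmless if it is already sorted), and it assumes both inputs are 2-D numpy arrays with the values to compare in column 1:

```python
import numpy as np

def append_nearest(arr_1, arr_2):
    """Append to each row of arr_2 the first-column value of the row of
    arr_1 whose second column is nearest (hypothetical helper)."""
    sorted_1 = arr_1[np.argsort(arr_1[:, 1])]   # sort by the lookup column
    keys = sorted_1[:, 1]
    values = arr_2[:, 1]

    indices = np.searchsorted(keys, values)
    indices[indices == len(keys)] = len(keys) - 1            # clamp the upper end
    prior_closer = (indices != 0) & (
        keys[indices] - values > values - keys[indices - 1]
    )
    indices[prior_closer] -= 1                               # step back if closer

    return np.concatenate((arr_2, sorted_1[indices, 0:1]), axis=1)

arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

arr_2_final = append_nearest(arr_1, arr_2)
print(arr_2_final[:, 2])  # [20. 20. 20. 70. 60. 40.]
```

On the sample data this reproduces the third column requested in the question. Because the result is a single float array, the integer columns come back as floats.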

A quick benchmark shows that on my very moderately powered machine, this approach takes ~300ms for arr_1.shape = (500000, 2) and arr_2.shape = (300000, 2).
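To try the timing on your own machine, something along these lines could be used. It is a sketch under assumptions: the data is uniform random with a fixed seed, and the sort is included in the timed region, neither of which is stated for the figure quoted above, so the numbers will not be directly comparable:

```python
import time
import numpy as np

rng = np.random.default_rng(0)        # seed chosen arbitrarily
arr_1 = rng.random((500_000, 2))      # random stand-in data, not the asker's
arr_2 = rng.random((300_000, 2))

start = time.perf_counter()
arr_1 = arr_1[np.argsort(arr_1[:, 1]), :]
indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1
result = np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)
elapsed = time.perf_counter() - start

print(f"{elapsed * 1000:.0f} ms for shapes {arr_1.shape} and {arr_2.shape}")
```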

edited Sep 14, 2020 at 16:23

answered Sep 14, 2020 at 16:02

Mad Physicist


3 Comments

Link_tester Over a year ago

Dear @MadPhysicist, your solution was fantastic. But now I notice that my real data are a little more complicated. I said that I want to compare just two columns of two arrays, but now I see it is a little different: I have to compare two coordinates and find the closest ones. I think it is better to open another question for that.

Link_tester Over a year ago

Dear @MadPhysicist, I really appreciate your help. I did not know anything about upvoting. I will definitely do it.

Mad Physicist Over a year ago

@Ali_d. No worries. Welcome to the site. FYI: stackoverflow.com/help/why-vote

Marcus · Answer · 2020-09-14 11:22

0

I would probably do it this way:

import numpy as np

arr_1= [[20, 0.5], [30, 0.75], [40, 1], [50, 1.25], [60, 1.5], [70, 1.75]]
arr_2= [[1, 0.45], [2, 0.57], [4, 0.58], [1, 1.69], [1, 1.51], [1, 0.95]]

arr_2_np = np.array(arr_2)[:,1]

for row in arr_1:
  idx = np.argmin(np.abs(arr_2_np - row[1]))
  arr_2[idx].append(row[0])

print(arr_2)
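For comparison, here is a sketch of the same brute-force idea with the loop turned around, so that every row of arr_2 receives exactly one value, which is what the asker describes in the comments on this answer. The names arr_1_keys and result are made up for illustration, and ties go to the first matching row of arr_1. It scans all of arr_1 for each row of arr_2, so it is only practical for small inputs:

```python
import numpy as np

arr_1 = [[20, 0.5], [30, 0.75], [40, 1], [50, 1.25], [60, 1.5], [70, 1.75]]
arr_2 = [[1, 0.45], [2, 0.57], [4, 0.58], [1, 1.69], [1, 1.51], [1, 0.95]]

arr_1_keys = np.array(arr_1)[:, 1]   # second column of arr_1

result = []
for row in arr_2:
    # Row of arr_1 whose second column is nearest to this row's second column
    idx = np.argmin(np.abs(arr_1_keys - row[1]))
    result.append(row + [arr_1[idx][0]])

print(result)
# [[1, 0.45, 20], [2, 0.57, 20], [4, 0.58, 20], [1, 1.69, 70], [1, 1.51, 60], [1, 0.95, 40]]
```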

edited Sep 14, 2020 at 15:37

Mad Physicist


answered Sep 14, 2020 at 11:22

Marcus


6 Comments

Link_tester Over a year ago

Thanks for the reply. But when I run it, I saw the following result: [[1, 0.45, 20], [2, 0.57], [4, 0.58, 30], [1, 1.69, 70], [1, 1.51, 50, 60], [1, 0.95, 40]]. As you see, it has not added any value for the second row of arr_2 but added two values to the fifth row.

Marcus Over a year ago

@Ali_d Ok, so you only want to add a single value to each row in arr_2? How should this selection be carried out? I think you need to clarify how you want the selection to be performed in order to get a better answer. The solution above appends the value of each row in arr_1 to the best matching row in arr_2, regardless of whether a value has already been appended there.

Link_tester Over a year ago

Yes, I want my code to check the difference between the second column of each row of arr_2 and all values of the second column of arr_1, then find the smallest difference and add the first column of that row of arr_1 to arr_2.

Marcus Over a year ago

@MadPhysicist Given the OP's clarification, which makes the problem somewhat more complex, I don't see a solution without looping unless there is some function that finds the minimum distance between numbers in two lists of different lengths. Would you mind giving a pointer to where one might look for such a solution, as ordinary vector algebra isn't compatible with arrays of different lengths?

Mad Physicist Over a year ago

@Marcus. There are plenty of numpy functions for working with arrays of different sizes. I'll post an answer shortly
