
I want to map one numpy array to another one. My first array has two columns and thousands of rows:

arr_1 = [[20,  0.5],
         [30, 0.75],
         [40,  1.0],
         [50, 1.25],
         [60,  1.5],
         [70, 1.75],
         ...]

The second array can have a different number of rows and columns:

arr_2 = [[1, 0.45],
         [2, 0.57],
         [4, 0.58],
         [1, 1.69],
         [1, 1.51],
         [1, 0.95],
         ...]

I want to compare the values of the second column of arr_2 with the second column of arr_1 to know which row of arr_2 is closer to which row of arr_1. Then I want to copy the first column of arr_1 into arr_2 from the row with the nearest second column.

For example, 0.45 in arr_2 is closest to 0.5, i.e. the first row of arr_1. After finding that, I want to copy the first column of that row (which is 20) into arr_2. The final result would look something like:

arr_2_final = [[1, 0.45, 20],
               [2, 0.57, 20],
               [4, 0.58, 20],
               [1, 1.69, 70],
               [1, 1.51, 60],
               [1, 0.95, 40],
               ...]
  • Are the values in the second column sorted? Commented Sep 14, 2020 at 13:16
  • If you use a numpy array, perhaps you should show arrays instead of lists in your example? Commented Sep 14, 2020 at 16:24
  • Dear @MadPhysicist, the second column is sorted. Commented Sep 15, 2020 at 7:33

2 Answers


Looking up lots of items in an array is easiest when the array is sorted. You can delegate most of the work to np.searchsorted. Since we want to find elements in arr_1, it is the only array that needs to be sorted. I suspect that having a sorted arr_2 will speed things up by reducing the size of the search space for every successive element.

First, find the insertion points where arr_2 would end up in arr_1:

indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
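
As a small illustration, here is that call on the sample rows from the question. The rows are converted to NumPy arrays, which is an assumption about how the data is actually stored:

```python
import numpy as np

# Sample rows from the question, stored as NumPy arrays (assumption)
arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

# For each value in arr_2[:, 1], the position in the sorted column
# arr_1[:, 1] where it could be inserted while keeping the order
indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
print(indices)  # [0 1 1 5 5 2]
```

Each index points at the first element of arr_1[:, 1] that is greater than or equal to the value, so the nearest neighbour is either that element or the one just before it.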

Now all you have to do is check for cases where the prior element is closer than the current one. There are two corner cases: when the index is 0, you have to accept it, and when it is arr_1.shape[0] (one past the last row), you have to take the prior element.

indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1

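Doing it in this order saves you the trouble of messing with temporary arrays. The first line ensures that the index arr_1[indices, 1] is always valid. Since index -1 is valid (it wraps around to the last row), evaluating arr_1[indices - 1, 1] in the second line succeeds even where indices is 0, and the indices != 0 mask discards those comparisons.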

The final result is then

np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)
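
Put together, the steps can be sketched as one function and checked against the expected output from the question. The function name append_nearest is hypothetical, and the sketch assumes arr_1 is a non-empty 2-D array already sorted by its second column:

```python
import numpy as np

def append_nearest(arr_1, arr_2):
    """Append to arr_2 the first-column value of the nearest arr_1 row.

    Hypothetical helper; assumes arr_1 is sorted by its second column.
    """
    n = arr_1.shape[0]
    indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
    # Values larger than everything in arr_1 map to the last row
    indices[indices == n] = n - 1
    # Step back by one wherever the previous element is strictly closer
    closer_below = (arr_1[indices, 1] - arr_2[:, 1]
                    > arr_2[:, 1] - arr_1[indices - 1, 1])
    indices[(indices != 0) & closer_below] -= 1
    return np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)

arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

result = append_nearest(arr_1, arr_2)
print(result[:, 2])  # [20. 20. 20. 70. 60. 40.]
```

Because the comparison is strict, a value exactly halfway between two neighbours keeps the upper one.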

If arr_1 is not already sorted, you can do the following:

arr_1 = arr_1[np.argsort(arr_1[:, 1]), :]
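
A quick check of what this reordering does, on a made-up three-row array (the values and the names unsorted, order and sorted_arr are for illustration only):

```python
import numpy as np

# Made-up rows, deliberately out of order in the second column
unsorted = np.array([[40, 1.0], [20, 0.5], [30, 0.75]])

# argsort gives the row order that sorts the second column;
# indexing with it moves whole rows, so the columns stay paired
order = np.argsort(unsorted[:, 1])
sorted_arr = unsorted[order, :]
print(sorted_arr[:, 0])  # [20. 30. 40.]
```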

A quick benchmark shows that on my very moderately powered machine, this approach takes ~300ms for arr_1.shape = (500000, 2) and arr_2.shape = (300000, 2).


3 Comments

Dear @MadPhysicist, your solution was fantastic. But now I have noticed that my real data are a little bit more complicated. I said that I want to compare just two columns of two arrays, but now I see it is a little bit different. I have to compare two coordinates and find the closest ones. It is a little bit complicated and I think it is better to open another question for that.
Dear @MadPhysicist, I really appreciate your help. I did not know anything about upvoting. Definitely, I will do it.
@Ali_d. No worries. Welcome to the site. FYI: stackoverflow.com/help/why-vote

I would probably do it this way:

import numpy as np

arr_1= [[20, 0.5], [30, 0.75], [40, 1], [50, 1.25], [60, 1.5], [70, 1.75]]
arr_2= [[1, 0.45], [2, 0.57], [4, 0.58], [1, 1.69], [1, 1.51], [1, 0.95]]

arr_2_np = np.array(arr_2)[:,1]

for row in arr_1:
  idx = np.argmin(np.abs(arr_2_np - row[1]))
  arr_2[idx].append(row[0])

print(arr_2)
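
For comparison, here is a sketch of a loop-free brute-force variant. It assumes the goal is exactly one value per row of arr_2, taken from the arr_1 row with the nearest second column. It uses broadcasting to build the full table of pairwise differences, so memory grows with len(arr_2) * len(arr_1); it does not require arr_1 to be sorted.

```python
import numpy as np

arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

# Absolute difference for every (arr_2 row, arr_1 row) pair;
# the result has shape (len(arr_2), len(arr_1))
diff = np.abs(arr_2[:, 1, None] - arr_1[None, :, 1])

# Index of the nearest arr_1 row for each arr_2 row
nearest = diff.argmin(axis=1)

# Append the matching first-column values as a new column
result = np.column_stack((arr_2, arr_1[nearest, 0]))
print(result[:, 2])  # [20. 20. 20. 70. 60. 40.]
```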

6 Comments

Thanks for the reply. But when I run it, I saw the following result: [[1, 0.45, 20], [2, 0.57], [4, 0.58, 30], [1, 1.69, 70], [1, 1.51, 50, 60], [1, 0.95, 40]]. As you see, it has not added any value for the second row of arr_2 but added two values to the fifth row.
@Ali_d Ok, so you only want to add a single value to each row in arr_2? How should this selection be carried out? I think you need to clarify how you want the selection to be performed to be able to get a better answer. The solution above appends the value of each row in arr_1 to the best matching row in arr_2, regardless of whether a value has already been appended.
Yes, I want my code to check the difference between the second column of each row of arr_2 with all values of the second column of arr_1, then find the least difference and add the first column of that row to the arr_2.
@MadPhysicist Given the clarification given by OP that makes the problem somewhat more complex I don't see a solution without looping unless there exists some function that finds the minimum distance between numbers in two lists of different lengths. Would you mind giving a pointer to where one might look for such a solution as ordinary vector algebra isn't compatible with arrays of different lengths?
@Marcus. There are plenty of numpy functions for working with arrays of different sizes. I'll post an answer shortly
