How to find the closest element in another column for each element in a column?

Question

The situation is as follows.

I have two pandas dataframes:

df1, which contains a column "p1" with 1895 rows of random numbers ranging from 2.805 to 3.035 (here are the first 20 rows):

         p1
0       2.910
1       2.885
2       2.875
3       2.855
4       2.910
5       2.870
6       2.850
7       2.875
8       2.865
9       2.875
10      2.890
11      2.910
12      2.965
13      2.955
14      2.935
15      2.905
16      2.900
17      2.905
18      2.970
19      2.940

df2, which contains two columns, "p2" and "h":

    p2       h
0   2.7  256.88
1   2.8  253.52
2   2.9  250.18
3   3.0  246.86
4   3.1  243.55

The aim is to loop through all rows in df1 and, for each row, find the closest element in p2. For example, for p1[0] = 2.910, the closest element is p2[2] = 2.9.

Then, if these two values are equal, the output for that row is the corresponding value of h; otherwise, the output is the average of the previous and subsequent values of h (the rows just before and after the closest p2 in df2).

Going back to our example, since 2.910 is not equal to 2.9, the output for p1[0] should be (h[1] + h[3])/2 = (253.52 + 246.86)/2 = 250.19.
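A minimal vectorised sketch of the rule exactly as stated, assuming df2 is sorted by p2. It averages the neighbours of the closest p2 (for 2.910, h[1] and h[3]), which is not the same as averaging the two p2 values that bracket p1 (h[2] and h[3]). The function name `closest_h`, the use of `np.isclose` for the equality test, and the edge fallback (reusing the closest row when it is the first or last row of df2) are my assumptions; the question does not cover those cases.

```python
import numpy as np
import pandas as pd


def closest_h(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: apply the question's rule to every row of df1.

    Assumes df2 is sorted by p2. If the closest p2 is the first or last row,
    the missing neighbour falls back to the closest row itself (assumption).
    """
    p1 = df1["p1"].to_numpy()
    p2 = df2["p2"].to_numpy()
    h = df2["h"].to_numpy()

    # distance[i, j] = |p1[i] - p2[j]|; argmin over j gives the closest p2
    idx = np.abs(p1[:, np.newaxis] - p2[np.newaxis, :]).argmin(axis=1)

    # Positions of the previous and subsequent rows, clipped to stay inside df2
    prev_idx = np.clip(idx - 1, 0, len(h) - 1)
    next_idx = np.clip(idx + 1, 0, len(h) - 1)

    # np.isclose rather than == to tolerate floating-point noise
    exact = np.isclose(p1, p2[idx])
    values = np.where(exact, h[idx], (h[prev_idx] + h[next_idx]) / 2)
    return pd.Series(values, index=df1.index, name="Value")


df1 = pd.DataFrame({"p1": [2.910, 2.900, 2.965, 2.805]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})
df1["Value"] = closest_h(df1, df2)
print(df1)
```

With these rows, 2.910 gives 250.19 and 2.900 gives 250.18 (an exact match). A p1 exactly halfway between two p2 values (such as 2.850) is a tie, which argmin resolves according to floating-point rounding.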

I hope this all makes sense; this is my first question here :). Thanks!

BENY · Accepted Answer · 2019-06-23 22:53:43Z

1

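This is a job for merge_asof. Merging once backward and once forward gives the h at or below and at or above each p1. Note that allow_exact_matches defaults to True, so an exact match counts: for example, 2.9 is matched to 2.9 in both merges, and the average is just that row's h.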

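import pandas as pd

# merge_asof needs sorted keys; reset_index keeps df1 aligned with s1/s2
df1 = df1.sort_values('p1').reset_index(drop=True)
# nearest p2 at or below (backward) and at or above (forward) each p1
s1 = pd.merge_asof(df1, df2, left_on='p1', right_on='p2', direction='backward')
s2 = pd.merge_asof(df1, df2, left_on='p1', right_on='p2', direction='forward')
df1['Value'] = (s1.h + s2.h) / 2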
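If only the first step (the single closest p2 for each p1) is needed, merge_asof also accepts direction='nearest'. A minimal sketch on a few rows of the question's data; the three sample p1 values are picked for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"p1": [2.910, 2.965, 2.805]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})

# merge_asof requires both frames sorted on their keys; df2 already is
left = df1.sort_values("p1").reset_index(drop=True)
# direction="nearest" picks the closest p2 on either side of each p1
nearest = pd.merge_asof(left, df2, left_on="p1", right_on="p2",
                        direction="nearest")
print(nearest)
```

Here 2.805, 2.910 and 2.965 are matched to p2 = 2.8, 2.9 and 3.0 respectively, each with its h value alongside.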

1 Comment

rad189 Over a year ago

Thanks a lot, your solution works fine! Do you have any idea how I could include the p1 term in the final equation? For example, for p1 = 2.805, the output in 'Value' would be (s1.hfg + s2.hfg)/2 + 2.805. Thanks
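One way to add the p1 term, sketched under the assumption that df1 has been sorted and had its index reset as in the answer, so that df1, s1 and s2 share the same 0..n-1 index and the addition lines up row by row. The sample rows are illustrative, and the column is called h here; substitute hfg for your own data.

```python
import pandas as pd

df1 = pd.DataFrame({"p1": [2.910, 2.805, 2.900]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})

# Same two merges as in the answer; reset_index keeps df1 aligned with s1/s2
df1 = df1.sort_values("p1").reset_index(drop=True)
s1 = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="backward")
s2 = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="forward")

# The p1 column is added element-wise because all three frames share an index
df1["Value"] = (s1.h + s2.h) / 2 + df1["p1"]
print(df1)
```

For p1 = 2.805 this gives (253.52 + 250.18)/2 + 2.805 = 254.655.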

Nakor · Accepted Answer · 2019-06-23 22:58:21Z

1

Another solution with numpy:

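import numpy as np

# Generate some test data: x1 holds the values to look up,
# x2[:, 0] the reference values and x2[:, 1] the values to return
x1 = np.random.randint(0, 100, 10)
x2 = np.vstack([np.random.randint(0, 100, 10), np.random.normal(0, 1, 10)]).T

# distance[i, j] = |x1[j] - x2[i, 0]|, built by broadcasting instead of np.tile
distance = np.abs(x1[np.newaxis, :] - x2[:, 0, np.newaxis])
# For each element of x1, the row index of the closest value in x2[:, 0]
closest_idx = np.argmin(distance, axis=0)

print(x2[closest_idx, 1])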
