The situation is as follows.

I have two pandas dataframes:

  • df1, which contains a column "p1" with 1895 rows of random numbers ranging from 2.805 to 3.035 (here are the first 20 rows):
         p1
0       2.910
1       2.885
2       2.875
3       2.855
4       2.910
5       2.870
6       2.850
7       2.875
8       2.865
9       2.875
10      2.890
11      2.910
12      2.965
13      2.955
14      2.935
15      2.905
16      2.900
17      2.905
18      2.970
19      2.940
  • df2, which contains two columns, "p2" and "h":
    p2   h
0   2.7 256.88
1   2.8 253.52
2   2.9 250.18
3   3.0 246.86
4   3.1 243.55

The aim is to loop through all rows in df1 and, for each row, find the closest element in p2. For example, for p1[0] = 2.910, the closest element is p2[2] = 2.9.

  • Then, if these two values are the same, the output for that row is the corresponding value of h;
  • otherwise, the output is the average of the previous and subsequent values of h (the h values on either side of the closest element).

Going back to our example, 2.910 is not equal to 2.9, so the output for p1[0] should be (h[1]+h[3])/2 = (253.52+246.86)/2 = 250.19.
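
For reference, the rule described above can be written out literally. This is only a minimal sketch: the helper name `lookup_h` and the `out` column are hypothetical, it assumes df2 is sorted by p2, and it clips the neighbour indices at the ends of df2 (the question does not say what should happen there). The three p1 values are rows 0, 16 and 18 of the sample.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"p1": [2.910, 2.900, 2.970]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})

def lookup_h(p1, p2, h):
    """Hypothetical helper: apply the rule from the question to every p1 value."""
    p1, p2, h = (np.asarray(a, dtype=float) for a in (p1, p2, h))
    # Index of the nearest p2 for each p1 (rows: p1 values, columns: p2 values).
    nearest = np.abs(p1[:, None] - p2[None, :]).argmin(axis=1)
    # Compare with a tolerance rather than == because these are floats.
    exact = np.isclose(p1, p2[nearest])
    # Neighbours of the nearest entry; clipping at the ends is an assumption.
    prev_idx = np.clip(nearest - 1, 0, len(p2) - 1)
    next_idx = np.clip(nearest + 1, 0, len(p2) - 1)
    return np.where(exact, h[nearest], (h[prev_idx] + h[next_idx]) / 2)

df1["out"] = lookup_h(df1["p1"], df2["p2"], df2["h"])
print(df1)
```

For these three rows the result should be 250.19 (average of h[1] and h[3]), 250.18 (exact match on 2.9) and 246.865 (average of h[2] and h[4]).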

I hope this all makes sense; this is my first question on here :). Thanks!

2 Answers

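This is a use case for merge_asof. Note that allow_exact_matches defaults to True, so an exact value is matched to itself in both directions: for example, the nearest p2 for p1 = 2.9 is 2.9 in this case, and the average of the backward and forward matches is simply h at 2.9.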

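import pandas as pd

# merge_asof requires both frames to be sorted on their keys
df1 = df1.sort_values('p1')
s1 = pd.merge_asof(df1, df2, left_on='p1', right_on='p2', direction='backward')
s2 = pd.merge_asof(df1, df2, left_on='p1', right_on='p2', direction='forward')
# Assign by position so the values line up with the sorted rows of df1
df1['Value'] = ((s1.h + s2.h) / 2).to_numpy()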
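
To see what the two merges return, here is a small check on three p1 values from the question (rows 0, 16 and 18). It is a sketch, with `below`, `above` and `check` as illustrative names, and it assumes that averaging the h values bracketing p1 is the result you want. Note that for p1 = 2.910 this averages h at p2 = 2.9 and 3.0, which is not the same as the (h[1]+h[3])/2 written in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"p1": [2.910, 2.900, 2.970]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})

df1 = df1.sort_values("p1")
# Nearest p2 at or below each p1, and nearest p2 at or above each p1.
below = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="backward")
above = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="forward")

# Collect everything by position so the rows stay in sorted-p1 order.
check = pd.DataFrame({
    "p1": df1["p1"].to_numpy(),
    "p2_below": below["p2"].to_numpy(),
    "p2_above": above["p2"].to_numpy(),
    "Value": ((below["h"] + above["h"]) / 2).to_numpy(),
})
print(check)
```

For p1 = 2.900 both merges land on 2.9, so Value is 250.18; for 2.910 and 2.970 the bracketing values are 2.9 and 3.0, so Value is 248.52.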

1 Comment

Thanks a lot, your solution works fine! Do you have any idea how I could include the p1 term in the final equation? For example, for p1=2.805, the output in 'Value' would be (s1.hfg+s2.hfg)/2 + 2.805. Thanks
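
One possible way to include the p1 term, as a hedged sketch: it assumes the column is named h as in the answer (hfg in the comment appears to be the commenter's own column name) and simply adds p1 to the averaged value, position by position.

```python
import pandas as pd

df1 = pd.DataFrame({"p1": [2.910, 2.805]})
df2 = pd.DataFrame({"p2": [2.7, 2.8, 2.9, 3.0, 3.1],
                    "h": [256.88, 253.52, 250.18, 246.86, 243.55]})

df1 = df1.sort_values("p1")
s1 = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="backward")
s2 = pd.merge_asof(df1, df2, left_on="p1", right_on="p2", direction="forward")

# Average of the bracketing h values plus p1 itself, both taken by position
# so they follow the sorted row order of df1.
df1["Value"] = ((s1["h"] + s2["h"]) / 2).to_numpy() + df1["p1"].to_numpy()
print(df1)
```

For p1 = 2.805 this gives (253.52 + 250.18)/2 + 2.805 = 254.655.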
Another solution with numpy:

import numpy as np

# Generate some test data
x1 = np.random.randint(0,100,10)
x2 = np.vstack([np.random.randint(0,100,10),np.random.normal(0,1,10)]).T

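# Repeat the two vectors so every x1 value can be compared with every x2 key
X1 = np.tile(x1,(len(x2),1))
X2 = np.tile(x2[:,0],(len(x1),1))
# distance[i, j] = |x1[j] - x2[i, 0]|
distance = np.abs(X1 - X2.T)
# For each x1 value, the row of x2 whose first column is closest
closest_idx = np.argmin(distance,axis=0)

# Second column of x2 at the closest rows
print(x2[closest_idx,1])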
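
The same nearest-index lookup can also be written with broadcasting instead of np.tile. The sketch below applies it to the p2 and h values from the question and to three of the sample p1 values (rows 0, 16 and 18); the variable names are illustrative. Like the code above, it returns only the h of the closest p2 and does not do the averaging step.

```python
import numpy as np

p1 = np.array([2.910, 2.900, 2.970])
p2 = np.array([2.7, 2.8, 2.9, 3.0, 3.1])
h = np.array([256.88, 253.52, 250.18, 246.86, 243.55])

# distance[i, j] = |p2[i] - p1[j]|, built by broadcasting a column against a row
distance = np.abs(p2[:, None] - p1[None, :])
# For each p1 value, the index of the closest p2
closest_idx = distance.argmin(axis=0)

print(p2[closest_idx])
print(h[closest_idx])
```

Here closest_idx comes out as [2, 2, 3], i.e. p2 values 2.9, 2.9 and 3.0.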
