2

My question is how to add a new column to a DataFrame based on conditions from another DataFrame. It is hard to describe in words, so here is an example.

df1

columns  a   b  c
         0   10  1
         10  15  3
         15  20  5


df2
columns  d      e  
         3.3   10   
         5.5   20
         14.5  11
         17.2  5
   

What I want to do is add another column f to df2, with its values taken from df1: if d[i] is between a[j] and b[j], copy the value c[j] to f[i] in df2. For example, d[1] = 5.5 and 0 < 5.5 < 10, so f[1] = c[0] = 1.

The final result should look like:

df2
columns  d      e    f
         3.3   10    1 
         5.5   20    1
         14.5  11    3
         17.2  5     5
   

Any help is greatly appreciated!

Regards,

Steve

3
  • So i can be any number in range(len(df2)) and j can be any number in range(len(df1)); i and j do not need to be the same! Commented Jan 27, 2023 at 22:23
  • No, it is not. Commented Jan 27, 2023 at 22:28
  • So let's say i == 2, then d[2] == 14.5. 14.5 falls in the range 10 to 15, so j == 1 and c[j] == 3; therefore f[i] = 3. Commented Jan 27, 2023 at 22:34

5 Answers

4

Assuming the intervals defined by columns a and b of df1 do not overlap, you can use pd.cut with a pd.IntervalIndex:

import pandas as pd

# Your dfs here
df1 = pd.read_clipboard()
df2 = pd.read_clipboard()

idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])
mapping = df1["c"].set_axis(idx)

df2["f"] = pd.cut(df2["d"], idx).map(mapping)

df2:

      d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5
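
Two details are worth keeping in mind with this approach: the intervals built by from_arrays are closed on the right by default, so each one is (a, b], and any d outside every interval ends up as NaN. A minimal sketch of the same lookup through IntervalIndex.get_indexer, which makes the "no match" case explicit. The extra 25.0 row and the names pos and c are only for illustration and are not part of the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [0, 10, 15], "b": [10, 15, 20], "c": [1, 3, 5]})
# 25.0 is an added row (not in the question) that lies outside every interval
df2 = pd.DataFrame({"d": [3.3, 5.5, 14.5, 17.2, 25.0], "e": [10, 20, 11, 5, 7]})

# closed="right" is the default: each interval is (a, b]
idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"], closed="right")

# position of the containing interval for each d, or -1 if there is none
pos = idx.get_indexer(df2["d"].to_numpy())

c = df1["c"].to_numpy()
# c[pos] would wrap around to the last element for -1, so mask those with NaN
df2["f"] = np.where(pos >= 0, c[pos], np.nan)
print(df2)
```

Because of the NaN, column f comes out as float here. Like pd.cut, get_indexer requires non-overlapping intervals.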

2

If you do not have overlapping intervals, the accepted pd.IntervalIndex solution is a perfect fit.

Another option is with conditional_join from pyjanitor, which can also handle overlapping intervals:

# pip install pyjanitor
import pandas as pd
import janitor
(df2
.conditional_join(
    # types have to be same
    # for columns to be compared
    df1.astype({"a":float, "b":float}), 
    ('d', 'a', '>='), 
    ('d', 'b','<='), 
    # depending on the data size,
    # numba may offer more performance
    use_numba=False,
    right_columns = {'c':'f'})
)
      d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5
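
For readers who would rather not add a dependency, a range join like the one above can be approximated with plain pandas and NumPy by comparing every d against every interval. This is only a sketch, not how pyjanitor implements it, and it builds a len(df2) by len(df1) boolean array, so it suits small to moderate frames. The df1 below is altered (an assumption for illustration) so that two intervals overlap:

```python
import numpy as np
import pandas as pd

# intervals [0, 10] and [5, 15] overlap on purpose (not the question's data)
df1 = pd.DataFrame({"a": [0, 5, 15], "b": [10, 15, 20], "c": [1, 3, 5]})
df2 = pd.DataFrame({"d": [3.3, 5.5, 14.5, 17.2], "e": [10, 20, 11, 5]})

# column vector of d values, broadcast against the row vectors a and b
d = df2["d"].to_numpy()[:, None]
mask = (d >= df1["a"].to_numpy()) & (d <= df1["b"].to_numpy())

# (row in df2, row in df1) for every pair where a <= d <= b
rows, cols = np.nonzero(mask)

# one output row per match; a d inside two intervals appears twice
out = df2.iloc[rows].reset_index(drop=True)
out["f"] = df1["c"].to_numpy()[cols]
print(out)
```

Here 5.5 lies in both [0, 10] and [5, 15], so it produces two rows, one with f = 1 and one with f = 3.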


1

You could use:

result = []
for item in df2['d']:
    val = None  # default when no interval in df1 contains item
    for _, row in df1.iterrows():
        # intervals are treated as closed on both ends
        if row['a'] <= item <= row['b']:
            val = row['c']
            break
    result.append(val)

df2['f'] = result

print(df2)


1
import pandas as pd
df1 = pd.DataFrame({'a':[0,10,15],'b':[10,15,20],'c':[1,3,5]})
df2 = pd.DataFrame({'d':[3.3,5.5,9.5,17.2],'e':[10,20,11,5]})
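df2['f'] = 0  # rows with no matching interval keep 0
for i in range(df2.shape[0]):
    for j in range(df1.shape[0]):
        if df1.a[j] <= df2.d[i] <= df1.b[j]:
            # use .loc: chained assignment (df2.f[i] = ...) may not write to df2
            df2.loc[i, 'f'] = df1.c[j]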
df2


1

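What about this option?

# pair every row of df2 with every row of df1 (how='cross' needs pandas >= 1.2)
df = pd.merge(df2.reset_index(), df1, how='cross')
# keep only the pairs where d falls inside [a, b]
df = df[(df['a'] <= df['d']) & (df['d'] <= df['b'])]
# take the first match per df2 row and align it back on df2's index
df2['f'] = df.drop_duplicates('index').set_index('index')['c']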
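
If the intervals in df1 do not overlap, pd.merge_asof is another merge-based option. A minimal sketch, assuming both frames can be sorted on d and a; the names left, right and merged are illustrative. merge_asof needs matching key dtypes, hence the cast of a to float:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 10, 15], "b": [10, 15, 20], "c": [1, 3, 5]})
df2 = pd.DataFrame({"d": [3.3, 5.5, 14.5, 17.2], "e": [10, 20, 11, 5]})

# merge_asof requires both sides sorted on their keys
left = df2.sort_values("d")
right = df1.astype({"a": float}).sort_values("a")

# for each d, pick the last row of df1 whose a is <= d
merged = pd.merge_asof(left, right, left_on="d", right_on="a", direction="backward")

# merge_asof only checks a <= d, so blank out rows where d is past the interval end
merged["f"] = merged["c"].where(merged["d"] <= merged["b"])
result = merged[["d", "e", "f"]]
print(result)
```

With direction="backward", a d that sits exactly on a shared boundary (such as 10) is assigned to the interval that starts there, not the one that ends there.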
