Assign values to new column based on conditions between two pandas DataFrames

Question

Let's say there are two data frames. df1 (named df in the code below) contains 4 columns. The 'NAME' column contains the names of the cities (A, B, C). Each of the other columns represents a year (y0, y1, y2) and contains the number of people living in that city.

import numpy as np
import pandas as pd

np.random.seed(seed=34)
name = ['A','B','C']
# random_integers is deprecated; randint excludes the upper bound, so 41 covers 1..40
y0 = np.random.randint(1, 41, size=3)
y1 = np.random.randint(1, 41, size=3)
y2 = np.random.randint(1, 41, size=3)
df = pd.DataFrame(data={'NAME' : name, 'y0' : y0, 'y1' : y1, 'y2' : y2})
df

   NAME y0  y1  y2
0   A   34  36  15
1   B   22  6   30
2   C   5   12  19

df2 contains 3 columns. The column 'NAME' contains the name of the cities. The 'y' column contains the value of the year (y0, y1, y2) and the 'i' column contains the number of people who have internet access.

y = ['y0', 'y1', 'y2',  'y0', 'y1', 'y2',  'y0', 'y1', 'y2']
name2 = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
i = [15, 6, 12, 18, 4, 20, 3, 8, 2]
df2 = pd.DataFrame(data={'NAME':name2, 'y':y, 'i':i})
df2

   NAME y   i
0   A   y0  15
1   A   y1  6
2   A   y2  12
3   B   y0  18
4   B   y1  4
5   B   y2  20
6   C   y0  3
7   C   y1  8
8   C   y2  2

I need to add a column to df2 that holds the matching value from df1: for each row, the value where df1['NAME'] equals df2['NAME'] and the df1 column name equals df2['y']. The expected result is:


   NAME y   i   v
0   A   y0  15  34
1   A   y1  6   36
2   A   y2  12  15
3   B   y0  18  22
4   B   y1  4   6
5   B   y2  20  30
6   C   y0  3   5
7   C   y1  8   12
8   C   y2  2   19

The number of times a city name appears in df is not constant. Thank you in advance.

It is supposed to be equal to one of the "year" (y0, y1, y2) columns in df1.
– Diogo Pessanha, Commented Dec 16, 2019 at 20:15

oppressionslayer · Accepted Answer · 2019-12-16 20:35:33Z

2

You can do this, since the rows of the melted and sorted df line up one-to-one with the rows of df2:

df2['v'] = df.melt(col_level=0, id_vars='NAME').sort_values(by='NAME').reset_index(drop=True)['value']

output:

  NAME   y   i   v
0    A  y0  15  34
1    A  y1   6  36
2    A  y2  12  15
3    B  y0  18  22
4    B  y1   4   6
5    B  y2  20  30
6    C  y0   3   5
7    C  y1   8  12
8    C  y2   2  19
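
The one-liner above assigns purely by row position, so it only gives the right answer when the melted frame lists its (NAME, y) pairs in exactly the same order as df2. A minimal sketch that makes this assumption explicit before assigning; the population values are hard-coded from the question's table rather than drawn at random, and `long_df` and `keys_match` are names introduced here for illustration only:

```python
import pandas as pd

# Values copied from the question's tables (assumption: no random draw here).
df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5],
                   'y1': [36, 6, 12],
                   'y2': [15, 30, 19]})
df2 = pd.DataFrame({'NAME': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                    'y': ['y0', 'y1', 'y2'] * 3,
                    'i': [15, 6, 12, 18, 4, 20, 3, 8, 2]})

# Wide -> long: one row per (NAME, year) pair, sorted on both keys.
long_df = (df.melt(id_vars='NAME', var_name='y', value_name='v')
             .sort_values(['NAME', 'y'])
             .reset_index(drop=True))

# Positional assignment is only safe if the keys agree row for row.
keys_match = (len(long_df) == len(df2)
              and (long_df[['NAME', 'y']].to_numpy()
                   == df2[['NAME', 'y']].to_numpy()).all())

if keys_match:
    df2['v'] = long_df['v']
else:
    raise ValueError("df2 rows do not line up with the melted frame; use a merge")
```

If the check fails, for example because a city appears a different number of times in df2, a key-based join is the safer route.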

Or with combine_first:

df3 = df.melt(col_level=0, id_vars='NAME').sort_values(by='NAME').reset_index(drop=True)
df3 = df3.rename(columns={'variable':'y'})
df3 = df2.combine_first(df3)
df3['value'] = df3['value'].astype(int)


  NAME   i  value   y
0    A  15     34  y0
1    A   6     36  y1
2    A  12     15  y2
3    B  18     22  y0
4    B   4      6  y1
5    B  20     30  y2
6    C   3      5  y0
7    C   8     12  y1
8    C   2     19  y2

edited Dec 16, 2019 at 20:35

answered Dec 16, 2019 at 20:21

oppressionslayer


Umar.H · Accepted Answer · 2019-12-16 20:43:49Z

0

A merge would be better, imo:

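# Reshape df from wide to long: one row per (NAME, year) pair
df = pd.melt(df, id_vars='NAME', var_name='y', value_name='v')

# Join on both keys; the column selection puts i before v, as in the output below
df_new = pd.merge(df, df2, on=['NAME', 'y']).sort_values('NAME')[['NAME', 'y', 'i', 'v']]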
print(df_new)
  NAME   y   i   v
0    A  y0  15  34
3    A  y1   6  36
6    A  y2  12  15
1    B  y0  18  22
4    B  y1   4   6
7    B  y2  20  30
2    C  y0   3   5
5    C  y1   8  12
8    C  y2   2  19
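
Since the question mentions that the number of rows per city is not constant, here is a hedged variation on the merge idea: a left merge starting from df2, which keeps df2's row order and index instead of sorting afterwards. The uneven df2 below is a made-up example, and `lookup` and `result` are names introduced here; `validate='many_to_one'` is an optional guard that raises if a (NAME, y) pair occurs more than once in the lookup table.

```python
import pandas as pd

# Population table from the question, hard-coded for reproducibility.
df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5],
                   'y1': [36, 6, 12],
                   'y2': [15, 30, 19]})

# Hypothetical df2 with an uneven number of rows per city.
df2 = pd.DataFrame({'NAME': ['A', 'A', 'B', 'C', 'C', 'C'],
                    'y': ['y0', 'y2', 'y1', 'y0', 'y1', 'y2'],
                    'i': [15, 12, 4, 3, 8, 2]})

# Long-format lookup table: one row per (NAME, year) pair.
lookup = df.melt(id_vars='NAME', var_name='y', value_name='v')

# Left merge: every df2 row is kept, in df2's order, with v looked up by key.
result = df2.merge(lookup, on=['NAME', 'y'], how='left',
                   validate='many_to_one')
print(result)
```

Rows of df2 whose (NAME, y) pair has no match in df would get NaN in v rather than being dropped, which makes missing keys easy to spot.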

edited Dec 16, 2019 at 20:43

answered Dec 16, 2019 at 20:36

Umar.H
