Let's say there are two data frames. df1 (called df in the code) has 4 columns: 'NAME' holds the city names (A, B, C), and each of the other columns represents a year (y0, y1, y2) and holds the number of people living in that city in that year.

import numpy as np
import pandas as pd

np.random.seed(seed=34)
name = ['A','B','C']
# random_integers is deprecated; randint's upper bound is exclusive, so 41 gives 1..40
y0 = np.random.randint(1, 41, size=3)
y1 = np.random.randint(1, 41, size=3)
y2 = np.random.randint(1, 41, size=3)
df = pd.DataFrame(data={'NAME' : name, 'y0' : y0, 'y1' : y1, 'y2' : y2})
df

   NAME y0  y1  y2
0   A   34  36  15
1   B   22  6   30
2   C   5   12  19

df2 has 3 columns: 'NAME' holds the city names, 'y' holds the year label (y0, y1, y2), and 'i' holds the number of people who have internet access.

y = ['y0', 'y1', 'y2',  'y0', 'y1', 'y2',  'y0', 'y1', 'y2']
name2 = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
i = [15, 6, 12, 18, 4, 20, 3, 8, 2]
df2 = pd.DataFrame(data={'NAME':name2, 'y':y, 'i':i})
df2

   NAME y   i
0   A   y0  15
1   A   y1  6
2   A   y2  12
3   B   y0  18
4   B   y1  4
5   B   y2  20
6   C   y0  3
7   C   y1  8
8   C   y2  2

I need to create a column v in df2 that takes its values from df1: for each row, the df1 row whose 'NAME' equals df2['NAME'], read from the df1 column whose name equals df2['y']. The result should look like this:


   NAME y   i   v
0   A   y0  15  34
1   A   y1  6   36
2   A   y2  12  15
3   B   y0  18  22
4   B   y1  4   6
5   B   y2  20  30
6   C   y0  3   5
7   C   y1  8   12
8   C   y2  2   19

The number of times each city name appears is not constant. Thank you in advance.
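Stated literally, the condition is a two-key lookup: for each df2 row, take the df1 row whose NAME matches and read the column named by df2['y']. A deliberately naive sketch of that lookup, using the values from the tables above; it loops in Python, so it is slow on large frames, and it assumes each city appears only once in df1 (the name lookup is just for illustration):

```python
import pandas as pd

# Values copied from the tables above (df plays the role of df1)
df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5], 'y1': [36, 6, 12], 'y2': [15, 30, 19]})
df2 = pd.DataFrame({'NAME': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
                    'y': ['y0', 'y1', 'y2'] * 3,
                    'i': [15, 6, 12, 18, 4, 20, 3, 8, 2]})

# Index df by city so each row can be addressed by its NAME
lookup = df.set_index('NAME')

# For every df2 row: row label = NAME, column label = the year stored in 'y'
df2['v'] = [lookup.at[name, year] for name, year in zip(df2['NAME'], df2['y'])]
print(df2)
```

Because each row is looked up by key, this does not depend on how many times, or in what order, each city appears in df2.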

  • What column is df2['y'] supposed to equal? Commented Dec 16, 2019 at 20:10
  • It is supposed to equal the name of one of the "year" columns (y0, y1, y2) in df1. Commented Dec 16, 2019 at 20:15

2 Answers

You can do this because, once df is melted and sorted by NAME and then by year, its values line up row for row with df2:

# Positional assignment: only correct if df2 is sorted by NAME, then by year
df2['v'] = df.melt(id_vars='NAME').sort_values(by=['NAME', 'variable']).reset_index(drop=True)['value']

output:

  NAME   y   i   v
0    A  y0  15  34
1    A  y1   6  36
2    A  y2  12  15
3    B  y0  18  22
4    B  y1   4   6
5    B  y2  20  30
6    C  y0   3   5
7    C  y1   8  12
8    C  y2   2  19
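Because this assignment is positional (both frames end up with the default 0..n-1 index), it silently produces wrong values if df2 is ordered differently. A minimal sketch of a guard, assuming df2 is meant to be sorted by NAME and then by year; the name long_df is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5], 'y1': [36, 6, 12], 'y2': [15, 30, 19]})
df2 = pd.DataFrame({'NAME': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
                    'y': ['y0', 'y1', 'y2'] * 3,
                    'i': [15, 6, 12, 18, 4, 20, 3, 8, 2]})

# Long format, sorted by city and then year, with a fresh 0..n-1 index
long_df = (df.melt(id_vars='NAME', var_name='y', value_name='v')
             .sort_values(['NAME', 'y'])
             .reset_index(drop=True))

# The assignment below is positional, so first confirm that both frames list
# the same (NAME, y) pairs in the same order
if not long_df[['NAME', 'y']].equals(df2[['NAME', 'y']]):
    raise ValueError('df2 is not in (NAME, y) order; use a key-based merge instead')

df2['v'] = long_df['v'].to_numpy()
print(df2)
```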

Or with combine_first:

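# Melt df and sort by city, then by year
df3 = df.melt(id_vars='NAME').sort_values(by=['NAME', 'variable']).reset_index(drop=True)
df3 = df3.rename(columns={'variable': 'y'})
# combine_first aligns on index labels: df2's values win, df3 fills the new 'value' column
df3 = df2.combine_first(df3)
# 'value' may come back as float, so cast it back to int
df3['value'] = df3['value'].astype(int)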


  NAME   i  value   y
0    A  15     34  y0
1    A   6     36  y1
2    A  12     15  y2
3    B  18     22  y0
4    B   4      6  y1
5    B  20     30  y2
6    C   3      5  y0
7    C   8     12  y1
8    C   2     19  y2
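Like the first version, combine_first here aligns on index labels, which in this case means row position, not NAME and y. A sketch of one way to make it match on the keys instead, by moving NAME and y into the index before combining (the names long_df, keyed and out are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5], 'y1': [36, 6, 12], 'y2': [15, 30, 19]})
df2 = pd.DataFrame({'NAME': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
                    'y': ['y0', 'y1', 'y2'] * 3,
                    'i': [15, 6, 12, 18, 4, 20, 3, 8, 2]})

long_df = df.melt(id_vars='NAME', var_name='y', value_name='v')

# With (NAME, y) as the index, combine_first aligns rows by city and year
keyed = df2.set_index(['NAME', 'y']).combine_first(long_df.set_index(['NAME', 'y']))

# Restore NAME and y as columns, in city/year order
out = keyed.sort_index().reset_index()

# Depending on the pandas version, combine_first may upcast to float
out[['i', 'v']] = out[['i', 'v']].astype(int)
print(out)
```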

A merge would be better, in my opinion, because it matches rows on NAME and y instead of relying on row order:

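import pandas as pd

# Reshape df to long format: one row per (NAME, y) pair, values in column 'v'
long_df = pd.melt(df, id_vars='NAME', var_name='y', value_name='v')

# Join on both keys, sort by city, and order the columns as NAME, y, i, v
df_new = pd.merge(long_df, df2, on=['NAME', 'y']).sort_values('NAME', kind='stable')[['NAME', 'y', 'i', 'v']]
print(df_new)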
  NAME   y   i   v
0    A  y0  15  34
3    A  y1   6  36
6    A  y2  12  15
1    B  y0  18  22
4    B  y1   4   6
7    B  y2  20  30
2    C  y0   3   5
5    C  y1   8  12
8    C  y2   2  19
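Because merge matches on the keys, it also copes with the question's note that the number of rows per city is not constant. A sketch with a hypothetical, uneven df2, using how='left' to keep every df2 row in df2's original order and validate='many_to_one' to raise a MergeError if a (NAME, y) pair were duplicated in the melted table:

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['A', 'B', 'C'],
                   'y0': [34, 22, 5], 'y1': [36, 6, 12], 'y2': [15, 30, 19]})

# Hypothetical df2 with a different number of rows per city
df2 = pd.DataFrame({'NAME': ['A', 'B', 'B', 'C'],
                    'y': ['y2', 'y0', 'y1', 'y2'],
                    'i': [12, 18, 4, 2]})

long_df = df.melt(id_vars='NAME', var_name='y', value_name='v')

# Left join: every df2 row is kept in df2's order; validate checks that
# each (NAME, y) key appears at most once in long_df
result = df2.merge(long_df, on=['NAME', 'y'], how='left', validate='many_to_one')
print(result)
```

A df2 row whose city or year is missing from df1 gets NaN in v (and v becomes float), which makes such gaps easy to spot.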
