I can't figure out how to assign to multiple columns at once using '.loc'.
I would like to do it in a single line.
Example
DataFrame Input (COUNTRY is an empty string for BEN):
   NAME   AGE   NEW_AGE  COUNTRY  NEW_COUNTRY  _merge
0  LUCAS  80.0  NaN      BRAZIL   NaN          left_only
1  STEVE  NaN   35.0     NaN      USA          both
2  BEN    NaN   25.0              CANADA       both
DataFrame Output:
   NAME   AGE   NEW_AGE  COUNTRY  NEW_COUNTRY  _merge
0  LUCAS  80.0  NaN      BRAZIL   NaN          left_only
1  STEVE  35.0  35.0     USA      USA          both
2  BEN    25.0  25.0     CANADA   CANADA       both
Code
import numpy as np
import pandas as pd

# Sample frame resembling the result of a merge with indicator=True.
# np.nan is used directly because pd.np is not available in newer pandas.
people = pd.DataFrame({
    'NAME': ['LUCAS', 'STEVE', 'BEN'],
    'AGE': [80, np.nan, np.nan],
    'NEW_AGE': [np.nan, 35, 25],
    'COUNTRY': ['BRAZIL', np.nan, ''],
    'NEW_COUNTRY': [np.nan, 'USA', 'CANADA'],
    '_merge': ['left_only', 'both', 'both'],
})

# This works, but it needs one statement per column.
people.loc[people['_merge'] == 'both', 'AGE'] = people['NEW_AGE']
people.loc[people['_merge'] == 'both', 'COUNTRY'] = people['NEW_COUNTRY']
I tried the following, but neither attempt works.
# Assigning both columns in a single statement doesn't work
people.loc[people['_merge'] == 'both', ['AGE', 'COUNTRY']] = \
    people[['NEW_AGE', 'NEW_COUNTRY']]
# Using to_numpy(), because of http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html, doesn't work either
people.loc[people['_merge'] == 'both', ['AGE', 'COUNTRY']] = \
    people[['NEW_AGE', 'NEW_COUNTRY']].to_numpy()
Does anyone know how to assign multiple columns using one command?
Pandas: 0.24.1
Thanks.
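One approach that may work, sketched here rather than offered as the definitive fix: as far as I can tell, the first attempt above fails because '.loc' aligns the right-hand side on column labels, and 'NEW_AGE' / 'NEW_COUNTRY' do not match 'AGE' / 'COUNTRY'. The second attempt drops the labels with to_numpy(), but the resulting array still has all three rows while only two rows are selected on the left. Applying the same boolean mask to both sides and then converting to an array should make the shapes agree. The name `mask` is introduced here only for illustration, and the sample frame is rebuilt with `np.nan` on the assumption that numpy is imported directly.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (np.nan instead of pd.np.nan).
people = pd.DataFrame({
    'NAME': ['LUCAS', 'STEVE', 'BEN'],
    'AGE': [80, np.nan, np.nan],
    'NEW_AGE': [np.nan, 35, 25],
    'COUNTRY': ['BRAZIL', np.nan, ''],
    'NEW_COUNTRY': [np.nan, 'USA', 'CANADA'],
    '_merge': ['left_only', 'both', 'both'],
})

# Hypothetical helper name: rows that matched on both sides of the merge.
mask = people['_merge'] == 'both'

# Select the same rows on the right-hand side, then drop the labels with
# to_numpy() so the values are assigned by position: two rows by two columns.
people.loc[mask, ['AGE', 'COUNTRY']] = \
    people.loc[mask, ['NEW_AGE', 'NEW_COUNTRY']].to_numpy()

print(people)
```

The assignment is positional, so the order of the source columns has to match the order of the target columns. Because the intermediate array mixes numbers and strings, it is worth checking the dtype of 'AGE' afterwards on the pandas version in use.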