Given DataFrame df:
       Id Sex  Group  Time  Time!
    0  21   M      2  2.31    NaN
    1   2   F      2  2.29    NaN
and update:
       Id Sex  Group  Time
    0  21   M      2  2.36
    1   2   F      2  2.09
    2   3   F      1  1.79
I want to match rows on Id, Sex and Group, and either update Time! with the Time value from the update df when there is a match, or insert the row when it is a new record.
Here is how I do it:
    import pandas as pd

    df = df.set_index(['Id', 'Sex', 'Group'])
    update = update.set_index(['Id', 'Sex', 'Group'])
    for i, row in update.iterrows():
        if i in df.index:  # update the existing record
            df.loc[i, 'Time!'] = row['Time']
        else:  # insert new record, keeping the three-level MultiIndex
            new_index = pd.MultiIndex.from_tuples([i], names=df.index.names)
            new_row = pd.DataFrame([row.values], index=new_index, columns=update.columns)
            df = pd.concat([df, new_row])
    print(df)
                  Time  Time!
    Id Sex Group
    21 M   2      2.31   2.36
    2  F   2      2.29   2.09
    3  F   1      1.79    NaN
The code seems to work, and the result above is what I want. However, I have noticed it misbehaving on a big data set, with the conditional

    if i in df.index:
        ...
    else:
        ...

clearly going wrong (it proceeds to the else branch where it shouldn't, and vice versa; I guess the MultiIndex may somehow be the cause).
So my question is, do you know any other way, or a more robust version of mine, to update one df based on another df?
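One loop-free possibility, offered as a sketch rather than a definitive answer, is an outer merge on the three key columns with `indicator=True`, so pandas itself reports which rows matched and which are new. The frames below are rebuilt from the example data above; the names `keys`, `merged`, `matched`, `is_new`, `result` and the `_new` suffix are my own choices for illustration. The `validate="one_to_one"` argument is an assumption about what "robust" should mean here: it makes the merge raise if a key combination is duplicated on either side instead of silently multiplying rows.

```python
import pandas as pd

# Example data rebuilt from the tables in the question.
df = pd.DataFrame({
    "Id": [21, 2],
    "Sex": ["M", "F"],
    "Group": [2, 2],
    "Time": [2.31, 2.29],
    "Time!": [float("nan"), float("nan")],
})
update = pd.DataFrame({
    "Id": [21, 2, 3],
    "Sex": ["M", "F", "F"],
    "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
})

keys = ["Id", "Sex", "Group"]

# Outer merge keeps rows from both frames. The indicator column "_merge"
# says whether a row came from both frames or only from one of them.
merged = df.merge(
    update,
    on=keys,
    how="outer",
    suffixes=("", "_new"),   # update's Time column becomes "Time_new"
    indicator=True,
    validate="one_to_one",   # raise if a key combination is duplicated
)

matched = merged["_merge"] == "both"
is_new = merged["_merge"] == "right_only"

# Matched rows: copy the updated time into "Time!".
merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]
# New rows: the incoming time becomes "Time", and "Time!" stays NaN.
merged.loc[is_new, "Time"] = merged.loc[is_new, "Time_new"]

result = merged.drop(columns=["Time_new", "_merge"]).set_index(keys)
print(result)
```

The row order after an outer merge may differ from the loop version, so look rows up by key rather than by position. If the real data has duplicated key combinations, the merge above raises an error, which at least makes the problem visible before any values are overwritten.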
(2, F, 1) in the examples you provided