Given DataFrame df:

    Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN

and update:

    Id Sex  Group  Time
0  21   M      2  2.36
1   2   F      2  2.09
2   3   F      1  1.79

I want to match on Id, Sex and Group, and then either set Time! to the Time value from the update DataFrame when a match exists, or insert the row as a new record when it does not.

Here is how I do it:

import pandas as pd

df = df.set_index(['Id', 'Sex', 'Group'])
update = update.set_index(['Id', 'Sex', 'Group'])

for i, row in update.iterrows():
    if i in df.index:  # update
        df.loc[i, 'Time!'] = row['Time']
    else:              # insert new record
        idx = pd.MultiIndex.from_tuples([i], names=df.index.names)
        new_row = pd.DataFrame([row.values], index=idx, columns=update.columns)
        df = pd.concat([df, new_row])  # DataFrame.append was removed in pandas 2.0

print(df)

              Time  Time!
Id Sex Group             
21 M   2      2.31   2.36
2  F   2      2.29   2.09
3  F   1      1.79    NaN

The code seems to work, and the result above matches what I want. However, I have noticed it misbehaving on a large data set, with the conditional

if i in df.index:
    ...
else:
    ...

clearly giving wrong results: it falls through to else when it should not, and vice versa. I suspect the MultiIndex may somehow be the cause.
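
Two things that may be worth ruling out here (these are guesses, not a diagnosis of the data set in question) are key columns whose dtypes differ between the two frames, and key combinations that occur more than once. A minimal sketch, where describe_keys is a hypothetical helper name and the small frame is made up for illustration:

```python
import pandas as pd


def describe_keys(frame, keys):
    """Summarise the key columns of `frame` (hypothetical helper, not a pandas API)."""
    return {
        # dtype of each key column, e.g. an integer Id in one frame vs. a string Id in the other
        "dtypes": {k: str(frame[k].dtype) for k in keys},
        # number of rows whose key combination already appeared earlier in the frame
        "duplicate_rows": int(frame.duplicated(subset=keys).sum()),
    }


keys = ["Id", "Sex", "Group"]
sample = pd.DataFrame({"Id": [1, 1, 2], "Sex": ["M", "M", "F"], "Group": [1, 1, 2]})
info = describe_keys(sample, keys)
print(info)
```

Running it on both df and update (before set_index) and comparing the two summaries shows at a glance whether the keys are comparable and unique.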

So my question is: do you know another way, or a more robust version of mine, to update one DataFrame based on another?
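
For comparison, the same update-or-insert logic can be sketched without the row-by-row loop by aligning on the index. This assumes every (Id, Sex, Group) combination is unique in both frames; the names df_i, up_i, new_times and new_rows are illustrative only.

```python
import numpy as np
import pandas as pd

keys = ["Id", "Sex", "Group"]
df = pd.DataFrame({"Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
                   "Time": [2.31, 2.29], "Time!": [np.nan, np.nan]})
update = pd.DataFrame({"Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
                       "Time": [2.36, 2.09, 1.79]})

df_i = df.set_index(keys)
up_i = update.set_index(keys)

# Update values lined up with df's rows; NaN where the update has no matching key.
new_times = up_i["Time"].reindex(df_i.index)
# Take the update value where there is one, otherwise keep the existing Time!.
df_i["Time!"] = new_times.fillna(df_i["Time!"])

# Rows of the update whose key is absent from df are inserted as new records.
new_rows = up_i.loc[~up_i.index.isin(df_i.index)]
result = pd.concat([df_i, new_rows])
print(result)
```

If the uniqueness assumption does not hold, reindex raises an error rather than silently picking a row, which at least makes the problem visible.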

  • I think there is a mistake in your expected output: the 2.09 row doesn't have Group=1 in the update. Commented Feb 25, 2014 at 23:28
  • I think your expected output has a typo btw. There's no (2, F, 1) in the examples you provided Commented Feb 25, 2014 at 23:29
  • OK Andy, first you beat my answer and now my comment :) Commented Feb 25, 2014 at 23:29
  • en.wikipedia.org/wiki/The_Typing_of_the_Dead :) Commented Feb 25, 2014 at 23:33

1 Answer

I think I would do this with a merge, and then update the columns with a where. First remove the Time column from up:

In [11]: times = up.pop('Time')  # up = the update DataFrame

In [12]: df1 = df.merge(up, how='outer')

In [13]: df1
Out[13]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
2   3   F      1   NaN    NaN

Set Time! from the update where Time is not NaN (a matched record), and fill Time from the update where it is NaN (a new record):

In [14]: df1['Time!'] = df1['Time'].where(df1['Time'].isnull(), times)

In [15]: df1['Time'] = df1['Time'].where(df1['Time'].notnull(), times)

In [16]: df1
Out[16]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31   2.36
1   2   F      2  2.29   2.09
2   3   F      1  1.79    NaN
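
The where calls above line times up with df1 by row label, so they depend on the merged rows coming out in the same order as up. A sketch of a variant that keeps each update value attached to its key instead, assuming the key combinations are unique; the column name Time_new is made up for illustration, and the row order of an outer merge may differ from the output above:

```python
import numpy as np
import pandas as pd

keys = ["Id", "Sex", "Group"]
df = pd.DataFrame({"Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
                   "Time": [2.31, 2.29], "Time!": [np.nan, np.nan]})
up = pd.DataFrame({"Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
                   "Time": [2.36, 2.09, 1.79]})

# Carry the update's Time through the merge under its own (hypothetical) name,
# and let indicator=True record which side each row came from.
merged = df.merge(up.rename(columns={"Time": "Time_new"}),
                  on=keys, how="outer", indicator=True)

matched = merged["_merge"] == "both"          # key present in df and in up
is_new = merged["_merge"] == "right_only"     # key present only in up

merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]  # update
merged.loc[is_new, "Time"] = merged.loc[is_new, "Time_new"]     # insert

result = merged.drop(columns=["Time_new", "_merge"])
print(result)
```

Passing on=keys explicitly also avoids merging on every shared column name by accident.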