1

I want to update DataFrame X with values from DataFrame Y.

import pandas as pd

X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2'],
                  'C': ['C0', 'C1', 'C2'],
                  'D': ['D0', 'D1', 'D2']})

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2

Y = pd.DataFrame({'A': ['A0', 'A1'],
                  'B': ['B0', 'B1'], 
                  'C': ['C0xx', 'C1xx'], 
                  'D': ['D0xx', 'D1xx']})

    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx

I want the result to be:

    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2

Of course, my real DataFrame is much bigger.

1
  • What is the output if Y = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'], 'C': ['C0xx', 'C1xx'], 'D': ['D0xx', 'D1xx']}, index=[2,1])? Commented Jul 25, 2017 at 9:59

2 Answers

3

1. Both DataFrames have the same index

This is the case in the example from your question. You can use the update method, which modifies X in place:

>>> X.update(Y)
>>> X

    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2

It also works if the rows are in a different order in X and Y, because update aligns on index labels rather than on position:

>>> Y = pd.DataFrame({'A': ['A1', 'A0'], 
                      'B': ['B1', 'B0'], 
                      'C': ['C1xx', 'C0xx'], 
                      'D': ['D1xx', 'D0xx']}, 
                     index=[1,0])
>>> Y
    A   B     C     D
1  A1  B1  C1xx  D1xx
0  A0  B0  C0xx  D0xx

>>> X.update(Y)
>>> X
    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2
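A few properties of update are easy to overlook, so here is a minimal sketch of them. It assumes a variant of Y that holds only the C and D columns and contains one missing value; that variant is made up for illustration and is not part of the question. The sketch shows that update returns None (it changes X in place), that only the columns present in Y are touched, and that missing values in Y do not overwrite existing values in X.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2'],
                  'C': ['C0', 'C1', 'C2'],
                  'D': ['D0', 'D1', 'D2']})

# Hypothetical Y: only two columns, with one missing value in C
Y = pd.DataFrame({'C': ['C0xx', np.nan],
                  'D': ['D0xx', 'D1xx']})

# update works in place and returns None
result = X.update(Y)

print(result)
print(X)
```

Row 1 keeps its original C value because the corresponding value in Y is missing, and row 2 is untouched because Y has no row with index 2.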

2. Different indexes

If Y has a different index:

>>> Y = pd.DataFrame({'A': ['A0', 'A1'], 
                      'B': ['B0', 'B1'], 
                      'C': ['C0xx', 'C1xx'], 
                      'D': ['D0xx', 'D1xx']}, 
                     index=[2,1])
>>> Y

    A   B     C     D
2  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx

You can still use update if another column can serve as the index, that is, if it identifies each row so that rows in Y match the rows to be replaced in X. I use the "A" column here, but a MultiIndex built from several columns works as well.

>>> X2, Y2 = X.set_index("A"), Y.set_index("A")
>>> X2.update(Y2)
>>> X2.reset_index(inplace=True)
>>> X2
    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2
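The same steps can be wrapped in a small helper. This is only a sketch: the function name update_by_key is made up for illustration, and it assumes the key values are unique in both DataFrames. It keeps the re-indexed copy in a variable so that the update is not lost, and it leaves the original X unchanged.

```python
import pandas as pd

def update_by_key(target, source, key):
    """Return a copy of target whose values are overwritten by source
    wherever the key column(s) match. Hypothetical helper, assumes
    unique key values in both frames."""
    indexed = target.set_index(key)        # new DataFrame, target is not modified
    indexed.update(source.set_index(key))  # in-place update of the copy
    return indexed.reset_index()

X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2'],
                  'C': ['C0', 'C1', 'C2'],
                  'D': ['D0', 'D1', 'D2']})
Y = pd.DataFrame({'A': ['A0', 'A1'],
                  'B': ['B0', 'B1'],
                  'C': ['C0xx', 'C1xx'],
                  'D': ['D0xx', 'D1xx']},
                 index=[2, 1])

out = update_by_key(X, Y, 'A')
print(out)
```

Passing a list such as ['A', 'B'] as key should match rows on both columns at once.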

3 Comments

Very neat answer!
It works only because the first rows in both DataFrames have the same index, which cannot be assumed with real data. Check my edit.
@jezrael Thank you for your remark, I added your example in my answer. If possible, using a different index works fine.
1

I think you need combine_first together with set_index if you want to fill in missing values by matching on the A and B columns of both DataFrames:

print (Y.set_index(['A','B']).combine_first(X.set_index(['A','B'])).reset_index())

    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2
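One difference between the two approaches may matter on real data: combine_first keeps the union of the keys, so rows that exist only in Y are added to the result, whereas update only changes rows that already exist in X. The sketch below illustrates this with a made-up Y that contains a key (A3, B3) not present in X; that extra row is an assumption for illustration and is not in the question.

```python
import pandas as pd

X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2'],
                  'C': ['C0', 'C1', 'C2'],
                  'D': ['D0', 'D1', 'D2']})

# Hypothetical Y: one row matches X, one row is new
Y = pd.DataFrame({'A': ['A0', 'A3'],
                  'B': ['B0', 'B3'],
                  'C': ['C0xx', 'C3xx'],
                  'D': ['D0xx', 'D3xx']})

keys = ['A', 'B']

# combine_first: values from Y take priority, gaps are filled from X,
# and keys found in either frame are kept
combined = Y.set_index(keys).combine_first(X.set_index(keys)).reset_index()

# update: only rows whose key already exists in X are changed
updated = X.set_index(keys)
updated.update(Y.set_index(keys))
updated = updated.reset_index()

print(combined)
print(updated)
```

If new rows from Y should not appear in the result, update on a shared key is probably the better fit; if they should, combine_first handles both cases in one call.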

Unfortunately, update does not give the desired result when the indexes differ, because it aligns rows on the index:

Y = pd.DataFrame({'A': ['A0', 'A1'],
                  'B': ['B0', 'B1'], 
                  'C': ['C0xx', 'C1xx'], 
                  'D': ['D0xx', 'D1xx']}, index=[2,1])
print (X)
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2

print (Y)
    A   B     C     D
2  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx

X.update(Y)
print (X)
    A   B     C     D
0  A0  B0    C0    D0
1  A1  B1  C1xx  D1xx
2  A0  B0  C0xx  D0xx

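# set_index returns a new DataFrame, so update only changes that temporary copy and X is not modified
# (the output below assumes X was first reset to its original values)
X.set_index(['A','B']).update(Y.set_index(['A','B']))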
print (X)
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2

print (Y.set_index(['A','B']).combine_first(X.set_index(['A','B'])).reset_index())
    A   B     C     D
0  A0  B0  C0xx  D0xx
1  A1  B1  C1xx  D1xx
2  A2  B2    C2    D2

1 Comment

Glad I could help! Good luck and have a nice day!
