
So I have two dataframes, each consisting of 6 columns of numbers. I need to compare one column from each dataframe to make sure they match, and fix any values in that column that don't match. The columns are already sorted and have the same length. So far I can find the differences between the columns:

df1.loc[(df1['col1'] != df2['col2'])]

This gives me the index numbers where df1 doesn't match df2. I then go to the same index in df2 to find the value in col2 that causes the mismatch, and use it to set the correct value in df1:

df1.loc[index_number, 'col1'] = new_value
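The manual lookup described above can be automated as-is by looping over the mismatched index labels. This is a minimal sketch, assuming both frames share the same index; the data and the `col1`/`col2` names are placeholders taken from the question.

```python
import pandas as pd

# Hypothetical data using the question's column names col1 / col2
df1 = pd.DataFrame({'col1': [1, 6, 5]})
df2 = pd.DataFrame({'col2': [1, 3, 5]})

# Index labels where the two columns disagree (assumes identical indexes)
mismatched = df1.index[(df1['col1'] != df2['col2']).to_numpy()]

# Copy the correct value from df2 for each mismatched row, one at a time
for index_number in mismatched:
    df1.loc[index_number, 'col1'] = df2.loc[index_number, 'col2']

print(df1['col1'].tolist())  # [1, 3, 5]
```

A row-by-row loop works but is slow on large frames; a single vectorized assignment does the same job.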

Is there a way I can automatically fix the mismatches without having to manually look up what the correct value should be in df2?

  • You should be able to do df1[df1 != df2] = new_value or similar Commented Dec 21, 2016 at 15:14
  • I'm sure there is a way to do what you need. The problem is explaining what that is. I don't know if you want the first column of df1 and the second column of df2. Is df2 always the source of the new value? You can clear up the confusion by editing your post with a hand-built example of how it should work. Commented Dec 21, 2016 at 15:32
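The first comment's idea also works when the "new value" is df2 itself: a boolean DataFrame used as a key selects only the mismatching cells. A minimal sketch, assuming both frames have identical index and column labels (which `!=` between two DataFrames requires); the data is made up for illustration.

```python
import pandas as pd

# Hypothetical frames with identical index and column labels
df1 = pd.DataFrame({'A': [1, 2, 3], 'D': [1, 6, 5]})
df2 = pd.DataFrame({'A': [1, 2, 1], 'D': [1, 3, 5]})

# Boolean-frame assignment: only cells where df1 != df2 are overwritten,
# and the replacement values come from the matching cells of df2
df1[df1 != df2] = df2

print(df1['A'].tolist())  # [1, 2, 1]
print(df1['D'].tolist())  # [1, 3, 5]
```

Note that this touches every column, so afterwards df1 simply equals df2; to restrict the fix to one column, apply the same comparison to that column alone.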

2 Answers


If df2 is the authoritative source, you don't need to check where df1 matches it; just overwrite the column:

df1.loc[:, 'column_name'] = df2['column_name']

But if you must check first:

c = 'column_name'
df1.loc[df1[c] != df2[c], c] = df2[c]
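The snippet above assumes the column has the same name in both frames. With the question's two different names it looks like this; a minimal sketch with made-up data, assuming both frames share the same index:

```python
import pandas as pd

# Hypothetical data using the question's names: col1 in df1, col2 in df2
df1 = pd.DataFrame({'col1': [1, 6, 5]})
df2 = pd.DataFrame({'col2': [1, 3, 5]})

# Rows where the columns disagree (both frames must share the same index)
mask = df1['col1'] != df2['col2']

# Overwrite only those rows; the right-hand side is aligned on the index
df1.loc[mask, 'col1'] = df2['col2']

print(mask.sum())            # 1
print(df1['col1'].tolist())  # [1, 3, 5]
```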


I think you need to compare with eq, and then, where the values don't match, fill them in with combine_first:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,6,5],
                   'E':[5,3,6],
                   'F':[1,4,3]})

print (df1)
   A  B  C  D  E  F
0  1  4  7  1  5  1
1  2  5  8  6  3  4
2  3  6  9  5  6  3

df2 = pd.DataFrame({'A':[1,2,1],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df2)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  1  6  9  5  6  3

If you need to compare one column against the whole DataFrame:

print (df1.eq(df2.A, axis=0))
       A      B      C      D      E      F
0   True  False  False   True  False   True
1   True  False  False  False  False  False
2  False  False  False  False  False  False

print (df1.eq(df1.A, axis=0))
      A      B      C      D      E      F
0  True  False  False   True  False   True
1  True  False  False  False  False  False
2  True  False  False  False  False   True
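To compare just one column from each frame instead of one column against a whole DataFrame, `Series.eq` can be called on the two columns directly. A minimal sketch using only column D of the example data:

```python
import pandas as pd

# Column D from the example df1 and df2
df1 = pd.DataFrame({'D': [1, 6, 5]})
df2 = pd.DataFrame({'D': [1, 3, 5]})

# Element-wise comparison of two Series aligned on the index
matches = df1['D'].eq(df2['D'])
print(matches.tolist())                   # [True, False, True]

# Index labels of the mismatching rows
print(matches[~matches].index.tolist())   # [1]
```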

And if you need to match the same column, D, in both:

df1.D = df1.loc[df1.D.eq(df2.D), 'D'].combine_first(df2.D)
print (df1)

   A  B  C    D  E  F
0  1  4  7  1.0  5  1
1  2  5  8  3.0  3  4
2  3  6  9  5.0  6  3

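The float output above comes from `combine_first` reindexing the filtered Series back to the full index. A swapped-in alternative (not the method used in this answer) is `Series.where`, which keeps df1's value where the condition holds and takes df2's otherwise. A minimal sketch on column D of the example data:

```python
import pandas as pd

# Column D from the example df1 and df2
df1 = pd.DataFrame({'D': [1, 6, 5]})
df2 = pd.DataFrame({'D': [1, 3, 5]})

# Keep df1's value where it matches df2, otherwise take df2's value.
# No rows are dropped and re-added, so no NaN is introduced.
df1['D'] = df1['D'].where(df1['D'].eq(df2['D']), df2['D'])

print(df1['D'].tolist())  # [1, 3, 5]
```

Because no missing values appear along the way, the column should stay integer rather than becoming float.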
But then it is easier to simply assign column D of df2 to column D of df1:

df1.D = df2.D
print (df1)
   A  B  C  D  E  F
0  1  4  7  1  5  1
1  2  5  8  3  3  4
2  3  6  9  5  6  3

If the indexes are different, you can use values to convert the column to a NumPy array, so it is assigned by position instead of being aligned on the index:

df1.D = df2.D.values
print (df1)
   A  B  C  D  E  F
0  1  4  7  1  5  1
1  2  5  8  3  3  4
2  3  6  9  5  6  3
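Why the array conversion matters can be seen with two frames whose rows line up but whose index labels do not. A minimal sketch with made-up indexes; it uses `to_numpy()`, the newer spelling of `.values`:

```python
import pandas as pd

# Hypothetical frames with the same row order but different index labels
df1 = pd.DataFrame({'D': [1, 6, 5]}, index=[0, 1, 2])
df2 = pd.DataFrame({'D': [1, 3, 5]}, index=[10, 11, 12])

# Label-based assignment: no labels overlap, so every row becomes NaN
tmp = df1.copy()
tmp['D'] = df2['D']
print(tmp['D'].isna().all())  # True

# Positional assignment: the NumPy array ignores df2's index entirely
df1['D'] = df2['D'].to_numpy()
print(df1['D'].tolist())      # [1, 3, 5]
```

Positional assignment trusts that both frames have the same length and row order, which the question says is the case.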

Comments

Rather than comparing every column in both dfs, I just need to compare one column from each df; it doesn't look like I can apply the .eq method to specific columns.
So if I need to compare one column from each, I can use axis=0 on df1 as well?
Yes, you can. I am still not sure what you need exactly. Can you add the desired output for my dataframes df1 and df2?
My apologies for not being clearer. Using your example above, I need to match ONLY column 'D' from df1 and df2. Wherever a value in column D of df2 causes a mismatch with column D of df1, that value needs to be changed in df1. In the end, I only need df1. Does that clarify?
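For the requirement as clarified here (fix only column D, keep only df1), the masked assignment can be wrapped up so it also reports which rows changed. `fix_column` is a hypothetical helper, not a pandas API, and the sketch assumes df1 and df2 share the same index:

```python
import pandas as pd

def fix_column(df1, df2, column):
    """Hypothetical helper: overwrite df1[column] with df2[column] where they
    differ (modifying df1 in place) and return the changed index labels."""
    mismatched = df1[column].ne(df2[column])
    # Only the mismatching rows are written; the RHS is aligned on the index
    df1.loc[mismatched, column] = df2[column]
    return mismatched[mismatched].index

# Two columns from the example data; A also differs, but must stay untouched
df1 = pd.DataFrame({'A': [1, 2, 3], 'D': [1, 6, 5]})
df2 = pd.DataFrame({'A': [1, 2, 1], 'D': [1, 3, 5]})

changed = fix_column(df1, df2, 'D')
print(changed.tolist())   # [1]
print(df1['D'].tolist())  # [1, 3, 5]
print(df1['A'].tolist())  # [1, 2, 3]
```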
