Need to check if a data frame is subset of another data frame [duplicate]

Question

I have two CSV files (csv1, csv2). csv2 may contain columns or rows that are not in csv1. I need to verify whether csv1 is a subset of csv2. To count as a subset, each whole row of csv1 must also be present in csv2; values in the new columns or rows should be ignored.

csv1:

c1,c2,c3
A,A,6
D,A,A
A,1,A

csv2:

c1,c2,c3,c4
A,A,6,L
A,changed,A,L
D,A,A,L
Z,1,A,L
Added,Anew,line,L

What I am trying is:

import pandas as pd

df1 = pd.read_csv(csv1_file)
df2 = pd.read_csv(csv2_file)
matching_cols = df1.columns.intersection(df2.columns).tolist()

sorted_df1 = df1.sort_values(by=list(matching_cols)).reset_index(drop=True)
sorted_df2 = df2.sort_values(by=list(matching_cols)).reset_index(drop=True)


print("truth data>>>\n",sorted_df1)
print("Test data>>>\n",sorted_df2)


df1_mask = sorted_df1[matching_cols].eq(sorted_df2[matching_cols])
# print(df1_mask)
print("compared data>>>\n",sorted_df1[df1_mask])

It gives this output:

truth data>>>
  c1 c2 c3
0  A  1  A
1  A  A  6
2  D  A  A

Test data>>>
      c1       c2    c3 c4
0      A        A     6  L
1      A  changed     A  L
2  Added     Anew  line  L
3      D        A     A  L
4      Z        1     A  L

compared data>>>
     c1   c2   c3
0    A  NaN  NaN
1    A  NaN  NaN
2  NaN  NaN  NaN

What I want is:

compared data>>>
    c1   c2   c3
0  NaN  NaN  NaN
1    A    A    6
2    D    A    A

Please help.

Thanks

csv1 and csv2 both have A,A,6 in the first row, so why should it return NaN? Can you check?
– anky, Commented Jun 25, 2019 at 7:18

jezrael · Accepted Answer · 2019-06-25 07:22:46Z

1

If you need missing values in the last row because it has no match, use DataFrame.merge with a left join and the indicator parameter, then set the unmatched rows to missing values with a mask and remove the helper column _merge:

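import numpy as np

# columns present in both frames
matching_cols = df1.columns.intersection(df2.columns)

# left join keeps every row of df1; _merge marks rows also found in df2
df2 = df1[matching_cols].merge(df2[matching_cols], how='left', indicator=True)
# blank out the rows of df1 that have no match in df2
df2.loc[df2['_merge'].ne('both')] = np.nan
df2 = df2.drop('_merge', axis=1)
print(df2)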

    c1   c2   c3
0    A    A    6
1    D    A    A
2  NaN  NaN  NaN
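
If only a yes/no answer to "is csv1 a subset of csv2?" is needed, the same merge-with-indicator idea can be reduced to a check for rows marked left_only. The following is a minimal sketch, not part of the original answer: the helper name missing_rows and the variable is_subset are hypothetical, the sample data is copied from the question, and reading with dtype=str and calling drop_duplicates on the right-hand frame are assumptions made so that values compare as text and repeated rows in csv2 do not multiply rows of csv1.

```python
import io

import pandas as pd

# Sample data copied from the question.
CSV1 = """c1,c2,c3
A,A,6
D,A,A
A,1,A
"""
CSV2 = """c1,c2,c3,c4
A,A,6,L
A,changed,A,L
D,A,A,L
Z,1,A,L
Added,Anew,line,L
"""


def missing_rows(small, big):
    """Hypothetical helper: rows of `small` with no exact match in `big`,
    comparing only the columns the two frames share."""
    cols = small.columns.intersection(big.columns).tolist()
    # Assumption: de-duplicate `big` so the left join cannot repeat rows of `small`.
    right = big[cols].drop_duplicates()
    merged = small[cols].merge(right, how="left", on=cols, indicator=True)
    # 'left_only' marks rows that exist in `small` but not in `big`.
    return merged.loc[merged["_merge"] == "left_only", cols]


# Assumption: dtype=str keeps values such as 6 and 1 as text in both files.
df1 = pd.read_csv(io.StringIO(CSV1), dtype=str)
df2 = pd.read_csv(io.StringIO(CSV2), dtype=str)

missing = missing_rows(df1, df2)
is_subset = missing.empty  # True only when every row of df1 was found in df2

print("csv1 is a subset of csv2:", is_subset)
print(missing)
```

With the data from the question, the row A,1,A has no counterpart in csv2, so the check reports that csv1 is not a subset and missing holds that single row.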
