
Let's say we have the following code example, where we create two basic dataframes:

import pandas as pd
 
# Creating Dataframes
a = [{'Name': 'abc', 'Age': 8, 'Grade': 3},
     {'Name': 'xyz', 'Age': 9, 'Grade': 3}]
 
df1 = pd.DataFrame(a)
b = [{'ID': 1,'Name': 'abc', 'Age': 8},
     {'ID': 2,'Name': 'xyz', 'Age': 9}]
 
df2 = pd.DataFrame(b)
 
# Printing Dataframes
display(df1)
display(df2)

We get the following datasets:

    Name   Age  Grade
0   abc    8    3
1   xyz    9    3


    ID   Name   Age
0   1    abc    8
1   2    xyz    9

How can I find the columns that appear in only one of these two frames, i.e. the ones they do not have in common? For the example above, I want to get the following column names: ['Grade', 'ID']

1 Answer

Use Index.symmetric_difference:

res = df2.columns.symmetric_difference(df1.columns)
print(res)

Output

Index(['Grade', 'ID'], dtype='object')

Alternatively, use set.symmetric_difference, which returns an unordered set:

res = set(df2.columns).symmetric_difference(df1.columns)
print(res)

Output

{'Grade', 'ID'}
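
Since the question asks for a plain list, here is a minimal sketch of converting either result. It rebuilds the two frames from the question so it runs on its own; the variable names `unique_cols` and `unique_cols_from_set` are only illustrative.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame([{'Name': 'abc', 'Age': 8, 'Grade': 3},
                    {'Name': 'xyz', 'Age': 9, 'Grade': 3}])
df2 = pd.DataFrame([{'ID': 1, 'Name': 'abc', 'Age': 8},
                    {'ID': 2, 'Name': 'xyz', 'Age': 9}])

# Index.symmetric_difference returns an Index; tolist() turns it into a list
unique_cols = df2.columns.symmetric_difference(df1.columns).tolist()
print(unique_cols)

# A set has no defined order, so sort explicitly if a stable order matters
unique_cols_from_set = sorted(set(df2.columns).symmetric_difference(df1.columns))
print(unique_cols_from_set)
```

Both variants should give the same two column names here; the set version only has a predictable order because of the explicit `sorted`.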

A third alternative, suggested by @SashSinha, is to use the shortcut:

res = df2.columns ^ df1.columns

but as of pandas 1.4.3 this issues a warning:

FutureWarning: Index.__xor__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__xor__. Use index.symmetric_difference(other) instead.
  res = df2.columns ^ df1.columns
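
If the short `^` form is still wanted, one option is to apply it to Python sets rather than to the pandas Index objects, since `^` on two sets is plain set symmetric difference. A minimal sketch, assuming only the column names matter (the empty frames below are stand-ins for the ones in the question):

```python
import pandas as pd

# Empty stand-in frames with the same column names as in the question
df1 = pd.DataFrame(columns=['Name', 'Age', 'Grade'])
df2 = pd.DataFrame(columns=['ID', 'Name', 'Age'])

# ^ between two Python sets is set symmetric difference,
# so the pandas Index operator is never involved
res = set(df2.columns) ^ set(df1.columns)
print(sorted(res))
```

The result is a set, so wrap it in `sorted(...)` or `list(...)` if a list is needed.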


1 Comment

I didn't know about the warning, my bad. In plain Python, the shortcut helps a lot when you are limited to a line length of 80 characters...
