I'm trying to subset a pandas dataframe based on columns in another, similar dataframe. I can do this easily in R:

df1 <- data.frame(A=1:5, B=6:10, C=11:15)
df2 <- data.frame(A=1:5, B=6:10)

#Select columns in df1 that exist in df2
df1[df1 %in% df2]
  A  B
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10

#Select columns in df1 that do not exist in df2
df1[!(df1 %in% df2)]
   C
1 11
2 12
3 13
4 14
5 15

How can I do that with the pandas dataframes below?

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})
df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

2 Answers

In [77]: df1[df1.columns.intersection(df2.columns)]
Out[77]:
   A   B
0  1   6
1  2   7
2  3   8
3  4   9
4  5  10

In [78]: df1[df1.columns.difference(df2.columns)]
Out[78]:
    C
0  11
1  12
2  13
3  14
4  15

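Alternatively, with Python sets. This is less obvious, and because sets are unordered the resulting column order is not guaranteed: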

In [92]: df1[list(set(df1) & set(df2))]
Out[92]:
    B  A
0   6  1
1   7  2
2   8  3
3   9  4
4  10  5

In [93]: df1[list(set(df1) - set(df2))]
Out[93]:
    C
0  11
1  12
2  13
3  14
4  15
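
If the original column order of df1 matters, one possible variant is to build a boolean mask over the column labels with Index.isin and select with .loc. This is only a sketch on the frames from the question; the names mask, shared, and only_in_df1 are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})
df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

# Boolean array: True where a df1 column label also appears in df2
mask = df1.columns.isin(df2.columns)

shared = df1.loc[:, mask]        # columns of df1 that exist in df2, in df1's order
only_in_df1 = df1.loc[:, ~mask]  # columns of df1 that do not exist in df2

print(list(shared.columns))
print(list(only_in_df1.columns))
```

Unlike the set-based version, this compares labels only and keeps the columns in the order they appear in df1.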

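Use isin and dropna. Note that isin compares cell values (aligned on index and column labels) rather than column names alone, and that axis must be passed by keyword in pandas 2.0 and later:

df1[df1.isin(df2)].dropna(axis=1)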

   A   B
0  1   6
1  2   7
2  3   8
3  4   9
4  5  10


df1[~df1.isin(df2)].dropna(axis=1)

    C
0  11
1  12
2  13
3  14
4  15
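
A caveat worth illustrating: because isin matches on values, this approach only agrees with a name-based selection when the shared columns hold identical data. The sketch below assumes a hypothetical frame df3 (not in the question) that shares the column name 'A' but contains different values.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'C': [7, 8, 9]})
# Hypothetical frame: same column name 'A' as df1, but different values
df3 = pd.DataFrame({'A': [10, 20, 30]})

# Value-based: no cell in df1 matches df3, so every column contains NaN and is dropped
by_value = df1[df1.isin(df3)].dropna(axis=1)

# Name-based: column 'A' is selected regardless of its contents
by_name = df1[df1.columns.intersection(df3.columns)]

print(list(by_value.columns))
print(list(by_name.columns))
```

So if the goal is "columns whose names exist in the other frame", the columns.intersection / columns.difference approach is probably the safer choice.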
