Remove duplicate values from entire dataframe

Question

I have a pandas DataFrame as follows:

import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,23,3,76,2,45,76],'B':[12,56,22,45,1,3,98,79,77,67]})

To remove duplicate values from the DataFrame, I have done this:

set(data['A'].unique()).union(set(data['B'].unique()))

which results in:

set([1, 2, 3, 12, 76, 77, 79, 67, 22, 23, 98, 45, 56])

Is there a better way of doing this? Is there a way of achieving this by using drop_duplicates?

Edit:

Also, what if I had two more columns, 'C' and 'D', but needed to drop duplicates only from 'A' and 'B'?

Jeff · Accepted Answer · 2014-03-21 14:31:14Z

4

If you are intent on collapsing this into a single flat array of unique values:

In [10]: np.unique(data.values.ravel())
Out[10]: array([ 1,  2,  3, 12, 22, 23, 45, 56, 67, 76, 77, 79, 98])

This will work as well, and keeps the column and row labels of the first occurrence of each value:

In [12]: data.unstack().drop_duplicates()
Out[12]: 
A  0     1
   1     2
   2     3
   4    23
   6    76
   8    45
B  0    12
   1    56
   2    22
   6    98
   7    79
   8    77
   9    67
dtype: int64
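
To address the follow-up about extra columns, here is a minimal sketch of both approaches restricted to 'A' and 'B'. The contents of 'C' and 'D' are made up for illustration (the question does not say what they hold), and `to_numpy()` assumes a reasonably recent pandas; `.values` as used above should behave the same here.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 1, 23, 3, 76, 2, 45, 76],
    'B': [12, 56, 22, 45, 1, 3, 98, 79, 77, 67],
    # 'C' and 'D' are hypothetical extra columns, not from the question
    'C': list('abcdefghij'),
    'D': np.arange(10) * 0.5,
})

cols = ['A', 'B']  # only these columns take part in the deduplication

# Select the columns first, then flatten and take the sorted unique values
unique_sorted = np.unique(data[cols].to_numpy().ravel())

# Same selection, but keeping the (column, row) label of each first occurrence
unique_stacked = data[cols].unstack().drop_duplicates()

print(unique_sorted)
print(unique_stacked)
```

Selecting the columns before collapsing keeps 'C' and 'D' out of the result entirely, which also avoids mixing their dtypes into the flattened array.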


2 Comments

richie Over a year ago

Cool! What if I had two more columns, 'C' and 'D', but needed to drop duplicates only from 'A' and 'B'?

Jeff Over a year ago

drop_duplicates takes a cols argument (renamed subset in later pandas versions), so you can pass a list of columns to consider.
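
A rough sketch of what that looks like on a DataFrame. It uses the `subset` keyword, which is the name current pandas uses for this parameter; column 'C' is a hypothetical extra column added only for illustration. Note that on a DataFrame, drop_duplicates works row by row: it drops rows whose values in the listed columns repeat an earlier row, which is not the same as collapsing all values into one unique set.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 1, 23, 3, 76, 2, 45, 76],
    'B': [12, 56, 22, 45, 1, 3, 98, 79, 77, 67],
    # 'C' is a hypothetical extra column, not from the question
    'C': list('abcdefghij'),
})

# Rows are compared on the (A, B) pair; every pair here is distinct,
# so no rows are dropped
by_pair = data.drop_duplicates(subset=['A', 'B'])

# Rows are compared on 'A' alone; later rows repeating an 'A' value are dropped
by_a = data.drop_duplicates(subset=['A'])

print(len(by_pair))
print(by_a)
```

If the goal is the flat set of unique values across 'A' and 'B', the unstack or np.unique approaches from the answer are likely the better fit; `subset` is the tool when whole rows should be kept or dropped.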
