Pandas: delete duplicate rows

Question

I have the following df:

import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.abbridged'

zz = pd.read_csv(url)
zz.head(30)

    date    feccandid   feccandcfscore.dyn  pacid   paccfscore  cid     catcode     type_x  di  amtsum  state   log_diff_unemployment   party   type_y  bills   years_exp   disposition     billsum
0   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
1   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
2   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
3   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
4   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
5   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
6   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
7   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
8   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
9   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
10  2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
11  2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
12  2006    S8NV00073   0.496   C00113803   0.269   N00006619   H1130   24K     D   2500    NV  -0.024693   Republican  rep     s22-109     12  support     2
13  2006    S8NV00073   0.496   C00113803   0.269   N00006619   H1130   24K     D   2500    NV  -0.024693   Republican  rep     s22-109     12  support     2
14  2006    S8NV00073   0.496   C00249342   0.421   N00006619   H1130   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     2
15  2006    S8NV00073   0.496   C00249342   0.421   N00006619   H1130   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     2

Some of the rows are complete duplicates of each other. Is there a way to delete duplicate rows?

jezrael · Accepted Answer · 2016-05-03 18:40:32Z

I think you can use drop_duplicates:

print(zz.drop_duplicates())
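
For anyone who wants to see the behaviour without downloading the CSV, here is a minimal sketch. The small frame is a hypothetical stand-in that borrows three column names and a few values from the question; it is not the real data. It shows, as far as I understand the API, that drop_duplicates compares whole rows by default, that subset narrows the comparison to chosen columns, and that the result has to be assigned back because a new DataFrame is returned.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: three of its columns,
# with a handful of rows invented for illustration.
zz = pd.DataFrame({
    "pacid": ["C00000422", "C00000422", "C00375360", "C00375360", "C00113803"],
    "amtsum": [5000, 5000, 4500, 4500, 2500],
    "catcode": ["H1100", "H1100", "H1100", "H1100", "H1130"],
})

# duplicated() flags every row that repeats an earlier row exactly.
n_dupes = int(zz.duplicated().sum())

# With no arguments, all columns are compared and the first occurrence is kept.
deduped = zz.drop_duplicates()

# subset restricts the comparison to the listed columns only.
by_catcode = zz.drop_duplicates(subset=["catcode"], keep="first")

# The call returns a new frame, so reassign it; reset_index renumbers the rows.
zz = zz.drop_duplicates().reset_index(drop=True)
print(zz)
```

Checking zz.duplicated().sum() first is a cheap way to confirm how many rows will be removed before overwriting the frame.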

1 Comment

Collective Action Over a year ago

That was embarrassing! For some reason I thought that drop_duplicates would only work on unique columns.
