Pandas: Removing duplicate rows based on columns

Question

I want to remove duplicate rows with respect to a column and rearrange the data in the DataFrame based on certain conditions. For instance, I have the following DataFrame:

FROM    CONT    ID1    ID2    ID3    ID4    ID5    ID6    ID7
63309    89     101.3  NA     NA     NA     NA     NA     NA
63309    89     NA     102.3  NA     NA     NA     NA     NA
63309    89     NA     NA     NA     104    NA     NA     NA
63309    90     NA     NA     103    105.0  NA     NA     NA
63309    89     NA     NA     NA     NA     NA     107.1  NA
63310    92     NA     105.1  105.3  789.1  104    NA     NA
63310    92     109    NA     NA     NA     NA     NA     NA
63311    94     104    109    890    NA     NA     NA     107
63309    89     NA     NA     NA     NA     109    NA     111

In the end, my result has to look something like this:

FROM    CONT    ID1    ID2    ID3    ID4    ID5    ID6    ID7
63309    89     101.3  102.3  NA     104.0  109.0  107.1  111.0

63309    90     NA     NA     103.0  105.0  NA     NA     NA

63310    92     109.0  105.1  105.3  789.1  104.0  NA     NA

63311    94     104.0  109.0  890.0  NA     NA     NA     107.0

The rows have to be grouped by the 'FROM' and 'CONT' columns as shown above, with the ID values of each group merged into a single row. I tried using groupby in pandas, but it didn't give me the required output: it erased the data in the columns after 'CONT'.

Alexander · Accepted Answer · 2016-04-03 00:46:20Z


>>> df.groupby(['FROM', 'CONT']).sum()
              ID1    ID2    ID3    ID4  ID5    ID6  ID7
FROM  CONT                                             
63309 89    101.3  102.3    NaN  104.0  109  107.1  111
      90      NaN    NaN  103.0  105.0  NaN    NaN  NaN
63310 92    109.0  105.1  105.3  789.1  104    NaN  NaN
63311 94    104.0  109.0  890.0    NaN  NaN    NaN  107

If you don't want the data indexed:

>>> df.groupby(['FROM', 'CONT'], as_index=False).sum()
    FROM  CONT    ID1    ID2    ID3    ID4  ID5    ID6  ID7
0  63309    89  101.3  102.3    NaN  104.0  109  107.1  111
1  63309    90    NaN    NaN  103.0  105.0  NaN    NaN  NaN
2  63310    92  109.0  105.1  105.3  789.1  104    NaN  NaN
3  63311    94  104.0  109.0  890.0    NaN  NaN    NaN  107
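One caveat when reproducing the output above: in pandas 0.22 and later, `sum()` returns 0 rather than NaN for a group whose values in a column are all missing. The following is a minimal sketch, not part of the original answer, that rebuilds the question's data and passes `min_count=1` so the empty cells stay NaN. It also shows `first()` as an alternative, which assumes each group holds at most one non-missing value per column (true for this sample, but an assumption in general). The names `rows`, `columns`, `merged` and `first_seen` are illustrative.

```python
import numpy as np
import pandas as pd

NA = np.nan  # the question's "NA" entries, read as missing values

# Sample data copied from the question.
rows = [
    (63309, 89, 101.3, NA, NA, NA, NA, NA, NA),
    (63309, 89, NA, 102.3, NA, NA, NA, NA, NA),
    (63309, 89, NA, NA, NA, 104, NA, NA, NA),
    (63309, 90, NA, NA, 103, 105.0, NA, NA, NA),
    (63309, 89, NA, NA, NA, NA, NA, 107.1, NA),
    (63310, 92, NA, 105.1, 105.3, 789.1, 104, NA, NA),
    (63310, 92, 109, NA, NA, NA, NA, NA, NA),
    (63311, 94, 104, 109, 890, NA, NA, NA, 107),
    (63309, 89, NA, NA, NA, NA, 109, NA, 111),
]
columns = ["FROM", "CONT", "ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7"]
df = pd.DataFrame(rows, columns=columns)

# min_count=1: a group with no non-missing values in a column yields NaN, not 0.
merged = df.groupby(["FROM", "CONT"], as_index=False).sum(min_count=1)

# Alternative: keep the first non-missing value per column in each group.
# Unlike sum(), this does not add values together if a group has several.
first_seen = df.groupby(["FROM", "CONT"], as_index=False).first()

print(merged)
```

If a group can contain several non-missing values in the same column, `sum()` and `first()` will disagree, so the choice depends on whether those values should be added or whether only one should be kept.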


2 Comments

johndaniel Over a year ago

Is there a Pythonic way of adding empty spaces between consecutive rows? I could do it by building a new DataFrame, iterating over the rows of the original DataFrame and adding them one by one, but I was wondering whether there is another way.

Alexander Over a year ago

Not really. Pandas is about data, not presentation. There may be some html/css display options available, but I'm not familiar with them.
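If the spacing is only needed in printed text output, one possible approach is to render the frame with `DataFrame.to_string()` and join the resulting lines with blank lines. This is a minimal sketch under that assumption; it changes only the text, not the data, and the small frame and the name `spaced` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the grouped result.
df = pd.DataFrame(
    {"FROM": [63309, 63310], "CONT": [89, 92], "ID1": [101.3, 109.0]}
)

# Render to text, then separate every line (header included) with an empty line.
spaced = "\n\n".join(df.to_string(index=False).splitlines())
print(spaced)
```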
