I have a large data frame with many columns. One of these columns is supposed to be a Unique ID and another is a Year. Unfortunately, there are duplicates in the Unique ID column.
I know how to generate a list of all duplicates, but what I actually want to do is separate them out so that only the first entry (by year) remains for each ID. For example, the dataframe currently looks something like this (with a bunch of other columns):
ID Year
----------
123 1213
123 1314
123 1516
154 1415
154 1718
233 1314
233 1415
233 1516
And what I want to do is transform this dataframe into:
ID Year
----------
123 1213
154 1415
233 1314
While storing just those duplicates in another dataframe:
ID Year
----------
123 1314
123 1516
154 1718
233 1415
233 1516
I could drop duplicates by year to keep the oldest entry, but I am not sure how to get just the duplicates into a list that I can store as another dataframe.
How would I do this?
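To make the goal concrete, here is a minimal sketch of the kind of split I am after. It assumes the data frame is a pandas DataFrame, that the columns are literally named `ID` and `Year`, and that Year sorts correctly as a number; the variable names (`first_entries`, `duplicates`) are just placeholders. It may not be the best way to do this.

```python
import pandas as pd

# Sample data from the example above; the real frame has many more columns.
df = pd.DataFrame({
    "ID":   [123, 123, 123, 154, 154, 233, 233, 233],
    "Year": [1213, 1314, 1516, 1415, 1718, 1314, 1415, 1516],
})

# Sort so the earliest Year comes first within each ID.
df_sorted = df.sort_values(["ID", "Year"])

# Boolean mask: True for every row whose ID already appeared in an earlier row.
is_dup = df_sorted.duplicated(subset="ID", keep="first")

# Rows that are the first (earliest) entry for their ID.
first_entries = df_sorted[~is_dup].reset_index(drop=True)

# All remaining rows, i.e. the later entries for an already-seen ID.
duplicates = df_sorted[is_dup].reset_index(drop=True)
```

Using the same mask for both selections means every row lands in exactly one of the two frames, so nothing is lost or counted twice.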