For example:

INPUT: one dataframe

   Name     id     Price
   Apple     01       13.86
   Cherry    02       13.24
   Banana    02       1.99
   Peach     03       14.76
   Orange    04       2.48

OUTPUT: two dataframes

One with the rows whose dataframe['id'] is duplicated:

   Name     id     Price
   Cherry    02       13.24
   Banana    02       1.99

The other with the rows whose dataframe['id'] is unique:

   Name     id     Price
   Apple     01       13.86
   Peach     03       14.76
   Orange    04       2.48

Many thanks

3 Answers

INPUT: df; OUTPUT: df_duplicated, df_unique

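import pandas as pd

# Rows whose id occurs more than once (keep=False flags every occurrence)
df_duplicated = df[df['id'].duplicated(keep=False)]
# Stack df with the duplicated rows, then drop every row that now appears twice,
# leaving only the rows whose id is unique
df_unique = pd.concat([df, df_duplicated]).drop_duplicates(keep=False)
print(df_duplicated)
print(df_unique)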
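A slightly more direct variant (not the answer's exact code) reuses the same boolean mask and inverts it with `~`, which avoids the concat/drop round trip. The sketch below rebuilds the question's sample data; storing `id` as strings to keep the leading zeros shown in the question is an assumption, and integer ids behave the same way.

```python
import pandas as pd

# Sample data from the question; ids stored as strings (assumption) to keep "01", "02", ...
df = pd.DataFrame({
    "Name": ["Apple", "Cherry", "Banana", "Peach", "Orange"],
    "id": ["01", "02", "02", "03", "04"],
    "Price": [13.86, 13.24, 1.99, 14.76, 2.48],
})

# True for every row whose id occurs more than once (keep=False marks all occurrences)
mask = df["id"].duplicated(keep=False)

df_duplicated = df[mask]   # Cherry, Banana
df_unique = df[~mask]      # Apple, Peach, Orange

print(df_duplicated)
print(df_unique)
```

Both frames keep the original index, so every row of `df` ends up in exactly one of them.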

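# data is the question's dataframe
# keep=False drops every row whose id appears more than once
noDuplicate = data.drop_duplicates(subset='id', keep=False)
print("No Duplicates:", noDuplicate)

# keep=False marks all occurrences of a repeated id, not just the later ones
duplicate = data[data['id'].duplicated(keep=False)]
print("Duplicates:", duplicate)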

You can count the occurrences of each id and then merge the counts back onto your dataframe: rows whose id has a count of 1 are unique, and rows with a higher count are duplicates.

As an example:

import pandas as pd

df = pd.DataFrame(data={'Id': [1, 2, 2, 3, 4]})
# One row per Id with the number of times it occurs
agg_df = df.groupby('Id').size().reset_index(name='count')
# Ids that occur once are unique; Ids that occur more than once are duplicates
unique_df = agg_df.loc[agg_df['count'] == 1].merge(df, on=['Id'])
duplicate_df = agg_df.loc[agg_df['count'] > 1].merge(df, on=['Id'])
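If you want to keep the original index (a merge discards it and adds a `count` column), a swapped-in alternative to the merge is to map each row's Id to its count with `Series.value_counts` and `Series.map`. This is a sketch on the same toy data, not the answer's own method:

```python
import pandas as pd

df = pd.DataFrame(data={"Id": [1, 2, 2, 3, 4]})

# value_counts gives Id -> number of occurrences; map looks that up for every row
counts = df["Id"].map(df["Id"].value_counts())

unique_df = df[counts == 1]     # rows whose Id occurs once
duplicate_df = df[counts > 1]   # rows whose Id occurs more than once

print(unique_df)
print(duplicate_df)
```

Because the filtering is a plain boolean mask, both results keep the original row order and index labels.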
