0

I have below dataframe

              CVE ID             Product Versions
0      CVE-2022-46689                   Mac OS 12
1      CVE-2022-42856                      Safari
2      CVE-2022-46689             Windows 10 21h1
3      CVE-2022-41121             Windows 10 21h2
4      CVE-2022-42856                      Safari

I would like to remove duplicates based on the column CVE ID but also want to make sure that I store the value present in the 2nd column Product Versions (but remove the value if already present)

Something like this below:

              CVE ID             Product Versions
0      CVE-2022-46689            Mac OS 12, Windows 10 21h1
1      CVE-2022-42856            Safari
2      CVE-2022-41121            Windows 10 21h2

How should I do it?

Any help is appreciated

1

2 Answers 2

1

here is one way to do it

# drop duplicates (in memory)
# groupby CVE ID and join the resulting list of product version

out=(df.drop_duplicates(subset=['CVE ID','Product Versions'])
 .groupby(['CVE ID'],as_index=False)['Product Versions']
 .agg(','.join ))

out
            CVE ID  Product Versions
0   CVE-2022-41121  Windows 10 21h2
1   CVE-2022-42856  Safari
2   CVE-2022-46689  Mac OS 12, Windows 10 21h1
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for your help! this is the easiest and quickest solution!!!
1

This could work:

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({'CVE ID': ['CVE-2022-46689', 'CVE-2022-42856', 'CVE-2022-46689', 'CVE-2022-41121', 'CVE-2022-42856'],
                'Product Versions': ['Mac OS 12', 'Safari', 'Windows 10 21h1', 'Windows 10 21h2', 'Safari']})

# Group the rows by the 'CVE ID' column
grouped = df.groupby('CVE ID')

# Initialize an empty list to store the results
result = []

# Iterate over the groups
for name, group in grouped:
    # Concatenate the values in the 'Product Versions' column
    product_versions = ', '.join(list(set(group['Product Versions'])))
    
    # Append the name and product_versions to the result list
    result.append({'CVE ID': name, 'Product Versions': product_versions})

# Convert the result list to a DataFrame
result = pd.DataFrame(result)

print(result)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.