6

I have a dataframe df

ID Price Region
1 23 AUS
1 45 DXB
2 25 GER
2 18 TUN

I want to write a code in python to get the following output

ID Price Region
1 45 DXB
2 25 TUN

I have tried using pandasql to get the output but it is not giving the output I want

The code I have tried is

import pandas as pd
import pandasql as ps

#to read table
df=pd.read_excel("test.xlsx")

ps.sqldf("select ID, max(Price), Region from df order by ID")

If there is any other code in python itself (without using pandasql) that can get the above output, please let me know

2 Answers 2

4

You can use groupby.transform

output_df = df[df['Price'].eq(df.groupby("ID")['Price'].transform("max"))]

Or with ps.sqldf using window function to get the max price and then return rows where Price equals max price:

output_df  = ps.sqldf("""select ID,Price,Region from 
                        (select *, max(Price) over (partition by ID) max_Price from df)
                        where Price = max_Price""")

    ID  Price Region
0   1     45    DXB
1   2     25    GER
Sign up to request clarification or add additional context in comments.

Comments

1

You can do:

df.sort_values('Price').drop_duplicates('ID', keep='last')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.