Run sql query on pandas dataframe

Question

I have a dataframe df

ID	Price	Region
1	23	AUS
1	45	DXB
2	25	GER
2	18	TUN

I want to write a code in python to get the following output

ID	Price	Region
1	45	DXB
2	25	TUN

I have tried using pandasql to get the output but it is not giving the output I want

The code I have tried is

import pandas as pd
import pandasql as ps

#to read table
df=pd.read_excel("test.xlsx")

ps.sqldf("select ID, max(Price), Region from df order by ID")

If there is any other code in python itself (without using pandasql) that can get the above output, please let me know

anky · Accepted Answer · 2021-01-06 17:47:13Z

4

You can use groupby.transform

output_df = df[df['Price'].eq(df.groupby("ID")['Price'].transform("max"))]

Or with ps.sqldf using window function to get the max price and then return rows where Price equals max price:

output_df  = ps.sqldf("""select ID,Price,Region from 
                        (select *, max(Price) over (partition by ID) max_Price from df)
                        where Price = max_Price""")

    ID  Price Region
0   1     45    DXB
1   2     25    GER

answered Jan 6, 2021 at 17:47

anky

75.3k11 gold badges46 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Quang Hoang · Accepted Answer · 2021-01-06 17:40:26Z

1

You can do:

df.sort_values('Price').drop_duplicates('ID', keep='last')

answered Jan 6, 2021 at 17:40

Quang Hoang

151k11 gold badges64 silver badges86 bronze badges

Collectives™ on Stack Overflow

Run sql query on pandas dataframe

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related