6

I have a query along the lines of

select b.* from
(select key, max(val) as val from (somequery) group by key) as a
inner join
(somequery) as b
on a.key = b.key and a.val = b.val
order by key

And I was wondering if there is an obvious way (that I am missing) to simplify it (given that somequery might be rather long).

Any thoughts would be appreciated.

2
  • somequery? You'll need to elaborate on that if you expect to get any real help. Commented Jul 31, 2009 at 4:36
  • 1
    @rexem: No the OP doesn't. It's obvious that he wants to pull back only rows with the max val. Commented Jul 31, 2009 at 4:37

3 Answers 3

2

There is, but it's certainly not obvious:

select
    *
from
    (
    select
        key,
        val,
        col,
        max(val) over (partition by key) as MaxVal
    from
        tableA
    )
where
    val = MaxVal

Using the over clause is a great way to do this, and doesn't require any extraneous subqueries. All it does is take the max val per each key, and then wraps that result set in a subquery, where we can check the val against MaxVal to make sure we're pulling the correct row.

Much cleaner and faster than doing up to three subqueries!

Sign up to request clarification or add additional context in comments.

2 Comments

I've not seen MAX..OVER before. How does this work for composite keys? I'm trying on my own table and I can use the agg/join or ROWNUMBER = 1, but not this technique.
max(val) over (partition by key1, key2, key3) works like a charm.
0

You want to use ROW_NUMBER() or RANK() for this.

(and please make sure that a previous query ends with a semi-colon)

with ranked as
(
select *, row_number() over (partition by key order by val desc) as bestrow
from sometableorquery 
)
select *
from ranked
where bestrow = 1
order by key;

If you want ties (so that a key that has two best values returns both), then use rank() instead of row_number().

Rob

3 Comments

Also - if you want to get the top 3 of each key, then try "WHERE bestrow <= 3"
ROW_NUMBER will only return one row per key. Where as the OP's code would ties in the case of there being more than one row for a key with val = max(val). Replacing ROW_NUMBER() with RANK() would preserve the original intention.
Thanks Shannon. I mentioned the 'if you want ties' thing in there.
-1

I would select your subqueries into a temporary tables before selecting from them. I think you'd see a significant performance boost from that.

2 Comments

Not my downvote, but...if you select into a temporary table, the DBMS has to store that data, and its metadata, and then use it once. It would be more effecient and effective to give the optimizer the whole query - so it can avoid materializing the sub-query if at all possible. Now, if there are several queries that will use the same sub-query, then you might see a benefit from an explicit temp table - though it would be a good idea to measure and make sure.
@Jonathan: I agree that in principle what you're saying is correct but my experience with SQL server suggests that for larger tables (say anything larger than 2000 rows) this route is far more efficient.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.