I've got this table (PostgreSQL 9.3):

x | y  
- | -  
1 | 1  
2 | 2  
2 | 3  
1 | 4

I want to get three groups out of it: when the rows are ordered by y, every change in the value of x should start a new dense_rank value. I tried the following:

SELECT x, y, dense_rank() OVER (PARTITION BY x ORDER BY y)
FROM tab

But with the partition over x, the ordering does not work the way I expect. The result is:

x  y  dense_rank  
-  -  ----------  
1  1  1     
1  4  2     
2  2  1     
2  3  2 

instead of the expected:

x  y  dense_rank  
-  -  ----------  
1  1  1     
2  2  2     
2  3  2     
1  4  3 

I don't understand why the window is not ordered by y alone.

In the second step I need this rank for a grouping (GROUP BY dense_rank, x). So in the end I need the following result:

x  y      dense_rank  
-  -      ----------  
1  1      1     
2  {2,3}  2         
1  4      3 

Maybe this could be achieved in an easier way?

  • When you partition by x and order by y within a window function, Postgres has to order the rows by x, y. That is why your output is ordered the way it is (x=1, y=4 comes before x=2, y=2). Commented Feb 9, 2016 at 16:50
  • @ConsiderMe thanks. That's the point I missed. Commented Feb 9, 2016 at 18:50
  • You may be interested in this answer to "Solving 'Gaps and Islands' with row_number() and dense_rank()?" Commented Mar 14, 2017 at 19:00

1 Answer

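"partition over x the ordering is not working the way I expect"

It is working as designed. When you partition by x, the first row (x = 1, y = 1) and the last row (x = 1, y = 4) fall into the same partition, and ORDER BY y only orders the rows within each partition.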

Window Functions:

The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

To get the result you want, you could use the following (a classic example of the gaps-and-islands problem):

SELECT *, ROW_NUMBER() OVER (ORDER BY y) -
          ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) + 1 AS group_id
FROM tab
ORDER BY group_id

Output:

╔═══╦═══╦══════════╗
║ x ║ y ║ group_id ║
╠═══╬═══╬══════════╣
║ 1 ║ 1 ║        1 ║
║ 2 ║ 2 ║        2 ║
║ 2 ║ 3 ║        2 ║
║ 1 ║ 4 ║        3 ║
╚═══╩═══╩══════════╝

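Warning:
This solution is not general. The computed group_id identifies a group only in combination with x: two adjacent groups with different x values can end up with the same group_id.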
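
As a rough illustration of the limitation, the sketch below runs the same row-number difference on the sample data extended by one extra row (2, 5), the case raised in the comments. The inline tab CTE built from VALUES is only a stand-in for the real table.

```sql
-- Stand-in for the real table: the original four rows plus (2, 5).
WITH tab(x, y) AS (
  VALUES (1, 1), (2, 2), (2, 3), (1, 4), (2, 5)
)
SELECT x, y,
       ROW_NUMBER() OVER (ORDER BY y)
         - ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) + 1 AS group_id
FROM tab
ORDER BY y;
-- x | y | group_id
-- 1 | 1 | 1
-- 2 | 2 | 2
-- 2 | 3 | 2
-- 1 | 4 | 3
-- 2 | 5 | 3   <- x changed, but group_id did not
```

Grouping by (x, group_id) still separates the last two rows, but group_id on its own no longer numbers the groups.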

EDIT:

A more general solution is to use LAG to fetch the previous value of x and a windowed SUM to count the changes:

WITH cte AS
(
  SELECT t1.x, t1.y, LAG(x) OVER(ORDER BY y) AS x_prev
  FROM tab t1
)
SELECT x,y, SUM( CASE WHEN x = COALESCE(x_prev,x) THEN 0 ELSE 1 END) 
            OVER(ORDER BY y) + 1 AS group_id
FROM cte
ORDER BY group_id;
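
For the second step in the question (one row per group), the group_id can then be fed into GROUP BY. The following is a minimal sketch, not a tested production query: the inline tab CTE is a stand-in for the real table, the names grouped and ys are made up for illustration, and array_agg is assumed to be an acceptable way to collect the y values.

```sql
-- Stand-in for the real table.
WITH tab(x, y) AS (
  VALUES (1, 1), (2, 2), (2, 3), (1, 4)
),
cte AS (
  -- Previous x in y order; NULL for the first row.
  SELECT x, y, LAG(x) OVER (ORDER BY y) AS x_prev
  FROM tab
),
grouped AS (
  -- Running count of changes in x gives one id per group.
  SELECT x, y,
         SUM(CASE WHEN x = COALESCE(x_prev, x) THEN 0 ELSE 1 END)
           OVER (ORDER BY y) + 1 AS group_id
  FROM cte
)
SELECT x, array_agg(y ORDER BY y) AS ys, group_id
FROM grouped
GROUP BY group_id, x
ORDER BY group_id;
-- x | ys    | group_id
-- 1 | {1}   | 1
-- 2 | {2,3} | 2
-- 1 | {4}   | 3
```

Note that single-row groups come back as one-element arrays ({1}, {4}) rather than bare values.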

3 Comments

Wow. OK, it is not the way I expected. Is there no way to use rank() instead of two row_number() calls? I understood the rank function as counting the different partitions. But I found my error: I thought the ordering happens first and then the partitioning, but of course the ordering only applies within each partition. That makes sense now, so thank you for the explanation. Is there a way to make the partitions after the ordering, so that I order by y first and then partition by x?
Hi, I thought about your solution for a while and was wondering why it works. It does for my special case, because I will group the result on x and group_id. But in my opinion it is not a general solution for the partitioning problem. If I extend my example, the group_id does not change when x changes: the 5th row has x = 2, so it differs from the 4th and in my case has to get a new group_id, but it gets the same one as the 4th row (ordered by y). Is there a general solution?
