Selecting non-repeating values in Postgres

Question

SELECT DISTINCT a.s_id, select2Result.s_id, select2Result."mNrPhone", 
       select2Result."dNrPhone" 
FROM "Table1" AS a INNER JOIN
    (
    SELECT b.s_id, c."mNrPhone", c."dNrPhone" FROM "Table2" AS b, "Table3" AS c  
    WHERE b.a_id = 1001 AND b.s_id = c.s_id
    ORDER BY b.last_name) AS select2Result
ON a.a_id = select2Result.student_id
WHERE a.k_id = 11211

It returns:

1001;1001;"";""
1002;1002;"";""
1002;1002;"2342342232123";"2342342"
1003;1003;"";""
1004;1004;"";""

1002 value is repeated twice, but it shouldn't because I used DISTINCT and no other table has an id repeated twice.

Those two rows (for a.s_id = 1002) are distinct. One of them has two values, the other has four. If '1002' weren't repeated, how would the dbms know which of those two rows to return? It might help for you to edit your question, and describe your actual problem. (What you describe above isn't your actual problem; it's a solution that doesn't quite work.) — Mike Sherrill 'Cat Recall'
– Mike Sherrill 'Cat Recall', Commented Aug 10, 2012 at 11:27

score 4 · Accepted Answer · 2012-08-10 12:42:43Z

4

You can use DISTINCT ON like this:

   SELECT DISTINCT ON (a.s_id) 
          a.s_id, select2Result.s_id, select2Result."mNrPhone", 
          select2Result."dNrPhone"
   ...

But like other persons have told you, the "repeated records" are different really.

edited Aug 10, 2012 at 12:42

answered Aug 10, 2012 at 12:35

user731136

Sign up to request clarification or add additional context in comments.

Comments

Jonathan Leffler · Accepted Answer · 2012-08-10 12:13:02Z

The qualifier DISTINCT applies to the entire row, not to the first column in the select-list. Since columns 3 and 4 (mNrPhone and dNrPhone) are different for the two rows with s_id = 1002, the DBMS correctly lists both rows. You have to write your query differently if you only want the s_id = 1002 to appear once, and you have to decide which auxilliary data you want shown.

As an aside, it is strongly recommended that you always use the explicit JOIN notation (which was introduced in SQL-92) in all queries and sub-queries. Do not use the old implicit join notation (which is all that was available in SQL-86 or SQL-89), and especially do not use a mixture of explicit and implicit join notations (where your sub-query uses the implicit join, but the main query uses explicit join). You need to know the old notation so you can understand old queries. You should write new queries in the new notation.

Community · Accepted Answer · 2020-06-20 09:12:55Z

First of all, the query displayed does not work at all, student_id is missing in the sub-query. You use it in the JOIN later.

More interestingly:

Pick a certain row out of a set with `DISTINCT`

DISTINCT and DISTINCT ON return distinct values by sorting all rows according to the set of columns to be distinct, then it picks the first row from every set. It sorts by all rows for a general DISTINCT and only the specified rows for DISTINCT ON. Here lies the opportunity to pick certain rows out of a set over other.

For instance if you prefer rows with not-empty "mNrPhone" in your example:

SELECT DISTINCT ON (a.s_id)   -- sure you didn't want a.a_id?
      ,a.s_id AS a_s_id  -- use aliases to avoid dupe name
      ,s.s_id AS s_s_id
      ,s."mNrPhone"
      ,s."dNrPhone" 
FROM  "Table1" a
JOIN  (
    SELECT b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id  -- misssing!
    FROM  "Table2" b
    JOIN  "Table3" c USING (s_id)
    WHERE  b.a_id = 1001 
    --    ORDER  BY b.last_name  -- pointless, DISTINCT will re-order
     ) s ON a.a_id = s.student_id
WHERE  a.k_id = 11211
ORDER  BY a.s_id -- first col must agree with DISTINCT ON, could add DESC though
         ,("mNrPhone" <> '') DESC -- non-empty first

ORDER BY cannot disagree with DISTINCT on the same query level. To get around this you can either use GROUP BY instead or put the whole query in a sub-query and run another SELECT with ORDER BY on it.

The ORDER BY you had in the sub-query is voided now.

In this particular case, if - as it seems - the dupes come only from the sub-query (you'd have to verify), you could instead:

SELECT a.a_id, s.s_id, s."mNrPhone", s."dNrPhone" -- picking a.a_id over s_id
FROM  "Table1" a
JOIN  (
    SELECT DISTINCT ON (b.s_id)
          ,b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id  -- misssing!
    FROM  "Table2" b
    JOIN  "Table3" c USING (s_id)
    WHERE  b.a_id = 1001 
    ORDER  BY b.s_id, (c."mNrPhone" <> '') DESC -- pick non-empty first
     ) s ON a.a_id = s.student_id
WHERE  a.k_id = 11211
ORDER  BY a.a_id  -- now you can ORDER BY freely

Collectives™ on Stack Overflow

Selecting non-repeating values in Postgres

3 Answers 3

Comments

Comments

Pick a certain row out of a set with `DISTINCT`

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Pick a certain row out of a set with DISTINCT

Comments

Your Answer

Sign up or log in

Post as a guest

Related

Pick a certain row out of a set with `DISTINCT`