
I'm using PostgreSQL 9.3, and I've got this big, ugly query...

SELECT cai.id
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11'
    AND (   key = 'disable_student_nav' AND value = 'True'
         OR key = 'pacing' AND value = 'student');

...which gives me this result...

    id  
  ------
   1352
   1352
   1353
   1353
   1354
   1355
 (6 rows)

How can I improve my query to get the count of the duplicate rows (2 in this example)?

  • from the actual select query ? Commented Sep 12, 2015 at 4:48
  • @wingedpanther: Good suggestion. That gives me the two duplicate IDs, but not the count. The number of rows that have two duplicate IDs could be in the thousands, so I don't want to return all that data from my server and have to count it on the client side. Commented Sep 12, 2015 at 4:55
  • Can an id appear more than twice? Commented Sep 12, 2015 at 11:23
  • From which of the tables do key and value stem? Commented Sep 12, 2015 at 11:33

2 Answers


Using Sub-Query

select count(*) total_dups from(
    SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (key = 'disable_student_nav'
                AND value = 'True'
                OR key = 'pacing'
                AND value = 'student')
    group by cai.id having count(cai.id) >1
    ) t

group by cai.id having count(cai.id) > 1 keeps one row for each cai.id that occurs more than once. The outer select count(*) from (...) t then counts those rows, which gives the number of duplicated ids.
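The grouping step can be tried on the six ids from the question. This is only a sketch: it uses Python's sqlite3 module with an in-memory SQLite database rather than PostgreSQL, and the table name matched is a hypothetical stand-in for the joined, filtered rows.

```python
import sqlite3

# Hypothetical stand-in for the rows returned by the question's join and WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matched (id INTEGER)")
conn.executemany(
    "INSERT INTO matched (id) VALUES (?)",
    [(1352,), (1352,), (1353,), (1353,), (1354,), (1355,)],
)

# Inner query: one row per id that occurs more than once.
# Outer query: count those rows.
total_dups = conn.execute(
    """
    SELECT count(*) AS total_dups
    FROM (
        SELECT id
        FROM matched
        GROUP BY id
        HAVING count(id) > 1
    ) t
    """
).fetchone()[0]

print(total_dups)  # 2, because ids 1352 and 1353 each appear twice
```

The same shape should carry over to the real query: only the FROM, JOIN and WHERE parts inside the subquery change.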

OR

Using CTE

with cte as (
SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (key = 'disable_student_nav'
                AND value = 'True'
                OR key = 'pacing'
                AND value = 'student')
    group by cai.id having count(cai.id) >1
    )

    select count(*) from  cte

Difference between CTE and SubQuery?



Because of the structure of the query, I suspect that duplicates can only arise from the OR part of the condition, when a single id matches both settings. If each id can appear at most twice, you can do the calculation without a subquery:

SELECT count(cai.id) - count(distinct cai.id)
FROM common_activityinstance cai JOIN
     common_activityinstance_settings cais
     ON cai.id = cais.activityinstance_id JOIN
     common_activitysetting cas
     ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11' AND
      (key, value) IN (('disable_student_nav', 'True'), ('pacing', 'student'));

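Note: This only works in the special case that each id appears only once or twice. The expression counts the extra rows beyond the first for each id, so an id that appears three times would contribute 2 instead of 1.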
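To see where the shortcut and the grouped count agree and where they diverge, here is a small comparison. It is a sketch run on SQLite through Python's sqlite3 module, not on PostgreSQL; the table name matched and the helper compare are hypothetical.

```python
import sqlite3


def compare(ids):
    """Return (shortcut, grouped) duplicate counts for a list of ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matched (id INTEGER)")
    conn.executemany("INSERT INTO matched (id) VALUES (?)", [(i,) for i in ids])

    # Shortcut: total rows minus distinct ids, i.e. the number of extra rows.
    shortcut = conn.execute(
        "SELECT count(id) - count(DISTINCT id) FROM matched"
    ).fetchone()[0]

    # Grouped: number of ids that occur more than once.
    grouped = conn.execute(
        """
        SELECT count(*)
        FROM (SELECT id FROM matched GROUP BY id HAVING count(id) > 1) t
        """
    ).fetchone()[0]

    conn.close()
    return shortcut, grouped


# Each id appears at most twice: both approaches agree.
at_most_twice = compare([1352, 1352, 1353, 1353, 1354, 1355])
print(at_most_twice)  # (2, 2)

# One id appears three times: the shortcut counts extra rows, not ids.
three_times = compare([1352, 1352, 1352, 1353])
print(three_times)  # (2, 1)
```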
