I cannot find an answer to my problem here on Stack Overflow. I have a query that spans three tables:

newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate |  Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+

Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.

In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.

What I have so far is:

select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
    avg(newsrating.rating) as rating,
    count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
    left join newsrating on newsitem.guid = newsrating.newsguid
    left join usernews on newsitem.guid = usernews.newsguid
    group by newsitem.guid

I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8

Neither count() call returns the number I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads by the current user for that newsitem.

So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.

I have tried conditional counts and sums, but no success yet. How can this be accomplished?

  • Well-formatted question. Next time, may I suggest you use an integer ID instead of a GUID? The idea is to make the question simpler. Also, if possible, you should include your desired output for the provided data in table format. Commented Jan 7, 2016 at 14:27
  • The problem seems to be that the two left joins together multiply the number of rows returned (every row from join A is combined with every row from join B), so numberofratings and numberofreads both come out as the product of the actual values. I guess you either have to use a stored database function or a subselect instead of one of the left joins. Commented Jan 7, 2016 at 14:33
  • @JuanCarlosOropeza, I realized the guids would make the question a bit harder to read. A little laziness from my side when composing the question :) About the desired output: I thought the question was getting too long already, that's why I decided to keep the actual data in the sql fiddle. It might have been clearer to put it here though. Commented Jan 7, 2016 at 14:41

2 Answers

The main problem I see is that you're joining both tables to Newsitem in the same query, so every rating row gets paired with every read row and the counts are multiplied by each other. For example, if the Newsitem has been read 3 times by the user and rated by 8 users, you end up with 24 rows, so it looks as if it has been rated 24 times. Adding DISTINCT to the COUNT of the rating IDs should correct that. The average is unaffected because every rating is duplicated the same number of times: the average of 1 and 2 is the same as the average of 1, 1, 2 and 2.
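To see the multiplication on a small case, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are hypothetical (they are not the fiddle's data), the short guids 'n1' and 'n2' are made up for readability, and SQLite stands in for the MySQL fiddle, so the user id is bound as a parameter instead of a @userid variable.

```python
import sqlite3

# Hypothetical sample data (not the fiddle's): item 'n1' has 2 ratings and
# 3 reads by user 3; item 'n2' has 1 rating and no reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem   (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews   (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
INSERT INTO newsitem VALUES ('n1', 'acme'), ('n2', 'acme');
INSERT INTO newsrating (newsguid, userid, rating) VALUES
    ('n1', 1, 4), ('n1', 2, 2), ('n2', 1, 5);
INSERT INTO usernews (newsguid, userid, readdate) VALUES
    ('n1', 3, '2016-01-01'), ('n1', 3, '2016-01-02'), ('n1', 3, '2016-01-03');
""")

# Join both child tables to 'n1' and compare the plain and DISTINCT counts.
naive, distinct, avg_rating = conn.execute("""
    SELECT COUNT(r.id), COUNT(DISTINCT r.id), AVG(r.rating)
    FROM newsitem i
    LEFT JOIN newsrating r ON r.newsguid = i.guid
    LEFT JOIN usernews u ON u.newsguid = i.guid AND u.userid = ?
    WHERE i.guid = 'n1'
""", (3,)).fetchone()

print(naive, distinct, avg_rating)  # prints: 6 2 3.0
```

The plain count sees 2 x 3 = 6 joined rows, the DISTINCT count recovers the 2 real ratings, and the average stays at 3.0 because each rating appears three times.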

You can then handle the reads by adding the userid to the JOIN condition instead of using a CASE expression inside the COUNT (since it's an OUTER JOIN, this shouldn't cause any loss of results). After that you can do a COUNT on the distinct id values from Usernews. The resulting query would be:

SELECT
    I.guid,
    I.supplier,
    COUNT(DISTINCT R.id) AS number_of_ratings,
    AVG(R.rating) AS avg_rating,
    COUNT(DISTINCT UN.id) AS number_of_reads
FROM
    NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
    UN.newsguid = I.guid AND
    UN.userid = @userid
GROUP BY
    I.guid,
    I.supplier

While that should work, you might get better performance from a subquery, because the query above has to expand the joined rows and then aggregate them, perhaps unnecessarily. Some people might also find the version below a little clearer.

SELECT
    I.guid,
    I.supplier,
    R.number_of_ratings,
    R.avg_rating,
    COUNT(UN.id) AS number_of_reads  -- count matched reads only; COUNT(*) would report 1 for an unread item
FROM
    NewsItem I
LEFT OUTER JOIN
(
    SELECT
        newsguid,
        COUNT(*) AS number_of_ratings,
        AVG(rating) AS avg_rating
    FROM
        NewsRating
    GROUP BY
        newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = @userid
GROUP BY
    I.guid,
    I.supplier,
    R.number_of_ratings,
    R.avg_rating
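
A quick way to check the pre-aggregating form is to run it against a small in-memory database. The sketch below again uses Python's sqlite3 with hypothetical rows (not the fiddle's data) and a bound :userid parameter in place of @userid. It counts UN.id rather than * so that an item with no matching reads reports 0; with COUNT(*) the single NULL-extended row from the outer join would be counted as 1.

```python
import sqlite3

# Hypothetical data: 'n1' has 2 ratings and 3 reads by user 3,
# 'n2' has 1 rating and no reads, 'n3' has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem   (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews   (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
INSERT INTO newsitem VALUES ('n1', 'acme'), ('n2', 'acme'), ('n3', 'acme');
INSERT INTO newsrating (newsguid, userid, rating) VALUES
    ('n1', 1, 4), ('n1', 2, 2), ('n2', 1, 5);
INSERT INTO usernews (newsguid, userid, readdate) VALUES
    ('n1', 3, '2016-01-01'), ('n1', 3, '2016-01-02'), ('n1', 3, '2016-01-03');
""")

# Ratings are aggregated per item first, so joining the reads cannot inflate them.
rows = conn.execute("""
    SELECT i.guid, r.number_of_ratings, r.avg_rating,
           COUNT(un.id) AS number_of_reads
    FROM newsitem i
    LEFT JOIN (
        SELECT newsguid, COUNT(*) AS number_of_ratings, AVG(rating) AS avg_rating
        FROM newsrating
        GROUP BY newsguid
    ) r ON r.newsguid = i.guid
    LEFT JOIN usernews un ON un.newsguid = i.guid AND un.userid = :userid
    GROUP BY i.guid, r.number_of_ratings, r.avg_rating
    ORDER BY i.guid
""", {"userid": 3}).fetchall()

for row in rows:
    print(row)
```

With this data, 'n1' comes back with 2 ratings and 3 reads, 'n2' with 1 rating and 0 reads, and 'n3' with NULL (None in Python) for the rating columns and 0 reads.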

5 Comments

I'm sure our queries are very similar, but your second one didn't include a @user_id or something similar to filter for the current user.
Thanks. I put that in the first query, but forgot it in the second. I'll add it now. Probably also clearer to change the hard-coded 3 to a variable as you suggested.
It never ceases to amaze me how quickly people can come up with a correct and elaborate answer. Thank you very much Tom.
Thanks for the flattery, but really it's just a matter of patterns. I've been developing in SQL Server for 20 years, so I've seen this pattern before. It was just a matter of typing out what I've seen before. :)
This is 'easy' sql for you, but you took the time to explain the problem and write a working solution. That's what I wanted to thank you for :)

I'm with Tom: you should use a subquery to calculate the user's read count.

SQL Fiddle Demo

SELECT NI.guid,
       NI.supplier,
       COUNT(NR.ID) AS numberofratings,
       AVG(NR.rating) AS rating,
       COALESCE(UR.user_read, 0) AS numberofreads  -- 0 rather than NULL when the user has no reads
FROM newsitem NI
LEFT JOIN newsrating NR
       ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) AS user_read
           FROM usernews
           WHERE UserId = 3   -- use a variable @user_id here
           GROUP BY NewsGuid) UR
       ON NI.guid = UR.NewsGuid
GROUP BY  NI.guid,
          NI.supplier,
          UR.user_read;

1 Comment

FYI: to solve this kind of issue you need to split the problem into smaller parts. I started removing things to see what data was in the result: first the GROUP BY, then the second LEFT JOIN. That showed me where the problem was, and then I found a way to solve it, in this case the subquery.
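
The debugging step described in this comment (drop the GROUP BY and the aggregates, then look at the raw joined rows) can be sketched like this. It uses Python's sqlite3 with hypothetical rows, not the fiddle's data, and the short guid 'n1' is made up for readability.

```python
import sqlite3

# Hypothetical data: item 'n1' has 2 ratings and 3 reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem   (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews   (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
INSERT INTO newsitem VALUES ('n1', 'acme');
INSERT INTO newsrating (newsguid, userid, rating) VALUES ('n1', 1, 4), ('n1', 2, 2);
INSERT INTO usernews (newsguid, userid, readdate) VALUES
    ('n1', 3, '2016-01-01'), ('n1', 3, '2016-01-02'), ('n1', 3, '2016-01-03');
""")

# Same joins as the original query, but without GROUP BY or aggregates,
# so each joined row is visible.
rows = conn.execute("""
    SELECT r.id AS rating_id, u.id AS read_id
    FROM newsitem i
    LEFT JOIN newsrating r ON r.newsguid = i.guid
    LEFT JOIN usernews u ON u.newsguid = i.guid
    WHERE i.guid = 'n1'
    ORDER BY r.id, u.id
""").fetchall()

for row in rows:
    print(row)
```

Each rating id shows up once per read id, which is the pairing that inflates both counts once the rows are grouped.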
