I have a main query:

 SELECT mt.media_id, mt.title,
 SUM(DISTINCT sht.share) as share_count
 FROM $grouped_media_table mt   
 LEFT JOIN $share_table sht ON sht.media_id = mt.media_id AND sht.title = mt.title
 WHERE mt.media_id = %d AND mt.title = %s 
 GROUP BY mt.media_id, mt.title

grouped_media_table:

CREATE TABLE $grouped_media_table (
            `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
            `media_id` int(11) unsigned DEFAULT NULL,
            `title` varchar(300) DEFAULT NULL,
            PRIMARY KEY (`id`),
            INDEX `title` (`title`)
);

share_table:

CREATE TABLE $share_table ( 
            `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
            `share` int(11) unsigned DEFAULT '0',
            `media_id` int(11) unsigned DEFAULT NULL,
            `user_id` int(11) unsigned DEFAULT NULL,
            `title` varchar(300) DEFAULT NULL,
            `c_date` datetime,
            PRIMARY KEY (`id`),
            INDEX `media_id` (`media_id`)
);

This returns one row for media_id=5 and title="foo".

And another query:

SELECT ct.id, ct.comment, ct.comment_parent_id, ct.reported, ct.user_id, ct.user_display_name, ct.avatar, ct.c_date,
       SUM(vt.vote) AS vote,
       MAX(vt.user_id = %d) AS user_voted,
       MAX(CASE WHEN vt.user_id = %d THEN vote END) AS user_vote
FROM $comments_table as ct
LEFT JOIN $votes_table vt on ct.id = vt.comment_id
WHERE ct.media_id=%d AND ct.title=%s
GROUP BY ct.id
ORDER BY ct.c_date DESC

This returns all rows for media_id=5 and title="foo".

comments_table:

CREATE TABLE $comments_table ( 
            `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
            `media_id` int(11) unsigned DEFAULT NULL,
            `user_id` int(11) unsigned DEFAULT NULL,
            `user_display_name` varchar(100) DEFAULT NULL,
            `avatar` varchar(300) DEFAULT NULL,
            `origtype` varchar(20) DEFAULT NULL,
            `title` varchar(300) DEFAULT NULL,
            `url` varchar(300) DEFAULT NULL,
            `c_date` datetime,
            `comment` longtext DEFAULT NULL,
            `comment_parent_id` int(11) unsigned DEFAULT NULL,
            `reported` tinyint(1) DEFAULT 0,
            PRIMARY KEY (`id`),
            INDEX `media_id` (`media_id`)
);

votes_table:

CREATE TABLE $votes_table ( 
            `comment_id` int(11) unsigned DEFAULT NULL,
            `user_id` int(11) unsigned DEFAULT NULL,
            `vote` tinyint(1) DEFAULT 0,
            INDEX `comment_id` (`comment_id`)
);

My queries work well on their own. Is there a way to combine them into a single query?

I tried the following, but it returns only one row (one comment) from comments_table, while it should return all comments for media_id=5 and title="foo". share_count is returned correctly, so that join works.

 SELECT mt.media_id, mt.title, SUM(DISTINCT sht.share) as share_count, ct.comment
 FROM $grouped_media_table mt
 LEFT JOIN $share_table sht ON sht.media_id = mt.media_id AND sht.title = mt.title
 LEFT JOIN (SELECT ct.id, ct.media_id, ct.title, ct.comment, ct.comment_parent_id, ct.reported,
                   ct.user_id, ct.user_display_name, ct.avatar, ct.c_date, SUM(vt.vote) AS vote,
                   MAX(vt.user_id = %d) AS user_voted,
                   MAX(CASE WHEN vt.user_id = %d THEN vote END) AS user_vote
            FROM $comments_table as ct
            LEFT JOIN $votes_table vt on ct.id = vt.comment_id
            WHERE ct.media_id = %d AND ct.title=%s
            GROUP BY ct.id
            ORDER BY ct.c_date DESC
           ) as ct ON ct.media_id = mt.media_id AND ct.title = mt.title
 WHERE mt.media_id = %d AND mt.title = %s
 GROUP BY mt.media_id, mt.title

Note: I need all comment data returned, but for clarity I only select mt.media_id, mt.title, ct.comment here.

  • "This returns one row for media_id=5 and title="foo"": does it always return exactly 1 output row, for any media_id and title? If so, add query 2 as one more row source in query 1: SELECT ... FROM ... CROSS JOIN (query 2) AS query2 WHERE ... Commented Jun 25 at 12:02
  • Yes, the main query returns one row; that's correct, and it should return one row. But for comments I want all rows, combined with the row above. Commented Jun 25 at 12:27
  • In your query you're selecting ct.comment, but ct.comment is neither aggregated nor part of the GROUP BY clause with mt.media_id and mt.title. Commented Jun 25 at 12:29
  • MySQL does not have an array type, but you can do a trick with JSON aggregation of the joined rows. See a simple example I wrote here: stackoverflow.com/a/55816593/20860 Commented Jun 25 at 13:53
  • This is beyond my knowledge (and I need to maintain this code). I was hoping @kapandron's example could be rewritten to return 1 row. Of course I can do it in the backend, but it seems strange to return multiple rows when I expect one. Commented Jun 25 at 14:58
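The effect described in the comments can be reproduced with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB (the table names, columns and data are simplified and hypothetical). Grouping by mt.media_id, mt.title puts every joined comment into the same group, so only one row survives, and the non-aggregated ct.comment is taken from an arbitrary row of that group:

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grouped_media (id INTEGER PRIMARY KEY, media_id INTEGER, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, media_id INTEGER, title TEXT, comment TEXT);
INSERT INTO grouped_media (media_id, title) VALUES (5, 'foo');
INSERT INTO comments (media_id, title, comment) VALUES
  (5, 'foo', 'first'), (5, 'foo', 'second'), (5, 'foo', 'third');
""")

join_sql = """
SELECT mt.media_id, mt.title, ct.comment
FROM grouped_media mt
LEFT JOIN comments ct ON ct.media_id = mt.media_id AND ct.title = mt.title
WHERE mt.media_id = ? AND mt.title = ?
"""

# Without GROUP BY: one output row per matching comment.
ungrouped = conn.execute(join_sql, (5, "foo")).fetchall()

# With GROUP BY mt.media_id, mt.title: all joined rows fall into a single group,
# and the bare (non-aggregated) ct.comment is taken from one arbitrary row of it.
grouped = conn.execute(join_sql + "GROUP BY mt.media_id, mt.title", (5, "foo")).fetchall()

print(len(ungrouped), len(grouped))  # prints: 3 1
```

MySQL's default ONLY_FULL_GROUP_BY mode would reject the grouped query outright; MariaDB's default mode and SQLite accept it and silently pick one row.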

3 Answers


I think share_count and vote, user_voted, user_vote should be calculated in subqueries.
Then there is no need for GROUP BY over $comments_table, and the rows can be selected with all the needed columns.

Try this:

set @d:=5;      -- media_id
set @u:=1;      -- id of the current user (example value; the user %d in the original)
set @s:='foo';  -- title

SELECT mt.media_id, mt.title, sht.share_count, ct.comment -- ct.*
 FROM $grouped_media_table mt
LEFT JOIN (
  SELECT media_id, title
     ,SUM(sht.share) as share_count -- no fan-out inside this subquery, so a plain SUM is correct
  FROM $share_table sht
  WHERE sht.media_id = @d AND sht.title = @s
  GROUP BY media_id, title
  ) sht ON sht.media_id = mt.media_id AND sht.title = mt.title
LEFT JOIN (
  SELECT ct.id, ct.media_id, ct.title, ct.comment
   ,ct.comment_parent_id, ct.reported, ct.user_id, ct.user_display_name, ct.avatar
   ,ct.c_date
   ,vt.vote, vt.user_voted, vt.user_vote
  FROM $comments_table as ct
  LEFT JOIN LATERAL (
      SELECT vt.comment_id, SUM(vt.vote) AS vote,
           MAX(vt.user_id = @u) AS user_voted,
           MAX(CASE WHEN vt.user_id = @u THEN vt.vote END) AS user_vote
      FROM $votes_table vt
      WHERE ct.id = vt.comment_id
      GROUP BY vt.comment_id
    ) vt ON ct.id = vt.comment_id
  WHERE ct.media_id = @d AND ct.title = @s
) as ct ON ct.media_id = mt.media_id AND ct.title = mt.title
 WHERE mt.media_id = @d AND mt.title = @s
 ORDER BY ct.c_date DESC

For this example, the placeholders are replaced with session variables: %d -> @d, %s -> @s.

fiddle


11 Comments

There is an error in this query, even as shown in the fiddle.
That is your query, and the error means that the parameters %d and %s are not defined. Don't pay attention to that query in the fiddle. In the real query, replace @d -> %d and @s -> %s; you will pass these parameters as usual (%s, %d).
I still get an error on my side, but I don't see where: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '( SELECT vt.comment_id, SUM(vt.vote) AS vote pastecode.io/s/i3o5rkqg
What is your MySQL version?
MariaDB 10.4.13
Uh, @Toniq, "check the manual that corresponds to your MariaDB": so you're on MariaDB, not MySQL?
Yes, I just checked; it's MariaDB 10.4.13.
MariaDB is a fork of MySQL but is no longer fully compatible: it has features MySQL does not, and vice versa, and sometimes different syntax for the same feature. LATERAL joins are an example of a feature not yet implemented in MariaDB. I've retagged your question so you can get appropriate answers.
Also note that 10.4 is past end of life and no longer receives security updates; consider it insecure and plan to upgrade as soon as you can to a newer long-term support version (10.11, 11.4, 11.8).
I think LATERAL JOIN is not available in MariaDB 10.4.13. My query example will need to be redone.
Or any version yet afaict. jira.mariadb.org/browse/MDEV-33018
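Since MariaDB (10.4 included) has no LATERAL join, one commonly used alternative is to pre-aggregate the votes per comment_id in an ordinary derived table and LEFT JOIN that to the comments: each comment then matches at most one aggregated row, so the comments themselves need no GROUP BY. The sketch below runs that query shape on SQLite through Python's sqlite3 module; the table names, sample data and current_user value are hypothetical, and the four ? placeholders correspond to the question's %d, %d, %d, %s in the same order.

```python
import sqlite3

# SQLite stand-in for the MariaDB tables (only the columns used below); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, media_id INTEGER, title TEXT,
                       comment TEXT, c_date TEXT);
CREATE TABLE votes (comment_id INTEGER, user_id INTEGER, vote INTEGER);
INSERT INTO comments VALUES
  (1, 5, 'foo', 'bad!',       '2024-06-01'),
  (2, 5, 'foo', 'not so bad', '2024-06-02'),
  (3, 5, 'foo', 'great',      '2024-06-03');
INSERT INTO votes VALUES (1, 7, 1), (1, 8, 1), (2, 8, 0);
""")

# Votes are aggregated per comment_id in a plain derived table (no LATERAL),
# then joined 1..1 to comments, so the outer query needs no GROUP BY.
sql = """
SELECT ct.id, ct.comment,
       COALESCE(vt.vote, 0)       AS vote,
       COALESCE(vt.user_voted, 0) AS user_voted,
       vt.user_vote
FROM comments ct
LEFT JOIN (
    SELECT comment_id,
           SUM(vote)                                AS vote,
           MAX(user_id = ?)                         AS user_voted,
           MAX(CASE WHEN user_id = ? THEN vote END) AS user_vote
    FROM votes
    GROUP BY comment_id
) vt ON vt.comment_id = ct.id
WHERE ct.media_id = ? AND ct.title = ?
ORDER BY ct.c_date DESC
"""
current_user = 8  # hypothetical id of the logged-in user
rows = conn.execute(sql, (current_user, current_user, 5, "foo")).fetchall()
for row in rows:
    print(row)
# prints:
# (3, 'great', 0, 0, None)
# (2, 'not so bad', 0, 1, 0)
# (1, 'bad!', 2, 1, 1)
```

A plain derived table like this one is supported by MariaDB 10.4; the trade-off is that it aggregates the votes of every comment, not only those of the requested media.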

Theory

As a general (admittedly simplistic; sometimes an exception will prevail) rule, never use more than one 1..n relation in a single SELECT.

1. As you correctly noticed, SQL will duplicate your values. This is by design: to keep results deterministic and understandable, a query can be decomposed into logical steps, each executed in order: first the joins, then the WHERE, then the GROUP BY, then the window functions, and so on.

The useful property of this model is that you can rebuild the result by hand, step by step, leaving out the later steps when you're diagnosing the earlier ones.
So if we skip the grouping and stop at the join step, you can picture what happens when you join a media with 2 shares (call them A and B) and 3 comments (dated day 1, day 2, day 3): these are two 1..n relations, and SQL simply pairs every share row with every comment row, as long as both belong to the same media:

media_id  c_date  share
1         01-01   A
1         01-02   A
1         01-03   A
1         01-01   B
1         01-02   B
1         01-03   B
  • Some aggregate functions have no problem with duplicates (the MAX of x and x is always x),
  • some can use DISTINCT to avoid duplicates (COUNT(share) returns 6, but COUNT(DISTINCT share) finds 2),
  • but for some, nothing helps: if each comment comes with one vote of 1, SUM(vote) returns 6 because of the duplicates, while SUM(DISTINCT vote) returns 1, since all the votes equal 1 and collapse into a single value. Once SQL has multiplied the rows, you can never get the intended 3.

2. From a performance point of view, even if you can mitigate some side effects (DISTINCT), you're still asking your RDBMS to store each c_date twice and each share three times in its internal, intermediate result set between the WHERE and the GROUP BY.
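The arithmetic of the two points above can be checked with a small sketch (SQLite via Python's sqlite3 module, hypothetical tables; the vote is stored directly on each comment row only to keep it short): joining 2 shares to 3 comments yields 6 rows, no aggregate over the joined rows recovers the intended vote total of 3, and aggregating the comments on their own does.

```python
import sqlite3

# SQLite stand-in with hypothetical tables; the vote is stored on the comment row
# itself only to keep the example short (one vote of 1 per comment).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shares (media_id INTEGER, share TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, media_id INTEGER, c_date TEXT, vote INTEGER);
INSERT INTO shares VALUES (1, 'A'), (1, 'B');
INSERT INTO comments (media_id, c_date, vote) VALUES
  (1, '01-01', 1), (1, '01-02', 1), (1, '01-03', 1);
""")

# Two 1..n relations joined on the same media: 2 shares x 3 comments = 6 rows.
joined_rows, n_share, n_distinct_share, sum_vote, sum_distinct_vote = conn.execute("""
SELECT COUNT(*), COUNT(s.share), COUNT(DISTINCT s.share), SUM(c.vote), SUM(DISTINCT c.vote)
FROM shares s
JOIN comments c ON c.media_id = s.media_id
""").fetchone()

# Aggregating the comments on their own (no fan-out) gives the intended total.
correct_votes = conn.execute(
    "SELECT SUM(vote) FROM comments WHERE media_id = 1"
).fetchone()[0]

print(joined_rows, n_share, n_distinct_share, sum_vote, sum_distinct_vote, correct_votes)
# prints: 6 6 2 6 1 3
```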

Resolving it

So you want to reduce as many relations as possible to 1..1 in intermediate steps, and keep only one 1..n relation for the final query.

Good news: that's exactly what you want in the end: each comment (1) with its votes (n).

All that remains is to build the intermediate tables following this "one 1..n relation per step" model.

Note that once we aggregate things (count or sum them), we fall back into the (reassuring) 1..1 case:
although we have n shares per media, we have only 1 count of shares per media.

Apart from subqueries, the tool that eases writing (and above all reading!) these step-by-step queries (first generating a media_shares pseudo-table that is 1..1 against media_id, and so on) is the Common Table Expression (CTE), or "WITH expression":

with
  -- 1 media_id .. n shares → 1 media_id .. 1 count of shares
  media_shares as (select media_id, … group by media_id),
  [others 1..n → 1..1]
select … from media_shares ms join … on ms.media_id = …;

Let's practice

WITH
  -- 1 media_id .. n shares → 1 media_id .. 1 count of shares
  media_shares AS (SELECT media_id, title, SUM(share) share_count FROM $share_table GROUP BY media_id, title)
  -- Here we could have another CTE doing:
  -- 1 comment.id .. n votes → 1 comment.id .. 1 number of votes
  -- However, we have now reduced our relations enough, because our final query is centered on comments, not on media, and from each comment we have:
  -- - 1 comment .. n votes
  -- - 1 comment .. 1 media with its count of shares
  -- which makes one 1..n + one 1..1, so WE HAVE REACHED OUR GOAL OF ONLY ONE 1..n RELATION:
  -- Green light for one query!
SELECT
  -- 1..1 columns from mt
  mt.media_id, mt.title, COALESCE(share_count, 0) AS share_count,
  -- 1..1 columns from ct (GROUP BY ct.id is sufficient, as id is its primary key)
  ct.id, ct.media_id, ct.title, ct.comment, ct.comment_parent_id, ct.reported, ct.user_id, ct.user_display_name, ct.avatar, ct.c_date,
  -- 1..n columns from vt
  SUM(vt.vote) AS vote,
  MAX(vt.user_id = 1) AS user_voted,
  MAX(CASE WHEN vt.user_id = 1 THEN vt.vote END) AS user_vote
FROM $comments_table as ct
-- LEFT JOIN so that comments of a media without any share are still returned
LEFT JOIN media_shares mt ON ct.media_id = mt.media_id AND ct.title = mt.title
LEFT JOIN $votes_table vt on ct.id = vt.comment_id
WHERE ct.media_id = 5 AND ct.title='foo'
GROUP BY ct.id
ORDER BY ct.c_date DESC;
media_id | title | share_count | id | media_id | title | comment                 | comment_parent_id | reported | user_id | user_display_name | avatar | c_date | vote | user_voted | user_vote
5        | foo   | 8           | 1  | 5        | foo   | bad!                    | null              | 0        | null    | null              | null   | null   | 1    | 1          | 1
5        | foo   | 8           | 2  | 5        | foo   | not so bad              | null              | 0        | null    | null              | null   | null   | 0    | 1          | 0
5        | foo   | 8           | 3  | 5        | foo   | well finally it's great | null              | 0        | null    | null              | null   | null   | 1    | 1          | 1

(you will find the full example running in a fiddle)
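The same CTE shape can be tried locally. This sketch runs it on SQLite through Python's sqlite3 module with hypothetical sample data (two share rows totalling 8, three comments, four votes), so it does not reproduce the fiddle's data. share_count stays at 8 on every row, even for the comment that has two votes, because the shares were reduced to one row per media before the join:

```python
import sqlite3

# SQLite stand-in with hypothetical tables and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shares (media_id INTEGER, title TEXT, share INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, media_id INTEGER, title TEXT, comment TEXT, c_date TEXT);
CREATE TABLE votes (comment_id INTEGER, user_id INTEGER, vote INTEGER);
INSERT INTO shares VALUES (5, 'foo', 3), (5, 'foo', 5);
INSERT INTO comments VALUES
  (1, 5, 'foo', 'bad!',       '2024-06-01'),
  (2, 5, 'foo', 'not so bad', '2024-06-02'),
  (3, 5, 'foo', 'great',      '2024-06-03');
INSERT INTO votes VALUES (1, 1, 1), (1, 2, 1), (2, 1, 0), (3, 2, 1);
""")

sql = """
WITH media_shares AS (            -- 1 (media_id, title) .. 1 share total
    SELECT media_id, title, SUM(share) AS share_count
    FROM shares
    GROUP BY media_id, title
)
SELECT ct.id, ct.comment,
       COALESCE(ms.share_count, 0) AS share_count,
       SUM(vt.vote)                AS vote,          -- the only 1..n relation left
       MAX(vt.user_id = ?)         AS user_voted
FROM comments ct
LEFT JOIN media_shares ms ON ms.media_id = ct.media_id AND ms.title = ct.title
LEFT JOIN votes vt        ON vt.comment_id = ct.id
WHERE ct.media_id = ? AND ct.title = ?
GROUP BY ct.id
ORDER BY ct.c_date DESC
"""
rows = conn.execute(sql, (1, 5, "foo")).fetchall()  # user 1, media 5, title 'foo'
for row in rows:
    print(row)
# prints:
# (3, 'great', 8, 1, 0)
# (2, 'not so bad', 8, 0, 1)
# (1, 'bad!', 8, 2, 1)
```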

By the way

I find it strange that you have to replicate the media's title in the comments, the shares, and so on. Perhaps you should normalize your database, storing the title in one place and having all other tables reference the media only through media_id as a foreign key.

Comments

SELECT
    mt.media_id,
    mt.title,
    (SELECT SUM(sht.share) FROM $share_table sht WHERE sht.media_id = mt.media_id AND sht.title = mt.title) AS share_count,
    COALESCE(
      CONCAT('[',
        GROUP_CONCAT(
            -- build an object only for real comments: a media without comments
            -- yields one all-NULL row from the LEFT JOIN, which must be skipped
            CASE WHEN cwv.id IS NOT NULL THEN
            JSON_OBJECT(
                'id', cwv.id,
                'comment', cwv.comment,
                'comment_parent_id', cwv.comment_parent_id,
                'reported', cwv.reported,
                'user_id', cwv.user_id,
                'user_display_name', cwv.user_display_name,
                'avatar', cwv.avatar,
                'c_date', cwv.c_date,
                'vote', cwv.vote,
                'user_voted', cwv.user_voted,
                'user_vote', cwv.user_vote
            )
            END
            ORDER BY cwv.c_date DESC SEPARATOR ','
        )
      ,']')
    , '[]') AS comments
FROM
    $grouped_media_table mt
LEFT JOIN (
    SELECT
        ct.media_id,
        ct.title,
        ct.id,
        ct.comment,
        ct.comment_parent_id,
        ct.reported,
        ct.user_id,
        ct.user_display_name,
        ct.avatar,
        ct.c_date,
        IFNULL(SUM(vt.vote), 0) AS vote,
        MAX(CASE WHEN vt.user_id = %d THEN 1 ELSE 0 END) AS user_voted,
        MAX(CASE WHEN vt.user_id = %d THEN vt.vote END) AS user_vote
    FROM
        $comments_table ct
    LEFT JOIN
        $votes_table vt ON ct.id = vt.comment_id
    GROUP BY
        ct.id
) AS cwv ON mt.media_id = cwv.media_id AND mt.title = cwv.title
WHERE
    mt.media_id = %d AND mt.title = %s
GROUP BY
    mt.media_id, mt.title;
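One detail worth checking with this LEFT JOIN: for a media without comments, the join still yields one row whose cwv columns are all NULL, and JSON_OBJECT(...) builds an object even from NULL values, so without a guard the COALESCE(..., '[]') fallback would never fire. Wrapping the object in CASE WHEN cwv.id IS NOT NULL THEN ... END lets GROUP_CONCAT skip that row. The sketch below shows the guarded pattern on SQLite via Python's sqlite3 module; it builds the objects by plain string concatenation instead of JSON_OBJECT (to avoid depending on SQLite's JSON functions), and the tables and data are hypothetical.

```python
import json
import sqlite3

# SQLite stand-in with hypothetical tables: media 5 has two comments, media 6 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (media_id INTEGER, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, media_id INTEGER, title TEXT, comment TEXT);
INSERT INTO media VALUES (5, 'foo'), (6, 'bar');
INSERT INTO comments (media_id, title, comment) VALUES (5, 'foo', 'a'), (5, 'foo', 'b');
""")

# Objects are built with string concatenation here instead of JSON_OBJECT.
# The CASE turns the all-NULL row of a comment-less media into NULL, which
# GROUP_CONCAT skips; the whole expression is then NULL and COALESCE yields '[]'.
sql = """
SELECT COALESCE(
         '[' || GROUP_CONCAT(
                  CASE WHEN c.id IS NOT NULL THEN '{"id":' || c.id || '}' END,
                  ',') || ']',
         '[]') AS comments
FROM media m
LEFT JOIN comments c ON c.media_id = m.media_id AND c.title = m.title
WHERE m.media_id = ?
GROUP BY m.media_id, m.title
"""

comments_5 = conn.execute(sql, (5,)).fetchone()[0]
comments_6 = conn.execute(sql, (6,)).fetchone()[0]

# GROUP_CONCAT order is unspecified in SQLite, so compare ids after parsing.
ids_5 = sorted(item["id"] for item in json.loads(comments_5))
print(ids_5, comments_6)  # prints: [1, 2] []
```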

4 Comments

This returns 4 rows (because there are 4 comments, which is correct). Can this be returned as 1 row (because it's one media_id=5 with title="foo"), but with the comments in an array?
Check my updated solution.
It's just a single media with multiple comments; that's why I'm trying to return one result, but the comments should be in an array.
Do you need the comments in JSON form? It would be much more convenient and efficient to use JSON_ARRAYAGG, but it's not available in your version (MariaDB 10.4.13); it was only added in 10.5.0.
