
In the case of a JOIN, especially across a one-to-many relationship, the result set will very often contain a lot of duplicated information.

For example,

TABLE_A_ID  TABLE_A_FIELD_ONE  TABLE_A_FIELD_TWO  TABLE_B_ID  TABLE_B_FOREIGN_KEY  TABLE_B_FIELD_ONE
12          MARK               POLIO              41          12                   SHOE
12          MARK               POLIO              42          12                   SHIRT
12          MARK               POLIO              43          12                   BOOT
12          MARK               POLIO              44          12                   HAT
12          MARK               POLIO              45          12                   GLOVE

As you can see, 12, MARK, POLIO are sent six times in a row. In my real situation, I am seeing the same data sent thousands of times in a row, which means over 20x the amount of data going over the wire from the DB to the app server, and 20x as much memory used.
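
A query of roughly this shape would produce that result set (the table and column names are just the placeholders from the example above):

SELECT a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO,
       b.TABLE_B_ID, b.TABLE_B_FOREIGN_KEY, b.TABLE_B_FIELD_ONE
    FROM TABLE_A a
    JOIN TABLE_B b ON b.TABLE_B_FOREIGN_KEY = a.TABLE_A_ID;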

I was about to start optimizing the **** out of it, but then I realized that it is very likely that mysql drivers already do optimize this, for example by sending only the table B columns for each row, and the table A columns only when TABLE_B_FOREIGN_KEY changes.

Is it really optimized this way (or in some other way), or should I stop using joins and instead do round trips to the DB?
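
By round trips I mean splitting the JOIN into separate queries, roughly like this (again using the placeholder names from the example above; first fetch the table A row, then fetch its table B children):

SELECT TABLE_A_ID, TABLE_A_FIELD_ONE, TABLE_A_FIELD_TWO
    FROM TABLE_A
    WHERE TABLE_A_ID = 12;

SELECT TABLE_B_ID, TABLE_B_FOREIGN_KEY, TABLE_B_FIELD_ONE
    FROM TABLE_B
    WHERE TABLE_B_FOREIGN_KEY = 12;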

  • "As you can see, 12, MARK, POLIO are sent six times" Well, 5 actually :) Commented Oct 21, 2021 at 14:40
  • "likely that mysql drivers already do optimize this" No, the result is correct based on the query you wrote. Commented Oct 21, 2021 at 14:41
  • "or should I stop using joins and instead do round trips" Absolutely not; the round trips are worse than shifting a bit of extra data up the wire. Commented Oct 21, 2021 at 14:42
  • The result is correct, but we are talking about optimization, not correctness. Commented Oct 21, 2021 at 14:44
  • OK, so exactly how were you considering you were going to optimise this? Commented Oct 21, 2021 at 14:45

1 Answer


When the duplicates are in a single table, consider adding UNIQUE (col) to that table. This forces uniqueness (at insertion time) based on col. If uniqueness is determined by a combination of columns, then UNIQUE(col1, col2, ...).
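
For example (the table and column names here are only placeholders):

ALTER TABLE t ADD UNIQUE (col);
-- or, if uniqueness is determined by a combination of columns:
ALTER TABLE t ADD UNIQUE (col1, col2);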

More likely you need DISTINCT to dedup whole rows.

SELECT DISTINCT col1, col2, ...
    FROM ...
    JOIN ... ON ...

But there are cases where the de-duplication should be done in other ways. Let's see your SELECT. (I'm thinking of GROUP BY or a 'derived table'.)
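
As a sketch of the GROUP BY idea, using the tables from your example and assuming you want one row per TABLE_A row with the child values collapsed into a single column:

SELECT a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO,
       GROUP_CONCAT(b.TABLE_B_FIELD_ONE) AS b_values
    FROM TABLE_A a
    JOIN TABLE_B b ON b.TABLE_B_FOREIGN_KEY = a.TABLE_A_ID
    GROUP BY a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO;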

1 Comment

I need the result set to contain the dupes; I am just hoping that it sends less data over the wire and takes less memory.
