
In the case of a JOIN, especially across a one-to-many relationship, the result set will very often contain a lot of duplicated information.

For example,

TABLE_A_ID  TABLE_A_FIELD_ONE  TABLE_A_FIELD_TWO  TABLE_B_ID  TABLE_B_FOREIGN_KEY  TABLE_B_FIELD_ONE
12          MARK               POLIO              41          12                   SHOE
12          MARK               POLIO              42          12                   SHIRT
12          MARK               POLIO              43          12                   BOOT
12          MARK               POLIO              44          12                   HAT
12          MARK               POLIO              45          12                   GLOVE

As you can see, 12, MARK, POLIO are sent six times in a row. In my real situation, I am seeing the same data sent thousands of times in a row, which means over 20x the amount of data going over the wire from the DB to the app server, and 20x as much memory used.
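
A query of roughly this shape would produce that result set (the table and column names are just the placeholders from the example above):

SELECT a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO,
       b.TABLE_B_ID, b.TABLE_B_FOREIGN_KEY, b.TABLE_B_FIELD_ONE
    FROM TABLE_A a
    JOIN TABLE_B b ON b.TABLE_B_FOREIGN_KEY = a.TABLE_A_ID;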

I was about to start optimizing the **** out of it, but then I realized that it is very likely that mysql drivers already do optimize this, for example by sending only the table B columns for each row, and the table A columns only when TABLE_B_FOREIGN_KEY changes.

Is it really optimized this way (or in some other way), or should I stop using joins and instead do round trips to the DB?
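
By round trips I mean splitting the JOIN into separate queries, roughly like this (again using the placeholder names from the example above; first fetch the table A row, then fetch its table B children):

SELECT TABLE_A_ID, TABLE_A_FIELD_ONE, TABLE_A_FIELD_TWO
    FROM TABLE_A
    WHERE TABLE_A_ID = 12;

SELECT TABLE_B_ID, TABLE_B_FOREIGN_KEY, TABLE_B_FIELD_ONE
    FROM TABLE_B
    WHERE TABLE_B_FOREIGN_KEY = 12;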

  • "As you can see, 12, MARK, POLIO are sent six times" Well, 5 actually :) Commented Oct 21, 2021 at 14:40
  • "likely that mysql drivers already do optimize this" No, the result is correct based on the query you wrote. Commented Oct 21, 2021 at 14:41
  • "or should I stop using joins and instead do round trips" Absolutely not; the round trips are worse than shifting a bit of extra data up the wire. Commented Oct 21, 2021 at 14:42
  • The result is correct, but we are talking about optimization, not correctness. Commented Oct 21, 2021 at 14:44
  • OK, so exactly how were you considering you were going to optimise this? Commented Oct 21, 2021 at 14:45

1 Answer


When the duplicates are in a single table, consider adding UNIQUE (col) to that table. This forces uniqueness (at insertion time) based on col. If uniqueness is determined by a combination of columns, then UNIQUE(col1, col2, ...).
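
For example (the table and column names here are only placeholders):

ALTER TABLE t ADD UNIQUE (col);
-- or, if uniqueness is determined by a combination of columns:
ALTER TABLE t ADD UNIQUE (col1, col2);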

More likely you need DISTINCT to dedup whole rows.

SELECT DISTINCT col1, col2, ...
    FROM ...
    JOIN ... ON ...

But there are cases where the de-duplication should be done in other ways. Let's see your SELECT. (I'm thinking of GROUP BY or a 'derived table'.)
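
As a sketch of the GROUP BY idea, using the tables from your example and assuming you want one row per TABLE_A row with the child values collapsed into a single column:

SELECT a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO,
       GROUP_CONCAT(b.TABLE_B_FIELD_ONE) AS b_values
    FROM TABLE_A a
    JOIN TABLE_B b ON b.TABLE_B_FOREIGN_KEY = a.TABLE_A_ID
    GROUP BY a.TABLE_A_ID, a.TABLE_A_FIELD_ONE, a.TABLE_A_FIELD_TWO;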

1 Comment

I need the result set to contain the dupes; I am just hoping that it sends less data over the wire and takes less memory.
