2

I'm performing a query that is looking for values in one table that are not in another. For example:

SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2);

Both tables have about 1 million rows but only a few hundred that do not match values which are returned in the result set. The query takes about 35 seconds. Doing a show profile on the query shows that mysql is spending most of the time in the state of "preparing". Any ideas on how I can optimize this state or what is actually happening during "preparing"?

The id value in both tables is indexed and of the same type and size.

The entire profile of the query is:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| (initialization)               | 0        | 
| checking query cache for query | 0        | 
| Opening tables                 | 0.13     | 
| System lock                    | 0        | 
| Table lock                     | 0        | 
| init                           | 0.01     | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 0        | 
| executing                      | 0        | 
| Sending data                   | 0        | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 34.83    | 
| end                            | 0        | 
| query end                      | 0        | 
| freeing items                  | 0        | 
| closing tables                 | 0        | 
| logging slow query             | 0        | 
+--------------------------------+----------+

Any tips are appreciated.

Thanks.

3 Answers 3

3

I would left join table2 on id where id is null. This would give you a much faster return of the data.

select
     a.id
from
     table1 a 
     left join table2 b on a.id = b.id and b.id is null
Sign up to request clarification or add additional context in comments.

3 Comments

Should this be and WHERE b.id is null?
it is preferable to specify this in the where clause; but the optimiser should produce identical plans.
@Mitch Wheat: it is required to specify this in the WHERE clause, otherwise the query will return wrong results (just all records from a)
2

There's nothing to optimize - NOT IN produces an equivalent query plan to LEFT JOIN/IS NULL in MySQL. Quote:

However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

For more information, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

1 Comment

table2.id should be indexed of course to produce the equivalent query. But from its name (which sounds like a PK) and the fact that the query takes 35 seconds for 1,000,000 rows we can conclude that it is indexed.
1
SELECT id FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;

1 Comment

The query still takes about 30 seconds but for this "show profile;" most of time is in "Sending Data". Is that just the time it takes for mysql to run the query or is there anything I can do to speed that up?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.