0

I'm experiencing a weird beahviour on a REST api endpoint. Basically I have two tables, a user table with an id and other relavant fields, and an action table with a uid (which maps the user ids) and several other fields.

I want to extract the user that has performed a particular action and I'm doing something like this:

SELECT * FROM users where id IN (select uid from action WHERE [CONDITIONS] order by [CRITERIA]);

Given the size of my DB this query runs in about two seconds, which is completely unacceptable for my use case.

The weird behaviour comes if I split the query in two subqueries, executing first:

select uid from action WHERE [CONDITIONS] order by [CRITERIA];

Than after having manually concatenated the string to be matched by the IN operator executing:

SELECT * FROM users where id IN [MANUAL CONCAT];

The two queries runs both under 5ms, on the same dataset.

It was my (possibly wrong) understanding that the subquery would have been executed first and then the main query afterwards. Is it wrong? Does MySQL perform the IN subquery EACH time?

UPDATE

If I simply use a join (see code below) is a lot faster (about 10ms), but I still don't get how IN works here.

SELECT distinct * FROM users join action on users.id = action.uid where [CONDITIONS];

I suspect that the fact that the rows actually matched in action are about 5-10 out of 5M counts something.

1
  • The performance that you are seeing suggests that the conditions are playing a very important role in the optimization. Commented Jul 8, 2015 at 23:48

2 Answers 2

1

IN ( SELECT ... ) is very poorly optimized -- the SELECT is repeatedly evaluated.

In some new version the SELECT will be materialized and an INDEX will be automatically generated. Still, a JOIN is very likely to continue to be faster.

Look at the EXPLAIN SELECT ...; it may give some clues of what is (or is not) going on. If you would like further discussion, please provide EXPLAIN, the full SELECT and SHOW CREATE TABLE for the table(s).

Sign up to request clarification or add additional context in comments.

Comments

1

First, order by for in is not useful, so you can rewrite it without an order by.

Second, exists is usually faster than in:

SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM action a WHERE a.uid = u.id and [CONDITIONS]);

For optimal performance you want an index on action(uid, . . .). The . . . is for other columns needed for the [CONDITIONS].

1 Comment

EXISTS is faster, but it doesn't make the difference. WIthout any cache, the average execution time of the first query is 2.10 s, with EXISTS it get to 1.95 s. I'll add the indexes, but I still don't get how making two different queries is that faster.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.