Optimized SQL Query

Question

Table Schema

For the two tables, the CREATE queries are given below:

Table1: (file_path_key, dir_path_key)

create table Table1(file_path_key varchar(500), dir_path_key varchar(500), primary key(file_path_key)) engine = innodb;

Example, file_path_key = /home/playstation/a.txt
dir_path_key = /home/playstation/

Table2: (file_path_key, hash_key)

create table Table2(file_path_key varchar(500) not null, hash_key bigint(20) not null, foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade) engine = innodb;

Objective:

Given a hash value *H* and a directory string *D*, I need to find all the 
entries in Table2 whose hash equals *H*, such that the corresponding file entry 
doesn't have *D* as its directory.

In this particular case, Table1 has around 40,000 entries and Table2 has 5,000,000 entries, which makes my current query really slow.

select distinct s1.file_path_key from Table1 as s1 join (select * from Table2 where hash_key = H) as s2 on s1.file_path_key = s2.file_path_key and s1.dir_path_key !=D;

The (potential) size of your key certainly isn't helping. It doesn't look like you need the potential key range - would you consider switching to an auto-gen primary key that you join on? This should reduce the size of your table considerably - for one thing, it would mean that file_path_key could be turned into just file (which would potentially reduce mismatches). Too bad you're not using an RDBMS that supports recursive CTEs - they work perfectly for folder structures.
– Clockwork-Muse, Commented Mar 6, 2012 at 17:39

Ike Walker · Accepted Answer · 2012-03-06 17:39:48Z

1

The sub-select is really slowing your query down unnecessarily.

You should remove it and replace it with a simple join, moving all of the non-join criteria down into the WHERE clause.

Also you should add indexes on the Table1.dir_path_key and Table2.hash_key columns:

ALTER TABLE Table1
  ADD INDEX dir_path_key (dir_path_key(255));

ALTER TABLE Table2
  ADD INDEX hash_key (hash_key);
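
To confirm that both indexes exist once the DDL has run, one option is MySQL's SHOW INDEX statement. This is only a quick sanity check, assuming the table names from the question:

```sql
-- List the indexes on each table; the new entries should appear
-- in the Key_name column next to the primary/foreign key indexes.
SHOW INDEX FROM Table1;
SHOW INDEX FROM Table2;
```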

Try something like this for the query:

select distinct s1.file_path_key 
from Table1 as s1 
join Table2 as s2 on s1.file_path_key = s2.file_path_key
where s1.dir_path_key !=D
and s2.hash_key =H;
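
One way to check whether the rewritten join actually picks up the new indexes is to prefix it with EXPLAIN. The sketch below uses hypothetical sample values for H and D (the hash is made up; the directory is the example from the question), so substitute your own:

```sql
-- Hypothetical sample values standing in for H and D.
SET @H = 1234567890;
SET @D = '/home/playstation/';

-- Show the execution plan instead of running the query.
EXPLAIN
SELECT DISTINCT s1.file_path_key
FROM Table1 AS s1
JOIN Table2 AS s2 ON s1.file_path_key = s2.file_path_key
WHERE s1.dir_path_key != @D
  AND s2.hash_key = @H;
```

If the hash_key index is being used, you would hope to see it named in the key column of the row for s2, with a rows estimate far below the 5,000,000 entries in Table2. The exact plan depends on your MySQL version and data distribution.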

edited Mar 6, 2012 at 17:39

answered Mar 6, 2012 at 17:25

Ike Walker


4 Comments

Gooner Over a year ago

Sure, I'll try this. How do you go about adding an index to a column?

Ike Walker Over a year ago

I added sample DDL for creating the indexes. Beware this will lock the tables for a few minutes so you should not do this on a live production database.

Gooner Over a year ago

Well, the tables are not updated once they are filled in my use case. So that shouldn't be a problem?

Gooner Over a year ago

Sorry I'm a little late, but adding indexes worked perfectly! SELECT queries are so much faster now! Thanks Ike!

aaroncatlin · Accepted Answer · 2012-03-06 17:22:43Z

1

I'd suggest selecting entries from Table2 into a temporary table first:

CREATE TEMPORARY TABLE Temp AS SELECT * FROM Table2 WHERE hash_key = H;

Then join the temporary table in your SELECT statement:

select distinct s1.file_path_key from Table1 as s1 join Temp as s2 on s1.file_path_key = s2.file_path_key and s1.dir_path_key != D;

answered Mar 6, 2012 at 17:22

aaroncatlin


2 Comments

Gooner Over a year ago

Does that make a difference to the query execution time?

aaroncatlin Over a year ago

I usually notice a fair difference when I've put this into practise in the past.
