
Table Schema

For the two tables, the CREATE queries are given below:

Table1: (file_path_key, dir_path_key)

create table Table1(file_path_key varchar(500), dir_path_key varchar(500), primary key(file_path_key)) engine = innodb;

Example, file_path_key = /home/playstation/a.txt
dir_path_key = /home/playstation/

Table2: (file_path_key, hash_key)

create table Table2(file_path_key varchar(500) not null, hash_key bigint(20) not null, foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade) engine = innodb;

Objective:

Given a hash value *H* and a directory string *D*, I need to find all entries 
in Table2 whose hash equals *H*, such that the corresponding file entry 
doesn't have *D* as its directory.

In this particular case, Table1 has around 40,000 entries and Table2 has 5,000,000 entries, which makes my current query really slow.

select distinct s1.file_path_key from Table1 as s1 join (select * from Table2 where hash_key = H) as s2 on s1.file_path_key = s2.file_path_key and s1.dir_path_key !=D;

1
  • The (potential) size of your key certainly isn't helping. It doesn't look like you need the potential key range - would you consider switching to an auto-gen primary key that you join on? This should reduce the size of your table considerably - for one thing, it would mean that file_path_key could be turned into just file (which would potentially reduce mismatches). Too bad you're not using an RDBMS that supports recursive CTEs - they work perfectly for folder structures. Commented Mar 6, 2012 at 17:39
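The surrogate-key idea in this comment could look roughly like the following. This is only a sketch: the table names Files and FileHashes, the file_id column, and the index names are hypothetical and not part of the question's schema, and 12345 and '/home/playstation/' are placeholders for H and D.

```sql
-- Hypothetical variant of Table1: a compact integer key identifies each file,
-- and the long path is kept only once, with a unique constraint.
CREATE TABLE Files (
  file_id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  file_path_key VARCHAR(500) NOT NULL,
  dir_path_key  VARCHAR(500) NOT NULL,
  PRIMARY KEY (file_id),
  UNIQUE KEY uq_file_path (file_path_key)
) ENGINE = InnoDB;

-- Hypothetical variant of Table2: each row stores a 4-byte id
-- instead of repeating a path of up to 500 characters.
CREATE TABLE FileHashes (
  file_id  INT UNSIGNED NOT NULL,
  hash_key BIGINT NOT NULL,
  KEY idx_hash (hash_key),
  FOREIGN KEY (file_id) REFERENCES Files (file_id)
    ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE = InnoDB;

-- The same lookup, now joining on the integer key.
SELECT DISTINCT f.file_path_key
FROM Files AS f
JOIN FileHashes AS h ON h.file_id = f.file_id
WHERE h.hash_key = 12345
  AND f.dir_path_key != '/home/playstation/';
```

Whether this pays off depends on how the tables are loaded, since every path must be resolved to its file_id before a hash row can be inserted.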

2 Answers

1

The sub-select is really slowing your query down unnecessarily.

You should remove it and replace it with a simple join, pushing all of the non-join criteria down into the WHERE clause.

Also you should add indexes on the Table1.dir_path_key and Table2.hash_key columns:

ALTER TABLE Table1
  ADD INDEX dir_path_key (dir_path_key(255));

ALTER TABLE Table2
  ADD INDEX hash_key (hash_key);

Try something like this for the query:

select distinct s1.file_path_key 
from Table1 as s1 
join Table2 as s2 on s1.file_path_key = s2.file_path_key
where s1.dir_path_key !=D
and s2.hash_key =H;
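
One way to check that the new index is actually being used is to run the query through EXPLAIN. A minimal sketch, assuming the indexes above have been created; the values 12345 and '/home/playstation/' are placeholder stand-ins for H and D, not values from the question:

```sql
-- List the indexes now defined on Table2; hash_key should appear.
SHOW INDEX FROM Table2;

-- Ask MySQL for the execution plan instead of running the query.
-- 12345 and '/home/playstation/' are hypothetical values for H and D.
EXPLAIN
SELECT DISTINCT s1.file_path_key
FROM Table1 AS s1
JOIN Table2 AS s2 ON s1.file_path_key = s2.file_path_key
WHERE s1.dir_path_key != '/home/playstation/'
  AND s2.hash_key = 12345;
```

In the plan, the row for s2 would ideally show hash_key in the key column and a rows estimate far below 5,000,000. If key is NULL for s2, the index is not being picked up.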

4 Comments

Sure, I'll try this. How do you go about adding an index to a column?
I added sample DDL for creating the indexes. Beware this will lock the tables for a few minutes so you should not do this on a live production database.
Well, the tables are not updated once they are filled in my use case. So that shouldn't be a problem?
Sorry I'm a little late, but adding indexes worked perfectly! SELECT queries are so much faster now! Thanks Ike!
1

I'd suggest selecting the matching entries from Table2 into a temporary table first (MySQL syntax shown, since the tables are InnoDB):

CREATE TEMPORARY TABLE Temp AS SELECT * FROM Table2 WHERE hash_key = H;

Then join the temporary table in your SELECT statement:

select distinct s1.file_path_key from Table1 as s1 join Temp as s2 on s1.file_path_key = s2.file_path_key and s1.dir_path_key != D;

2 Comments

Does that make a difference to the query execution time?
I usually notice a fair difference when I've put this into practise in the past.
