Optimizing MySQL LIKE Query with Pattern Matching for Large Dataset (20M+ records)
I'm struggling with performance issues in MySQL while searching a large table containing over 20 million records. My current query using the LIKE operator times out:
SELECT * FROM my_table WHERE columnName LIKE '%key%';
Current Setup:
- Database: MySQL
- Table Size: ~20 million records
- Search Pattern Requirements (illustrated in the sketch after this list):
- '%keyword%' (contains)
- 'keyword%' (starts with)
- '%keyword' (ends with)
- Search keywords can include:
- Numbers
- Letters
- Special characters (- and _)
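For context, here is how each of those patterns interacts with an ordinary B-tree index; a minimal sketch using the table and column names from the question (the index name is a placeholder):

CREATE INDEX idx_columnName ON my_table (columnName);

-- Can use the index (range scan on the leading prefix):
SELECT * FROM my_table WHERE columnName LIKE 'keyword%';

-- Cannot use the index: a leading wildcard forces a full scan:
SELECT * FROM my_table WHERE columnName LIKE '%keyword%';
SELECT * FROM my_table WHERE columnName LIKE '%keyword';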
What I've tried:
- Implemented FULLTEXT SEARCH:
ALTER TABLE my_table ADD FULLTEXT(columnName);
SELECT * FROM my_table WHERE MATCH(columnName) AGAINST('*keyword*' IN BOOLEAN MODE);
- Configured ngram parser:
ngram_token_size=3
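For reference, a minimal sketch of the ngram setup described above (the index name is a placeholder). ngram_token_size is read at server startup, so it must be set in the config before the index is built. Note also that in BOOLEAN MODE the * truncation operator is only valid at the end of a word, so the '*keyword*' pattern in the query above does not behave like a leading wildcard:

-- In my.cnf, under [mysqld], before building the index:
-- ngram_token_size=3

ALTER TABLE my_table ADD FULLTEXT INDEX ft_ngram (columnName) WITH PARSER ngram;

-- With the ngram parser the search term is itself tokenized into 3-grams,
-- so a plain term approximates a 'contains' match:
SELECT * FROM my_table
WHERE MATCH(columnName) AGAINST('keyword' IN BOOLEAN MODE);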
Issues Faced:
- FULLTEXT SEARCH doesn't provide accurate results for all keyword patterns
- Regular LIKE queries are too slow
- Need to maintain accuracy while improving performance with FULLTEXT SEARCH as well
- ngram_token_size=3 slows the query for some keywords
Questions:
- How can I tune FULLTEXT SEARCH to achieve accurate results similar to LIKE '%keyword%'?
- Are there any alternative approaches or indexing strategies for this use case?
- What would be the optimal configuration for ngram parser to handle all these patterns?
Any help on optimizing this search while maintaining accuracy would be greatly appreciated.
Is there any other condition that could narrow down the rows the LIKE comparison needs to be applied to? Something like a date, if you only need to look at records from the last 5 years? Or anything like that? Maybe use multiple columns? Basically anything that would prevent a full search on all records, which will indeed be slow. Is there anything in the search pattern that could help? Are you looking for whole words, for instance?
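To illustrate what this comment is suggesting: if there is an additional selective, indexable condition, the expensive LIKE only has to be evaluated against the rows that survive it. A minimal sketch, assuming a hypothetical indexed created_at DATETIME column:

CREATE INDEX idx_created_at ON my_table (created_at);

-- The indexed range condition narrows the candidate rows first;
-- LIKE is then applied only to those rows, not all 20M records:
SELECT * FROM my_table
WHERE created_at >= NOW() - INTERVAL 5 YEAR
  AND columnName LIKE '%key%';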