I have a MariaDB database with a huge list of part numbers. I need to support product searches where users might not type the right separator characters. For example, a part number in my system could be something like 2234A-22-43, but people might search for it as '2234A2243', '2234A 22 43', or '2234A.22.43'. How do I make sure I catch each search variant and still get the same product? Thank you in advance.

I tried creating a separate column for alternate part numbers and adding all the variants, but that doesn't seem like a great solution. I wanted to find a way to solve this using the query itself.

1
  • You could use LIKE, or you could normalize what you store and apply the same normalization to search inputs, e.g., by removing every character that is not a letter or digit. Commented May 4 at 16:25

2 Answers

2
  1. Implement an algorithm/function that simplifies strings. Following your example, it removes every character that is not a digit or a letter.
  2. Add an additional column to your table, for example named simple_name. When inserting or updating a row, store the simplified string in the new column.
  3. When searching, compare the originally entered string with the original column (maybe with higher priority), and additionally compare the simplified entered string with simple_name.
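Step 1 can be sketched in Python, assuming the simplifying happens in the application layer. The function name simplify_part_number is hypothetical, and uppercasing the result is an extra assumption so that '2234a' and '2234A' end up identical.

```python
import re

# Assumption: anything that is not an ASCII letter or digit is a separator.
_SEPARATORS = re.compile(r"[^A-Za-z0-9]")


def simplify_part_number(raw: str) -> str:
    """Remove separator characters and uppercase what is left."""
    return _SEPARATORS.sub("", raw).upper()


if __name__ == "__main__":
    # All spellings from the question collapse to the same simplified string.
    for variant in ("2234A-22-43", "2234A2243", "2234A 22 43", "2234A.22.43"):
        print(variant, "->", simplify_part_number(variant))
```

The same function would be applied twice: once when writing simple_name on insert or update, and once to the text the user types into the search box.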

1 Comment

And add an index to the additional column.
1

You can use REGEXP_REPLACE() to remove any "separating character" from both the stored part number and the input given by the user. Then you compare these two values in the WHERE condition of your SELECT statement. See the following example (using MySQL, but it should work the same in MariaDB):

SELECT
    id,
    partNumber,
    REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') AS filtered
FROM
    Product;

+----+-------------+-----------+
| id | partNumber  | filtered  |
+----+-------------+-----------+
|  1 | 2234A-22-43 | 2234A2243 |
|  2 | 2234A.22.43 | 2234A2243 |
|  3 | 2234A 22 43 | 2234A2243 |
|  4 | 2234A-22-99 | 2234A2299 |
|  5 | 2234A.22.99 | 2234A2299 |
|  6 | 2234A 22 99 | 2234A2299 |
+----+-------------+-----------+

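The regex means that everything that is not a letter or a digit will be removed. As you can see, the "filtered" values no longer contain any separator characters such as a space, . or -. Note that [a-z] keeps the uppercase A here presumably because the match is case-insensitive under the column's collation; with a case-sensitive or binary collation, a pattern like [^A-Za-z0-9] would be the safer choice. When you apply this function to the user input as well, you get a value to compare your "filtered" values against. The query might look like this: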

SELECT
    id,
    partNumber
FROM
    Product
WHERE
    REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') = REGEXP_REPLACE('2234A 22-43', '[^a-z\\d]', '');

+----+-------------+
| id | partNumber  |
+----+-------------+
|  1 | 2234A-22-43 |
|  2 | 2234A.22.43 |
|  3 | 2234A 22 43 |
+----+-------------+

The second REGEXP_REPLACE() call takes the user input and filters it. Alternatively, you can do the filtering "outside" of your database, in the programming language you are using, and write WHERE REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') = '2234A2243' directly.

Obviously, no index will be used to search for the matching rows since the compare value is calculated on-the-fly inside the WHERE part (but you could generate and save it inside a new column filteredPartNumber and compare against that column).
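The stored-column variant can be sketched end to end in Python. To keep the sketch runnable without a server, it uses the standard library's sqlite3 module as a stand-in for MariaDB, so the DDL and the ? placeholder style are SQLite's and would need adapting to your MariaDB connector. The column name filteredPartNumber comes from the paragraph above; the function names and the index name are hypothetical.

```python
import re
import sqlite3


def normalize(part_number: str) -> str:
    """Drop everything except ASCII letters and digits, then uppercase."""
    return re.sub(r"[^A-Za-z0-9]", "", part_number).upper()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a stored, indexed normalized column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Product ("
        " id INTEGER PRIMARY KEY,"
        " partNumber TEXT NOT NULL,"
        " filteredPartNumber TEXT NOT NULL)"
    )
    # The index lets the equality lookup avoid scanning every row.
    conn.execute(
        "CREATE INDEX idx_product_filtered ON Product (filteredPartNumber)"
    )
    for part in ("2234A-22-43", "2234A-22-99", "9981B.07.10"):
        # The normalized value is computed once, at write time.
        conn.execute(
            "INSERT INTO Product (partNumber, filteredPartNumber) VALUES (?, ?)",
            (part, normalize(part)),
        )
    return conn


def find_parts(conn: sqlite3.Connection, user_input: str) -> list:
    """Normalize the search text, then compare it with the stored column."""
    rows = conn.execute(
        "SELECT partNumber FROM Product WHERE filteredPartNumber = ? ORDER BY id",
        (normalize(user_input),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    for query in ("2234A2243", "2234A 22 43", "2234a.22.43"):
        print(query, "->", find_parts(db, query))
```

Because the WHERE clause now compares a plain column with a constant, the database is free to use the index, which is the point of storing the filtered value instead of computing it per row.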

1 Comment

Using this expression in WHERE results in a full table scan, which is slow. A hidden generated column with an index on it may be a more interesting solution.
