Search Queries in MariaDB escaping - , spaces and

Question

I have a MariaDB database with a huge list of part numbers, and I need search queries to find a product even when users do not type the right separating characters. For example, a part number in my system might be 2234A-22-43, but people might search for it as '2234A2243', '2234A 22 43', or '2234A.22.43'. How do I make sure I catch each search variant and still get the same product? Thank you in advance.

I tried creating a separate column for alternate part numbers and adding all the variants, but that does not seem like a great solution. I would like to solve this in the query itself.

You could use LIKE, or you could normalize what you store and apply the same normalization to search inputs, e.g., by removing all non-digits.
– Robert, commented May 4 at 16:25

Wiimm · Accepted Answer · 2025-05-04 16:25:18Z

Implement an algorithm/function that simplifies strings. Following your example, it removes every character that is not a letter or a digit.
Add an additional column to your table, for example named simple_name. Whenever you insert or update a row, store the simplified part number in that column.
When searching, compare the string as entered against the original column (perhaps with higher priority), and additionally compare the simplified search string against simple_name.

And add an index to the additional column.
– Rick James, commented May 6 at 0:59
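The steps in this answer, together with the index suggested in the comment, might look roughly like the following in MariaDB. This is only a sketch under assumptions: the table name Product and column partNumber are borrowed from the other answer, the VARCHAR(64) length and the index name are made up for illustration, and REGEXP_REPLACE() stands in for the "simplify" function.

```sql
-- Assumed names: table Product, original column partNumber,
-- new column simple_name (length 64 is a guess, adjust to your data).
ALTER TABLE Product
    ADD COLUMN simple_name VARCHAR(64) NULL;

-- Backfill existing rows: keep only letters and digits.
UPDATE Product
SET simple_name = REGEXP_REPLACE(partNumber, '[^A-Za-z0-9]', '');

-- Index the simplified column so lookups do not scan the whole table.
CREATE INDEX idx_product_simple_name ON Product (simple_name);

-- Search: match on the simplified value, and list rows whose original
-- part number equals the input exactly first.
SELECT
    id,
    partNumber,
    (partNumber = '2234A 22 43') AS exact_match
FROM
    Product
WHERE
    simple_name = REGEXP_REPLACE('2234A 22 43', '[^A-Za-z0-9]', '')
ORDER BY
    exact_match DESC;
```

With this layout, every later INSERT or UPDATE has to fill simple_name as well, either in application code or in a trigger; otherwise new rows will not be found by the simplified search.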

Progman · Accepted Answer · 2025-05-04 16:59:10Z

You can use REGEXP_REPLACE() to remove any "separating character" from both the stored part number and the input given by the user, and then compare the two values in the WHERE condition of your SELECT statement. See the following example (using MySQL, but it should work the same in MariaDB):

SELECT
    id,
    partNumber,
    REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') AS filtered
FROM
    Product;

+----+-------------+-----------+
| id | partNumber  | filtered  |
+----+-------------+-----------+
|  1 | 2234A-22-43 | 2234A2243 |
|  2 | 2234A.22.43 | 2234A2243 |
|  3 | 2234A 22 43 | 2234A2243 |
|  4 | 2234A-22-99 | 2234A2299 |
|  5 | 2234A.22.99 | 2234A2299 |
|  6 | 2234A 22 99 | 2234A2299 |
+----+-------------+-----------+

The regex removes everything that is not a letter or digit, so the "filtered" values no longer contain separator characters such as a space, . or -. (The class [^a-z\\d] keeps the uppercase A here because the match is case-insensitive under a case-insensitive collation, which is the usual default.) When you apply the same function to the user input, you get a value to compare your "filtered" values against. The query might look like this:

SELECT
    id,
    partNumber
FROM
    Product
WHERE
    REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') = REGEXP_REPLACE('2234A 22-43', '[^a-z\\d]', '');

+----+-------------+
| id | partNumber  |
+----+-------------+
|  1 | 2234A-22-43 |
|  2 | 2234A.22.43 |
|  3 | 2234A 22 43 |
+----+-------------+

The second REGEXP_REPLACE() call takes the user input and filters it. Alternatively, you can do the filtering "outside" of your database, in the programming language you are using, and write WHERE REGEXP_REPLACE(partNumber, '[^a-z\\d]', '') = '2234A2243' directly.

Note that no index will be used to find the matching rows, since the compared value is calculated on the fly inside the WHERE clause (but you could generate and store it in a new column filteredPartNumber and compare against that column).

Using this expression in WHERE results in a full table scan, which is slow. A hidden generated column with an index on it may be a more interesting solution.
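A generated column with an index, as suggested above, could be sketched as follows. Treat it as an assumption-laden illustration rather than a tested recipe: the column length and index name are invented, and the expression uses nested REPLACE() calls for the three separators from the question because I am not certain that every MariaDB version accepts REGEXP_REPLACE() in an indexed generated column. Making the column hidden is left out of the sketch.

```sql
-- Assumed: table Product with column partNumber, as in the answer above.
-- PERSISTENT stores the computed value so it can be indexed.
ALTER TABLE Product
    ADD COLUMN filteredPartNumber VARCHAR(64)
        AS (REPLACE(REPLACE(REPLACE(partNumber, '-', ''), '.', ''), ' ', ''))
        PERSISTENT,
    ADD INDEX idx_filteredPartNumber (filteredPartNumber);

-- Normalize the user input with the same expression, then compare
-- against the indexed column instead of recomputing it per row.
SELECT
    id,
    partNumber
FROM
    Product
WHERE
    filteredPartNumber =
        REPLACE(REPLACE(REPLACE('2234A 22-43', '-', ''), '.', ''), ' ', '');
```

Because the database maintains the column itself, inserts and updates need no extra code. The trade-off of the nested REPLACE() is that only the listed separators are stripped, so any other separator (for example /) would need another REPLACE() level on both sides.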
