For the application we are developing we need to allow our searches to support accents, be case insensitive and search for partial words. For example, given the product name "La Niña" in our collection, the following searches should be expected to return the entry:

  • La Niña
  • niña
  • nina
  • nin
  • La nin

So far I have tried two approaches, each with apparent limitations, based on testing and some research:

  • Regex

    • supports case-insensitive and partial searches
    • does not support accents, so niña != nina
  • Text Search

    • supports case-insensitive searches, accents and partial phrases
    • does not support partial words

Example regex search, as we have used:

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// escapeRegExp is a plain function here, so it is called without `this`
const escapedStr = escapeRegExp(searchTerm);
await Product.find({ name: new RegExp(escapedStr, 'i') });
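One workaround for the accent limitation that might be worth trying is to expand each base letter of the search term into a character class of its accented variants before building the RegExp. The following is only a sketch: the `VARIANTS` map and the `accentInsensitiveRegExp` name are hypothetical, the map covers only a handful of Latin letters, and it assumes the stored names use precomposed characters (NFC), e.g. a single "ñ" code point.

```javascript
// Hypothetical map from a base letter to the variants it should match.
// Deliberately incomplete; extend it for the languages you need.
const VARIANTS = {
  a: 'aáàâäã',
  c: 'cç',
  e: 'eéèêë',
  i: 'iíìîï',
  n: 'nñ',
  o: 'oóòôöõ',
  u: 'uúùûü',
};

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds a case-insensitive RegExp that also ignores the accents listed above.
function accentInsensitiveRegExp(term) {
  // Strip accents from the term itself so "niña" and "nina" give the same pattern.
  const base = term.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  // Escape first, then widen each plain letter into its character class.
  const pattern = escapeRegExp(base).replace(/[a-z]/gi, (ch) => {
    const variants = VARIANTS[ch.toLowerCase()];
    return variants ? `[${variants}]` : ch;
  });
  return new RegExp(pattern, 'i');
}

const re = accentInsensitiveRegExp('La nin');
console.log(re.test('La Niña'));
```

The resulting RegExp could be passed to the same `Product.find({ name: ... })` call. Note that an unanchored, case-insensitive regex generally cannot make efficient use of an index, so this may be slow on large collections.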

Example text search, as we have used:

// On the schema
storeSchema.index({ name: 'text' });

// Searching:
await Product.find({ $text: { $search: searchTerm } })
  .collation({ locale: 'en', strength: 1 });

We have also set the schemas in question to use collation strength level 1.

Some approaches I am considering, if MongoDB doesn't provide a solution:

  • a shadow name field (not sure of the right term) holding the name with the accents removed
  • a separate full text search engine
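To make the shadow field idea concrete, here is a minimal sketch of how the accent-free copy could be derived and searched. The `toSearchKey` helper and the `nameSearch` field name are hypothetical, and the in-memory array only stands in for the collection; I have not tried this against MongoDB itself.

```javascript
// Hypothetical helper: builds an accent-free, lowercase copy of a string.
function toSearchKey(text) {
  return text
    .normalize('NFD')                 // split "ñ" into "n" + combining tilde
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

// In-memory stand-in for documents that store the shadow field next to the name.
const products = [{ name: 'La Niña' }, { name: 'El Niño' }].map((p) => ({
  ...p,
  nameSearch: toSearchKey(p.name),
}));

// Normalise the search term the same way, then do a substring match.
function search(term) {
  const key = toSearchKey(term);
  return products.filter((p) => p.nameSearch.includes(key)).map((p) => p.name);
}

console.log(search('La nin'));
```

With a real collection, the shadow field would presumably be filled in whenever the name is saved, and the existing regex query would run against `nameSearch` with the normalised search term instead of against `name`.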

Can anyone help here?

Note: we are using mongoose 5.9.5 with node 12.16.2 and mongodb 4.3.8, running in mongo cloud.

1 Answer

I believe Text Search is what you need. There are two other features of Text Search that fulfill the partial word match requirement you described in the question.

  • Stop Words: Given a language option, MongoDB Text Search is capable of identifying words that shouldn't influence search results. The frequency of usage of these words is such that they appear in almost every sentence, for example, in English, words like "the", "a", "of", are all stop words. These words are stripped off the search phrase before the actual search takes place.

  • Word Stemming: Given a language option, MongoDB Text Search is capable of identifying the root form of a word. For example, in English, the stem of "identifying" would be "identify", so they would both match in a text search.

I was able to work out with Google Translate that the "La Niña" example you gave is in Spanish.

If I insert the following into a sample product collection:

db.products.insertMany([
  { "term" : "La Niña" },
  { "term" : "niña" },
  { "term" : "nina" },
  { "term" : "nin" },
  { "term" : "La nin" },
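])

// $text queries require a text index on the searched field
db.products.createIndex({ "term" : "text" })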

and then specify a language option of "spanish" on my Text Search query:

db.products.find({ $text: { $search: "La Niña", $language: "spanish" } })

MongoDB would effectively match that with all the products that were previously inserted. You can get a list of the supported language options for MongoDB here.

I'm not 100% sure of how the accent matching works though.
