I'm trying to implement a robust search function in my NestJS/Mongoose application that can handle partial matches while being case-insensitive and diacritics-insensitive (ignoring accents).
My current aggregation setup uses an $or block combining $text (for diacritic insensitivity on whole words) and $regex with $options: 'i' (for partial matching and case insensitivity). However, $regex remains sensitive to diacritics, so my test that combines all three requirements fails.
My function:
public async getCustomers(query?: GetCustomersQueryParamsDto) {
  const pipeline: PipelineStage[] = []
  if (query?.search) pipeline.push(this._searchCustomersStage(query.search))
  const customers = await this.customer.aggregate<CustomerWithBillsStats>(pipeline)
  return customers
}

private _searchCustomersStage(str: string): PipelineStage {
  return {
    $match: {
      $or: [
        { $text: { $search: str } }, // for full-text search (diacritics insensitive but exact match)
        { name: { $regex: str, $options: 'i' } }, // for partial match (can partially match but not diacritics insensitive)
      ],
    },
  }
}
The problem is, when searching for the partial string 'elo' against a name like 'Élodie', the following happens in my _searchCustomersStage:
{ $text: { $search: 'elo' } }: Fails, as $text performs token/whole-word matching, not substring matching.
{ name: { $regex: 'elo', $options: 'i' } }: Fails, as PCRE regex with the i option is diacritics-sensitive and treats e and É as different characters.
Example of a test that fails:
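To make the second failure concrete, here is a minimal sketch in plain TypeScript. It uses JavaScript's RegExp rather than MongoDB's PCRE engine, so it only illustrates the behaviour; the `stripDiacritics` helper is a hypothetical name introduced for this example and is not part of my code base.

```typescript
const customerName = 'Élodie'
const search = 'elo'

// Case-insensitive matching alone does not fold accents: "É" lowercases to "é", not "e".
const plainMatch = new RegExp(search, 'i').test(customerName)

// Hypothetical helper: decompose to NFD, then drop combining marks (U+0300 to U+036F).
const stripDiacritics = (s: string): string =>
  s.normalize('NFD').replace(/[\u0300-\u036f]/g, '')

// After folding, the name becomes "Elodie" and the same regex matches.
const foldedMatch = new RegExp(search, 'i').test(stripDiacritics(customerName))

console.log({ plainMatch, foldedMatch })
```

The regex only matches once the accent has been removed from the stored value, which is the step missing from my $regex branch.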
it('should search based on name and be diacritics(accents) insensitive, partial match and case insensitive', async () => {
  await customerModel.insertMany([
    { ...generateCustomer(), name: 'Élodie' }, // match: contains "elo"
    { ...generateCustomer(), name: 'Brandon' }, // no match
    { ...generateCustomer(), name: 'Daniel' }, // no match
  ])
  const { customers } = await customerService.getCustomers({ search: 'elo' })
  expect(customers).toHaveLength(1)
})
The failure:
● CustomerService › getCustomers › search › should search based on name and be diacritics(accents) insensitive, partial match and case insensitive
expect(received).toHaveLength(expected)
Expected length: 1
Received length: 0
Received array: []
398 |
399 | const { customers } = await customerService.getCustomers({ search: 'elo' })
> 400 | expect(customers).toHaveLength(1)
| ^
401 | })
402 |
403 | it('should search based on shortName when exact match', async () => {
at Object.<anonymous> (test/customer/integration/customer.service.spec.ts:400:27)
What is the most effective and performant way to modify the MongoDB aggregation pipeline so that the partial search is diacritics-insensitive while remaining case-insensitive?
P.S. I configured my indexes like this:
@Module({
  imports: [
    MongooseModule.forFeatureAsync([
      {
        name: Customer.name,
        useFactory: () => {
          const schema = CustomerSchema
          schema.index({ name: 1 }, { unique: true }) // index for search and ensure that `name` is unique between customers
          schema.index({ name: 'text' }) // index for full-text search (diacritics-insensitive but must be exact match)
          schema.index({ name: 1, _id: 1 }) // compound index for pagination sorting
          return schema
        },
      },
    ]),
  ],
  providers: [CustomerService],
  controllers: [CustomerController],
  exports: [MongooseModule, CustomerService],
})
export class CustomerModule {}
These are some test cases that succeed:
it('should search based on name and return customers when exact match', async () => {
  await customerModel.insertMany([
    { ...generateCustomer(), name: 'Alpha' },
    { ...generateCustomer(), name: 'Bravo' },
    { ...generateCustomer(), name: 'Charlie' },
  ])
  const { customers } = await customerService.getCustomers({ search: 'Charlie' })
  expect(customers).toHaveLength(1)
})

it('should search based on name and return customers when partial match', async () => {
  await customerModel.insertMany([
    { ...generateCustomer(), name: 'Eleanor' }, // no match
    { ...generateCustomer(), name: 'Marcelo' }, // match: contains "elo"
    { ...generateCustomer(), name: 'Brandon' }, // no match
    { ...generateCustomer(), name: 'Elohim' }, // match: contains "elo"
    { ...generateCustomer(), name: 'Daniel' }, // no match
  ])
  const { customers } = await customerService.getCustomers({ search: 'elo' })
  expect(customers).toHaveLength(2)
})

it('should search based on name and be case insensitive and return found customers', async () => {
  await customerModel.insertMany([
    { ...generateCustomer(), name: 'Alpha' },
    { ...generateCustomer(), name: 'Bravo' },
    { ...generateCustomer(), name: 'Charlie' },
  ])
  const { customers } = await customerService.getCustomers({ search: 'cHarLiE' })
  expect(customers).toHaveLength(1)
})

it('should search based on name and be diacritics(accents) insensitive', async () => {
  await customerModel.insertMany([
    { ...generateCustomer(), name: 'Élodie' }, // match: contains "elo"
    { ...generateCustomer(), name: 'Brandon' }, // no match
    { ...generateCustomer(), name: 'Daniel' }, // no match
  ])
  const r1 = await customerService.getCustomers({ search: 'Elodie' })
  expect(r1.customers).toHaveLength(1)
  const r2 = await customerService.getCustomers({ search: 'elodie' })
  expect(r2.customers).toHaveLength(1)
})
Note on collation: $regex ignores collation, and collation only supports exact matches with $match.
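Since neither collation nor $text covers substring matching, one direction that might work is to fold the search string in application code and match it against a pre-folded copy of the name. The sketch below is only an assumption about how that could look: `nameNormalized` is a hypothetical extra field that would have to be written alongside `name`, and `foldForSearch`, `escapeRegex`, and `searchStage` are illustrative names, not existing Mongoose or NestJS APIs.

```typescript
/** Lowercase the input and strip combining diacritical marks after NFD decomposition. */
function foldForSearch(input: string): string {
  return input.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase()
}

/** Escape regex metacharacters so the user's input is matched literally. */
function escapeRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

/** Build a $match stage against the assumed pre-folded field `nameNormalized`. */
function searchStage(str: string) {
  return {
    $match: {
      // No 'i' option needed: both sides are already lowercased and accent-free.
      nameNormalized: { $regex: escapeRegex(foldForSearch(str)) },
    },
  }
}

console.log(JSON.stringify(searchStage('ÉLO')))
```

Two caveats apply to this sketch: NFD folding does not decompose letters such as "ø" or "œ", and an unanchored $regex generally cannot use index bounds efficiently, so it may still scan many entries on a large collection.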