By default, text indexes use english as the default_language.
To improve the performance of non-English text search queries, you can specify
a different default language associated with your text index.
The default language associated with the indexed data determines the suffix
stemming rules. It also determines which language-specific stop words
(for example, "the", "an", "a", and "and" in English) are
not indexed.
To specify a different language, use the default_language option when
creating the text index. To see the languages available for text indexing, see
Text Search Languages on Self-Managed Deployments. Your operation should resemble this prototype:
db.<collection>.createIndex( { <field>: "text" }, { default_language: <language> } )
If you specify a default_language value of none, the text index
includes every word in the field, stop words among them, and does not
apply suffix stemming.
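As a concrete instance of the prototype above, the following sketch builds the two arguments for a text index that uses none as its language. The textIndexArgs helper is hypothetical and exists only for illustration; it is not part of MongoDB or mongosh.

```javascript
// Hypothetical helper (not a MongoDB API): builds the key document and the
// options document that createIndex() takes for a single-field text index.
function textIndexArgs(field, language) {
  const keys = { [field]: "text" };
  const options = { default_language: language };
  return [keys, options];
}

// "none" asks for no stop-word removal and no suffix stemming.
const [keys, options] = textIndexArgs("quote", "none");
console.log(JSON.stringify(keys), JSON.stringify(options));

// In mongosh, assuming a quotes collection exists, the pair could be passed on:
// db.quotes.createIndex(keys, options)
```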
Before You Begin
Create a quotes collection that contains the following documents
with a Spanish text field:
db.quotes.insertMany( [
   { _id: 1, quote: "La suerte protege a los audaces." },
   { _id: 2, quote: "Nada hay más surrealista que la realidad." },
   { _id: 3, quote: "Es este un puñal que veo delante de mí?" },
   { _id: 4, quote: "Nunca dejes que la realidad te estropee una buena historia." }
] )
Procedure
The following operation creates a text index on the quote field and sets
the default_language to spanish:
db.quotes.createIndex( { quote: "text" }, { default_language: "spanish" } )
Results
The resulting index supports text search queries on the quote field with
Spanish-language suffix stemming rules. For example, the following
query searches for the keyword punal in the quote field:
db.quotes.find( { $text: { $search: "punal" } } )
Output:
[ { _id: 3, quote: "Es este un puñal que veo delante de mí?" } ]
Although the $search value is set to punal, the query returns the
document containing the word puñal because text indexes are
diacritic-insensitive.
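The idea behind diacritic insensitivity can be approximated in plain JavaScript. This is a rough sketch that uses Unicode normalization to strip combining marks; it is an assumption made for illustration and not MongoDB's actual folding implementation. The foldDiacritics function name is hypothetical.

```javascript
// Illustrative only: approximates diacritic folding with Unicode normalization.
// MongoDB applies its own folding rules inside the text index.
function foldDiacritics(term) {
  // NFD decomposes "ñ" into "n" plus a combining tilde (U+0303);
  // the regex then removes every combining mark.
  return term.normalize("NFD").replace(/\p{M}/gu, "");
}

const indexedWord = "puñal";
const searchTerm = "punal";

// Both sides are folded before comparison, so the accented and
// unaccented spellings compare as equal.
const matches = foldDiacritics(indexedWord) === foldDiacritics(searchTerm);
console.log(matches);
```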
The index also ignores language-specific stop words. For example, although the
document with _id: 2 contains the word hay, the following query does not
return any documents. Because hay is classified as a Spanish stop word, it
is not included in the text index.
db.quotes.find( { $text: { $search: "hay" } } )
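To see why a stop word cannot be matched, consider this simplified model of stop-word filtering at index time. The SAMPLE_STOP_WORDS set is a hypothetical, hand-picked subset and the indexableTerms function is an invented name; neither reflects MongoDB's real stop-word list or tokenizer, and stemming is left out entirely.

```javascript
// Hypothetical subset for illustration; not MongoDB's actual Spanish stop-word list.
const SAMPLE_STOP_WORDS = new Set(["hay", "la", "que"]);

// Simplified model: lowercase the text, split on non-letters,
// and drop any word found in the stop-word set.
function indexableTerms(text) {
  return text
    .normalize("NFC")
    .toLowerCase()
    .split(/[^\p{L}]+/u)
    .filter((word) => word.length > 0 && !SAMPLE_STOP_WORDS.has(word));
}

const terms = indexableTerms("Nada hay más surrealista que la realidad.");

// "hay" never reaches the term list, so a search for it has nothing to match.
console.log(terms.includes("hay"));
```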
Learn More
To create a text index for a collection containing text in multiple languages, see Create a Multi-Language Text Index on Self-Managed Deployments.
To learn about other text index properties, see Text Index Properties on Self-Managed Deployments.