I'm using Lucene 6.6.0 to develop a search service, and I'm quite confused about how to create custom analyzers and queries.
I built my index from data in an RDBMS, and at first I was just using a StandardAnalyzer. Unfortunately, it does not seem to split text on special characters like "_" and "-" or at numbers; as far as I can tell, it only tokenizes on whitespace. I found WordDelimiterGraphFilter, which seems to do what I want, but I do not understand how to make it work. Right now I'm using it like this:
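```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

mCustomAnalyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        // 8 == WordDelimiterGraphFilter.CATENATE_NUMBERS; null = no protected words
        TokenStream filter = new WordDelimiterGraphFilter(source, 8, null);
        return new TokenStreamComponents(source, filter);
    }
};

QueryBuilder queryBuilder = new QueryBuilder(mCustomAnalyzer);
Query query = queryBuilder.createPhraseQuery(aField, aText, 15); // phrase slop = 15
```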
I use the same analyzer for indexing. However, it does not work: if I search for "term1 term2", I expect to find documents containing things like "term1_term2", and also "term32423" or "term_232".
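Since the same analyzer is used at index time, one thing worth checking: WordDelimiterGraphFilter produces a token graph, and as I understand the Lucene javadocs, the index does not store position lengths, so a graph stream should be passed through FlattenGraphFilter before indexing (but not at query time, where QueryBuilder can consume the graph). A minimal sketch, assuming a WhitespaceTokenizer (so "_" and "-" reach the filter intact) and a flag combination chosen only for illustration; `DelimiterAnalyzers`, `indexAnalyzer`, `queryAnalyzer` and `FLAGS` are hypothetical names:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.FlattenGraphFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;

public class DelimiterAnalyzers {

    // Assumed flag choice: emit letter runs and digit runs as separate terms,
    // and also split at letter<->digit transitions ("term1" -> "term", "1").
    static final int FLAGS = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
            | WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS
            | WordDelimiterGraphFilter.SPLIT_ON_NUMERICS;

    /** For IndexWriterConfig: the token graph is flattened before it is indexed. */
    public static Analyzer indexAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                // Splits only on whitespace, leaving "_" and "-" for the delimiter filter
                Tokenizer source = new WhitespaceTokenizer();
                TokenStream stream = new WordDelimiterGraphFilter(source, FLAGS, null);
                stream = new FlattenGraphFilter(stream);
                return new TokenStreamComponents(source, stream);
            }
        };
    }

    /** For QueryBuilder: the graph is kept so phrase queries see the real structure. */
    public static Analyzer queryAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new WhitespaceTokenizer();
                TokenStream stream = new WordDelimiterGraphFilter(source, FLAGS, null);
                return new TokenStreamComponents(source, stream);
            }
        };
    }
}
```

With split-only flags like these the output is a simple linear chain, so the flattening step changes nothing; it starts to matter once CATENATE_* or PRESERVE_ORIGINAL are added, because those emit tokens spanning several positions.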
What am I missing here? I have tried different integers as the `configurationFlags` argument of the filter [1], but none of them seem to work.
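To see what a given `configurationFlags` value actually does, it can help to print the terms an analyzer emits instead of inferring them from search results. A minimal sketch, assuming Lucene 6.6 with lucene-analyzers-common on the classpath; `FlagExplorer`, `withFlags`, `SPLIT` and the field name `"body"` are hypothetical. It compares the StandardAnalyzer, the value 8 from the question (which is `CATENATE_NUMBERS` alone, so neither word parts nor number parts are generated) and one assumed combination of named flags:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FlagExplorer {

    // Assumed combination: word parts + number parts, split at letter/digit changes
    static final int SPLIT = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
            | WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS
            | WordDelimiterGraphFilter.SPLIT_ON_NUMERICS;

    /** Whitespace tokenizer followed by a word-delimiter filter with the given flag bits. */
    static Analyzer withFlags(int flags) {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new WhitespaceTokenizer();
                return new TokenStreamComponents(source,
                        new WordDelimiterGraphFilter(source, flags, null));
            }
        };
    }

    /** Runs text through the analyzer and collects the emitted terms in order. */
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                        // close() happens via try-with-resources
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String text = "term1_term2 term32423 term_232";
        try (Analyzer standard = new StandardAnalyzer()) {
            System.out.println("standard -> " + tokens(standard, text));
        }
        for (int flags : new int[] {8, SPLIT}) {
            try (Analyzer analyzer = withFlags(flags)) {
                System.out.println(flags + " -> " + tokens(analyzer, text));
            }
        }
    }
}
```

With `SPLIT`, the three strings from the question should come out as `term`, `1`, `term`, `2`, `term`, `32423`, `term`, `232`, while the StandardAnalyzer keeps each one as a single term. Splitting alone only makes "term1" and "term32423" share the part `term`; their numeric parts still differ, so a phrase query over all parts would not match them.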