In GraphDB 10.6 I need full-text search across English and French words that ignores accents, i.e. ASCII folding.
I tried the SPARQL update below to create the Lucene connector, but it fails with: 500: Error - Unable to create connector: Unable to init Lucene index.
PREFIX luc: <http://www.ontotext.com/connectors/lucene#>
PREFIX luc-index: <http://www.ontotext.com/connectors/lucene/instance#>
INSERT DATA {
  luc-index:myindex luc:createConnector
  '''
  {
    "fields": [
      {
        "fieldName": "label",
        "fieldNameTransform": "predicate.localName",
        "propertyChain": ["$literal"],
        "ignoreInvalidValues": true
      }
    ],
    "languages": [],
    "types": ["$any"],
    "analyzer": {
      "tokenizer": "org.apache.lucene.analysis.standard.StandardTokenizerFactory",
      "filters": [
        "org.apache.lucene.analysis.standard.StandardFilterFactory",
        "org.apache.lucene.analysis.lowercase.LowerCaseFilterFactory",
        "org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory"
      ]
    }
  }
  ''' .
}
I cannot find documentation on selecting among the standard analyzers other than this GraphDB page: https://graphdb.ontotext.com/documentation/10.6/lucene-graphdb-connector.html
How can I set the analyzer to one of the existing choices? Or must I create a custom one? I feel like there must be an existing one that includes ASCII folding!
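To make the requirement concrete, here is a rough sketch in Python of the matching behaviour I am after. This is not GraphDB or Lucene code; `fold_accents` is a hypothetical helper name I made up for illustration, and it only approximates what Lucene's ASCII folding does (for example, it does not expand ligatures such as "œ" to "oe").

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate ASCII folding: strip diacritics, then lowercase.

    Hypothetical helper for illustration only, not part of GraphDB or Lucene.
    """
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# A query term and an indexed term should match after folding.
query = fold_accents("eleve")
indexed = fold_accents("Élève")
print(query == indexed)
```

In other words, a search for "eleve" should find a label containing "Élève", and a search for "cafe" should find "Café".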