
Lucene 4.2.1 does not seem to have StandardAnalyzer, and I am not sure how to implement a basic analyzer that does not alter the source text. Any pointers?

final SimpleFSDirectory DIRECTORY = new SimpleFSDirectory(new File(ELEMENTS_INDEX_DIR));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        return null;
    }
});
IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
List<ModelObject> elements = dao.getAll();
for (ModelObject element : elements) {
    Document document = new Document();
    document.add(new StringField("id", String.valueOf(element.getId()), Field.Store.YES));
    document.add(new TextField("name", element.getName(), Field.Store.YES));
    indexWriter.addDocument(document);
}
indexWriter.close();
  • I am also really confused. All I need is the standard analyzer. Even the demo in the 4.2.1 source uses StandardAnalyzer, but it would not compile (because it looks for org.apache.lucene.analysis.standard.StandardAnalyzer, which does not exist anymore). Commented May 2, 2013 at 8:21

2 Answers


You have to return a TokenStreamComponents from createComponents. null is not adequate.

However, Lucene 4.2.1 certainly does have StandardAnalyzer.

If you are, perhaps, referring to the changes in StandardAnalyzer in Lucene 4.x, and are looking for the old StandardAnalyzer, then you'll want ClassicAnalyzer.

If you really want a trimmed down Analyzer that doesn't modify anything, but just tokenizes in a very simple fashion, perhaps WhitespaceAnalyzer will serve your purposes.

If you don't want it modified or tokenized at all, then use KeywordAnalyzer. Any of these drop straight into your IndexWriterConfig, as sketched below.
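For instance, here is a sketch assuming the same Version.LUCENE_42 setup as in your snippet; note that all three analyzers live in the lucene-analyzers-common JAR:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Pick whichever fits your needs:
Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42); // splits on whitespace, nothing else
// Analyzer analyzer = new KeywordAnalyzer();                  // whole field value as a single token
// Analyzer analyzer = new ClassicAnalyzer(Version.LUCENE_42); // the pre-3.1 StandardAnalyzer behavior

IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, analyzer);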

And if you must create your very own Analyzer, as you say, then override the method createComponents and actually build and return an instance of TokenStreamComponents. If none of the above four serve your needs, I have no idea what your needs are, and so I won't attempt a specific example here, but here is the example from the Analyzer docs:

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }
};

There is a single-arg constructor for TokenStreamComponents as well, so the filter is optional, by the way.
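To make that concrete, here is a minimal sketch of an analyzer that does not alter the source text at all: KeywordTokenizer emits the whole field value as a single token, and the single-arg constructor means no filters run. Treat it as an illustration of the pattern, not necessarily what your index needs:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;

// The entire field value becomes one token and no filters touch it,
// so the text is indexed exactly as it came in.
Analyzer passThroughAnalyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new KeywordTokenizer(reader);
        return new TokenStreamComponents(source); // single-arg ctor: no filter chain
    }
};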


2 Comments

This did not solve the problem, and StandardAnalyzer doesn't seem to be usable. How do I instantiate it? Can you post a sample code file that uses an analyzer and works with Lucene 4.2.1?
"If none of the above four serve your needs, I have no idea what your needs are" +1!

You should add the Common Analyzers to your project. They are now available in a separate JAR file in the Lucene-4.2.1.zip file under "analysis/common".

 lucene-analyzers-common-4.*.jar

Once you add it to your project (just as you added the core JAR), you should have this working:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
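For example (a sketch based on the setup in the question), with that JAR on the classpath this should compile:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Same configuration as the question, with StandardAnalyzer now resolving correctly
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, analyzer);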

