
Lucene 4.2.1 does not seem to have StandardAnalyzer, and I am not sure how to implement a basic analyzer that does not alter the source text. Any pointers?

final SimpleFSDirectory DIRECTORY = new SimpleFSDirectory(new File(ELEMENTS_INDEX_DIR));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        return null;
    }
});
IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
List<ModelObject> elements = dao.getAll();
for (ModelObject element : elements) {
    Document document = new Document();
    document.add(new StringField("id", String.valueOf(element.getId()), Field.Store.YES));
    document.add(new TextField("name", element.getName(), Field.Store.YES));
    indexWriter.addDocument(document);
}
indexWriter.close();
  • I am also really confused. All I need is the standard analyzer. Even the demo in the 4.2.1 source uses StandardAnalyzer, but it would not compile (because it looks for org.apache.lucene.analysis.standard.StandardAnalyzer, which does not exist anymore). Commented May 2, 2013 at 8:21

2 Answers


You have to return a TokenStreamComponents from createComponents. null is not adequate.

However, Lucene 4.2.1 certainly does have StandardAnalyzer.

If you are, perhaps, referring to the changes in StandardAnalyzer in Lucene 4.x, and are looking for the old StandardAnalyzer, then you'll want ClassicAnalyzer.

If you really want a trimmed down Analyzer that doesn't modify anything, but just tokenizes in a very simple fashion, perhaps WhitespaceAnalyzer will serve your purposes.

If you don't want it modified or tokenized at all, then use KeywordAnalyzer. Any of these drop straight into your IndexWriterConfig, as sketched below.
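For instance, here is a sketch assuming the same Version.LUCENE_42 setup as in your snippet; note that all three analyzers live in the lucene-analyzers-common JAR:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Pick whichever fits your needs:
Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42); // splits on whitespace, nothing else
// Analyzer analyzer = new KeywordAnalyzer();                  // whole field value as a single token
// Analyzer analyzer = new ClassicAnalyzer(Version.LUCENE_42); // the pre-3.1 StandardAnalyzer behavior

IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, analyzer);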

And if you must create your very own Analyzer, as you say, then override the method createComponents and actually build and return an instance of TokenStreamComponents. If none of the above four serve your needs, I have no idea what your needs are, and so I won't attempt a specific example here, but here is the example from the Analyzer docs:

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }
};

There is a single-arg constructor for TokenStreamComponents as well, so the filter is optional, by the way.
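To make that concrete, here is a minimal sketch of an analyzer that does not alter the source text at all: KeywordTokenizer emits the whole field value as a single token, and the single-arg constructor means no filters run. Treat it as an illustration of the pattern, not necessarily what your index needs:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;

// The entire field value becomes one token and no filters touch it,
// so the text is indexed exactly as it came in.
Analyzer passThroughAnalyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new KeywordTokenizer(reader);
        return new TokenStreamComponents(source); // single-arg ctor: no filter chain
    }
};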


2 Comments

This did not solve the problem, and StandardAnalyzer doesn't seem to be usable. How do I instantiate it? Can you post a sample code file that uses an analyzer and works with Lucene 4.2.1?
"If none of the above four serve your needs, I have no idea what your needs are" +1!

You should add the Common Analyzers to your project. They are now available in a separate JAR file in the Lucene-4.2.1.zip file under "analysis/common".

 lucene-analyzers-common-4.*.jar

Once you add it to your project (just as you added the core JAR), you should have this working:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
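For example (a sketch based on the setup in the question), with that JAR on the classpath this should compile:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Same configuration as the question, with StandardAnalyzer now resolving correctly
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, analyzer);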

