
I have a 64-bit server with 8 GB RAM and dual quad-core CPUs. No resources are ever hitting 100% (except, I guess, the JVM -- right?).

I need to index several million records for Solr, but the machine is in production. I recognize having a second machine for indexing would be helpful.

Should I run a second JVM instance dedicated to Solr?

Right now, while an indexing run is in progress, pages that are normally served in 200 milliseconds take about 1.5 seconds, sometimes more, and occasionally hit the dreaded "Service is Unavailable" error.

I adjusted my JVM Heap as follows:

-Xmx1024m
-XX:MaxPermSize=256m

In case I'm chasing the wrong solution, allow me to broaden the landscape a bit. It seems that I can't affect the indexing speed of Solr. I had previously been indexing about 150,000 records per hour on a dev server virtualized on a workstation. In a production environment with much more hardware available, I'm indexing at the exact same speed.

Without data to prove it, I think that my JVM adjustments did not speed up the indexing, although they may have allowed the CF server to continue serving pages. The indexing speed is terribly slow, but I do know that it's not a function of the data access layer: I rewrote it from pure ORM to objects backed by SQL stored procedures, thinking that was the slowdown, and it had no effect.

  • Clarification: the index process is HTTP Post (not embedded). Commented Oct 13, 2010 at 17:10
  • You shouldn't be indexing a lot of data frequently, right? You can use action=update. Commented Oct 13, 2010 at 22:44
  • Correct, Henry, it will probably be a monthly batch (new data imports, some data updates). Commented Oct 14, 2010 at 15:06

2 Answers


Use a separate instance for indexing. The only trick is getting the running search instance to re-read the updated index; for that, set up a master (the indexer) and a slave (the searcher) and use replication. That way the searcher does not get interrupted, and the indexer runs in its own JVM with its own share of the resources.
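
As a rough illustration of the "re-read the updated index" step, here is a minimal Python sketch that asks the searcher to pull the latest index once a batch run has finished on the indexer. It assumes Solr 1.4 or later with the ReplicationHandler registered at `/replication` in `solrconfig.xml` on both instances, and a slave that already has its `masterUrl` configured. The host name, port, and helper function names are hypothetical, not taken from the setup described in the question.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, command, **params):
    """Build a URL for Solr's ReplicationHandler HTTP API.

    base_url is the Solr root of one instance, e.g. http://host:8983/solr
    (assumed layout; adjust to wherever the handler is registered).
    """
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/replication?{query}"


def trigger_fetch(slave_base_url, timeout=30):
    """Ask the slave (searcher) to fetch the index from its configured master.

    Returns True if the HTTP request itself succeeded; the fetch runs
    asynchronously on the Solr side.
    """
    url = replication_url(slave_base_url, "fetchindex")
    with urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Hypothetical searcher host; call trigger_fetch() with the same base URL
    # after the monthly batch has been committed on the indexer.
    print(replication_url("http://search-host:8983/solr", "fetchindex"))
```

If the slave is configured with a poll interval instead, it should pick up new commits on its own and an explicit trigger like this is optional; for a monthly batch, triggering once after the final commit avoids constant polling.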


1 Comment

Excellent. I figured I couldn't just have a JVM for indexing AND have search work. You gave me the missing link. Thanks!

Have you tried these optimization tips?

http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-1.html

http://bloggeraroundthecorner.blogspot.com/2009/08/tuning-coldfusion-solr-part-2.html

http://bytestopshere.com/post.cfm/lessons-learned-moving-from-verity-to-solr-part-1

1 Comment

I see from my visited-link colors that I've been to the first one, but not the other two. I'll check those out. Thanks! My hope is that investing a couple of hours in finding a good trick will save many hours of indexing time. Thanks again.
