
I am trying to understand how indexing can be optimized in Elasticsearch. Let me clarify my needs:

  • I have two indices right now; let's call them indexA and indexB (the two indices are roughly the same size).
  • I have 6 machines dedicated to Elasticsearch (all with the same hardware).
  • Writing is the most important part of my Elasticsearch usage, since I do heavy writes in real time.

So my question is: how can I optimize write operations using those 6 machines?

  • Should I split the machines into two groups, 3 machines for indexA and 3 machines for indexB?

    or

  • Should I use all 6 machines to index both indexA and indexB?

    and

  • What else should I pay attention to in order to optimize write operations?

Thank you in advance

  • elastic.co/blog/… is your best friend :-). Commented Apr 27, 2015 at 13:12
  • And I would say that 3 machines for one index and 3 for the other would be better for indexing performance, but I have no tests to confirm this, and I don't expect a big difference in performance. The reasoning is that a thread pool is used for bulk indexing (for example), and if both indices are spread over all 6 machines, the same pool on each machine serves two indices instead of one. But the best approach, if you can, is to test this. Commented Apr 27, 2015 at 13:16
  • Thank you @AndreiStefan for your comments, I will try to test it Commented Apr 27, 2015 at 13:23
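If you want to test the 3 + 3 split suggested above, shard allocation filtering is one way to pin each index to its own group of nodes. A minimal sketch, assuming each node is tagged with a custom attribute named "group" (a hypothetical name: "node.attr.group: a" in elasticsearch.yml on Elasticsearch 5.x and later, "node.group: a" on 1.x). It only builds the settings body for each index; it does not talk to a cluster. The index names are lowercase placeholders for indexA/indexB, because Elasticsearch index names must be lowercase.

```python
import json

# Hypothetical attribute values: tag three nodes with group "a" and three with group "b"
# in elasticsearch.yml. Keys are lowercase placeholders for indexA / indexB.
NODE_GROUPS = {"index_a": "a", "index_b": "b"}

def allocation_settings(group):
    """Settings body allowing this index's shards only on nodes whose 'group' attribute matches."""
    return {"index.routing.allocation.require.group": group}

for index, group in NODE_GROUPS.items():
    # Each body would be sent as: PUT /<index>/_settings
    print(f"PUT /{index}/_settings")
    print(json.dumps(allocation_settings(group), indent=2))
```

Removing the setting (or omitting it) lets Elasticsearch spread both indices over all 6 nodes, which is the other option in the question, so the two layouts can be compared on the same cluster.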

1 Answer


It depends, but let me point you in a direction based on your problem statement, which leads to the following assumptions:

  • you want to optimize write operations (search performance is not a concern)
  • both indices are in the same cluster
  • more machines may be added in the future

For better indexing performance, the first thing you may want is a single shard per index (unless you are using routing). But with 6 servers, a single shard would waste resources, so you can assign 3 shards each to indexA and indexB, letting the 6 primary shards spread across all 6 machines. That covers the current scenario; for future scalability, ~10 shards is recommended (depending on your data size).
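A minimal sketch of the index-creation bodies for the 3 + 3 shard layout. It only builds the JSON that would be sent with "PUT /<index>"; the lowercase names are placeholders for indexA/indexB, since Elasticsearch index names must be lowercase. The primary shard count is chosen at creation time, which is why it is worth deciding on it up front.

```python
import json

# Lowercase placeholders for indexA / indexB (Elasticsearch index names must be lowercase).
INDICES = ["index_a", "index_b"]

def create_index_body(shards):
    # The number of primary shards is set when the index is created.
    return {"settings": {"index": {"number_of_shards": shards}}}

for name in INDICES:
    # 3 primaries per index -> 6 primaries in total, one per machine once balanced.
    print(f"PUT /{name}")
    print(json.dumps(create_index_body(3), indent=2))
```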

Turn off replicas if possible, since an index request waits for the replicas to respond before returning. In a production environment, however, at least one replica is highly recommended for high availability.
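Because "number_of_replicas" is a dynamic index setting, a common pattern is to drop replicas during a heavy load and restore them afterwards. A sketch that builds the two settings bodies, assuming a placeholder index name "index_a"; it prints the requests rather than sending them.

```python
import json

def replica_settings(count):
    # number_of_replicas can be changed on a live index via PUT /<index>/_settings.
    return {"index": {"number_of_replicas": count}}

# During the heavy load: no replicas, so each write goes only to the primary shard.
during_load = replica_settings(0)
# Afterwards: restore at least one replica for high availability.
after_load = replica_settings(1)

print("PUT /index_a/_settings", json.dumps(during_load))
print("PUT /index_a/_settings", json.dumps(after_load))
```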

Set the refresh interval ("refresh_interval") to "-1", or at least to a larger value such as "30m". You will lose near-real-time (NRT) search if you do so, but as you mentioned, you are concerned mainly with indexing.
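A sketch of the refresh settings around a bulk load, assuming a placeholder index name "index_a". It builds and prints the request bodies only; "1s" is the default refresh interval, and an explicit refresh at the end makes the newly indexed documents searchable immediately.

```python
import json

def refresh_settings(interval):
    # "-1" disables periodic refresh; a time value such as "30m" sets a longer interval.
    return {"index": {"refresh_interval": interval}}

bulk_load = refresh_settings("-1")  # or "30m" to keep occasional refreshes
restore = refresh_settings("1s")    # 1s is the default refresh interval

print("PUT /index_a/_settings", json.dumps(bulk_load))
# ... heavy indexing happens here ...
print("PUT /index_a/_settings", json.dumps(restore))
print("POST /index_a/_refresh")  # make the newly indexed documents searchable right away
```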

Turn off index warmers if you have any.

Avoid using "doc_values" in your field mappings. Although doc values reduce the memory footprint at search time, they increase indexing time because field values are prepared during indexing.

If norms are not required, disable "norms" in your mapping (norms are only used for relevance scoring, so fields you only filter on do not need them).
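A sketch of a mapping that turns off both norms and doc values, assuming the typeless mapping syntax of Elasticsearch 7.x and later; the field names "message" and "status" are hypothetical. On 1.x (current when this answer was written) doc values were opt-in, so avoiding them simply meant not enabling them, and norms were disabled with "norms": {"enabled": false}. From 2.0 onward doc values are on by default and must be switched off explicitly, as below; a field without doc values can no longer be sorted or aggregated on.

```python
import json

mapping = {
    "mappings": {
        "properties": {
            # Full-text field that is never used for scoring: skip norms.
            "message": {"type": "text", "norms": False},
            # Exact-value field that is never sorted or aggregated on: skip doc values.
            "status": {"type": "keyword", "doc_values": False},
        }
    }
}

# Would be sent as the body of: PUT /index_a (hypothetical index name)
print(json.dumps(mapping, indent=2))
```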

Lastly, read this.

Word of caution: some of the approaches above will hurt your search performance.
