
I am trying to bulk insert data from SQL into an Elasticsearch index. Below is the code I am using; the total number of records is around 1.5 million. I think it has something to do with the connection settings, but I am not able to figure it out. Can someone please help with this code or suggest a better way to do it?

public void InsertReceipts()
{
    IEnumerable<Receipts> receipts = GetFromDB(); // get receipts from SQL DB

    const string index = "receipts";
    var config = ConfigurationManager.AppSettings["ElasticSearchUri"];
    var node = new Uri(config);

    var settings = new ConnectionSettings(node).RequestTimeout(TimeSpan.FromMinutes(30));
    var client = new ElasticClient(settings);

    var bulkIndexer = new BulkDescriptor();

    foreach (var receiptBatch in receipts.Batch(20000)) // using MoreLinq for Batch
    {
        Parallel.ForEach(receiptBatch, (receipt) =>
        {
            bulkIndexer.Index<OfficeReceipt>(i => i
                .Document(receipt)
                .Id(receipt.TransactionGuid)
                .Index(index));
        });

        var response = client.Bulk(bulkIndexer);

        if (!response.IsValid)
        {
            _logger.LogError(response.ServerError.ToString());
        }

        bulkIndexer = new BulkDescriptor();
    }
}

The code works fine but takes around 10 minutes to complete. When I try to increase the batch size, it fails with the error below:

Invalid NEST response built from a unsuccessful low level call on POST: /_bulk

Invalid Bulk items: OriginalException: System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

2 Comments
  • Take a look at the possible duplicate question's answer. NEST 5.x includes a helper for performing bulk requests in parallel that can help you here. (Commented Jun 22, 2017 at 6:54)
  • Thanks @RussCam, I think I agree that the root cause of my exception is also the size of the data I am sending in each bulk request. I will use the Observable design pattern you have mentioned in your answer with different settings to see which suits my scenario best. (Commented Jun 22, 2017 at 23:34)
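For reference, a minimal sketch of the parallel bulk helper mentioned in the comment above (NEST 5.x's BulkAll observable), applied to the receipts from the question. The batch size, degree of parallelism, and back-off values here are illustrative starting points, not tuned settings:

    var bulkAll = client.BulkAll(receipts, b => b
        .Index(index)
        .Size(2000)                              // documents per bulk request
        .MaxDegreeOfParallelism(4)               // concurrent bulk requests in flight
        .BackOffRetries(2)                       // retry a failed batch this many times
        .BackOffTime(TimeSpan.FromSeconds(30))   // wait between retries
        .BufferToBulk((descriptor, buffer) => descriptor
            .IndexMany(buffer, (op, receipt) => op.Id(receipt.TransactionGuid))));

    // Blocks until all batches have been sent (or the max run time elapses),
    // invoking the callback once per bulk response.
    bulkAll.Wait(TimeSpan.FromMinutes(30), response =>
    {
        // called once per successful bulk response; response.Page is the batch number
    });

Wait blocks the calling thread; the observable can instead be subscribed to if you want non-blocking behaviour.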

2 Answers


A good place to start is with batches of 1,000 to 5,000 documents or, if your documents are very large, with even smaller batches.

It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1 KB documents is very different from one thousand 1 MB documents. A good bulk request size to start playing with is around 5-15 MB.
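For illustration, here is roughly what a smaller batch size could look like against the code in the question, using the bulk descriptor's IndexMany instead of the Parallel.ForEach loop. The 2,000 figure is an arbitrary starting point chosen to keep each request in that 5-15 MB range, not a recommendation:

    foreach (var receiptBatch in receipts.Batch(2000)) // smaller batches, still using MoreLinq
    {
        var bulkIndexer = new BulkDescriptor();
        bulkIndexer.IndexMany(receiptBatch, (op, receipt) => op
            .Id(receipt.TransactionGuid)
            .Index(index));

        var response = client.Bulk(bulkIndexer);
        if (!response.IsValid)
        {
            _logger.LogError(response.ServerError?.ToString());
        }
    }

If you want to check the physical size of a batch, serializing it with any JSON serializer and summing the byte counts gives a rough estimate.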


1 Comment

  • Thanks @ali baghjery. I will try smaller batch sizes as well and check what size each bulk request is.

I had a similar problem. It was solved by adding the following code before the ElasticClient connection is established:

System.Net.ServicePointManager.Expect100Continue = false;

2 Comments

  • Sorry, it did not fix my problem.
  • The client sets Expect100Continue = false by default, along with some other settings: github.com/elastic/elasticsearch-net/blob/5.x/src/…
