2016-07-27 16:10:22,840 ERROR org.ctrip.ops.sysdev.outputs.Elasticsearch$1 pool-2-thread-2 RemoteTransportException[[a08.elastic.loganalyse.monitor.b28.youku][inet[/10.103.11.27:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@2b16cb10];
2016-07-27 16:10:22,840 INFO org.ctrip.ops.sysdev.outputs.Elasticsearch$1 pool-2-thread-2 2104 doc failed, 2104 need to retry
2016-07-27 16:10:22,840 INFO org.ctrip.ops.sysdev.outputs.Elasticsearch$1 pool-2-thread-2 sleep 1052millseconds after bulk failure
Threadpool Properties Prevent Data Loss

An Elasticsearch node has several thread pools to improve how threads are managed within the node. At Loggly, we use bulk requests extensively, and we have found that setting the right queue size for the bulk thread pool with the threadpool.bulk.queue_size property is crucial for avoiding data loss or _bulk retries.
This property applies to bulk requests. It tells ES how many requests can be queued for execution on the node when no thread is available to execute a bulk request. Set it according to your bulk request load. If the number of pending bulk requests exceeds the queue size, you will get a RemoteTransportException as shown below.
Note that in ES the bulk request queue contains one item per shard, so this number needs to be higher than the number of concurrent bulk requests you want to send if those requests contain data for many shards. For example, a single bulk request may contain data for 10 shards, so even if you only send one bulk request, you must have a queue size of at least 10. Setting this value “too high” will chew up heap in your JVM, but does let you hand off queuing to ES, which simplifies your clients.
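The sizing rule above comes down to simple arithmetic. The sketch below is only an illustration: the function name and its parameters are hypothetical, and it assumes the worst case, in which every bulk request is split into one queue item per shard, all of those shards sit on the same node, and no bulk thread is free to pick up any item.

```python
def min_bulk_queue_size(concurrent_requests: int, shards_per_request: int) -> int:
    """Worst-case number of bulk queue slots needed on one node.

    Hypothetical helper: assumes one queue item per shard touched by each
    bulk request, all shards on the same node, and no idle bulk thread.
    """
    if concurrent_requests < 0 or shards_per_request < 0:
        raise ValueError("arguments must be non-negative")
    return concurrent_requests * shards_per_request


# The example from the text: one bulk request touching 10 shards.
print(min_bulk_queue_size(1, 10))  # 10
# Five concurrent bulk requests, each touching 10 shards.
print(min_bulk_queue_size(5, 10))  # 50
```

In practice the shards are usually spread over several nodes, so the real requirement per node is often lower; treat the result as an upper bound rather than a recommended setting.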
You need to either keep the property value higher than the load you accept or handle RemoteTransportException gracefully in your client code, for example by retrying the rejected documents. If you don’t handle the exception, you will end up losing data. We simulated the exception shown below by sending more than 10 bulk requests with a queue size of 10.
RemoteTransportException[[<Bantam>][inet[/192.168.76.1:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 10) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13fe9be];
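
Handling the rejection gracefully in client code usually means retrying the rejected batch after a pause instead of dropping it. The following is a minimal sketch of that idea, not the code of any particular client: BulkRejected, bulk_with_retry, send_bulk and flaky_send are hypothetical names, and send_bulk stands for whatever function in your client actually sends the batch and signals a rejection.

```python
import time


class BulkRejected(Exception):
    """Hypothetical stand-in for a bulk request rejected by a full queue."""


def bulk_with_retry(docs, send_bulk, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Send docs, retrying after each rejection with a doubling delay.

    send_bulk is a caller-supplied function (an assumption of this sketch)
    that sends a list of documents and raises BulkRejected when rejected.
    Returns the number of attempts used; re-raises once retries run out.
    """
    delay = base_delay
    for attempt in range(1, max_retries + 2):
        try:
            send_bulk(docs)
            return attempt
        except BulkRejected:
            if attempt > max_retries:
                raise  # give up loudly rather than silently drop documents
            sleep(delay)
            delay *= 2


# Usage: a fake sender that is rejected twice and then succeeds.
calls = {"n": 0}


def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise BulkRejected("rejected execution (queue capacity 10)")


delays = []  # record the pauses instead of really sleeping
attempts = bulk_with_retry(["doc-1", "doc-2"], flaky_send, sleep=delays.append)
print(attempts, delays)  # 3 [0.5, 1.0]
```

Re-raising after the last retry keeps the failure visible to the caller, which can then buffer the documents or stop reading input instead of losing them.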