The Complete Guide to Increasing Your Elasticsearch Write Throughput and Speed
Multiple strategies that you can use to increase Elasticsearch write capacity for batch jobs and/or online transactions.
Over the last few years, I’ve faced bottlenecks and made many mistakes with different Elasticsearch clusters when it comes to their write capacity, especially when one of the requirements is to write into a live index that has strict SLAs for read operations.
If you use Elasticsearch in production environments, chances are, you’ve faced these issues too and maybe even made some of the same mistakes I did in the past!
I think having a clear, high-level picture of how Elasticsearch works under the covers helps a lot when you’re trying to get the best performance out of your system, so let’s start with that.
Elasticsearch, in very simple terms, is a search engine that is powered by Lucene. Lucene powers the “search part”, while Elasticsearch gives you scalability, availability, a REST API, specialized data structures, and so on — in very broad terms.
As we can see from the diagram, Elasticsearch splits each index into shards and distributes them across the available nodes. A shard can be a primary shard or a replica shard.
Each shard is a Lucene index, each of those Lucene indexes can contain multiple segments, and each segment is an inverted index.
Those segments are created during document ingestion and are immutable. That means that when you edit or delete a document that is already present in a Lucene segment, a new segment is created instead of changing the previous one.
This is a very important thing to consider. Due to this architecture, documents are never modified in place, not even for partial updates.
Each time you update a single field in your document, the whole document is reindexed into a new segment and the previous version is marked as deleted in its old segment.
In this diagram, we can see how a new document is stored by Elasticsearch.
As soon as it “arrives”, it is appended to a transaction log called the “translog” and added to an in-memory buffer.
The translog is how Elasticsearch can recover data that was only in memory in case of a crash.
All the documents in the memory buffer will generate a single in-memory Lucene segment when the “refresh” operation happens. This operation is used to make new documents available for search.
Depending on different triggers (more on this later), those segments are eventually committed (flushed) to disk and the translog is cleared; in the background, Elasticsearch also merges smaller segments into larger ones.
Now that we have the basic picture of how information is saved by Elasticsearch, we’re in a better position to start with our tuning.
As we’ve seen in a previous article on how to tune PostgreSQL for high performance, you have ways to use and configure your database engine to fit your needs, and Elasticsearch is no different.
Let’s explore the different ways we can greatly increase its write/indexing throughput and speed, all the way from the client to the operating system.
Client Strategies
Bulk Indexing
ES offers a Bulk API for document indexing. As one would expect, it makes indexing much faster. It is recommended to do some benchmarks to determine the right batch size for your data.
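For illustration, a minimal bulk request (the index name and fields below are just placeholders) pairs one action metadata line with one document line in newline-delimited JSON; a common starting point is somewhere in the low megabytes per request, adjusted until you stop seeing gains:

POST es-endpoint/index-name/_bulk
{ "index": { "_id": "1" } }
{ "title": "first document" }
{ "index": { "_id": "2" } }
{ "title": "second document" }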
Parallelization
ES scales horizontally by nature, and so should your indexing jobs. Make sure you distribute the data you need to ingest across multiple workers that can run in parallel. Pay attention to errors like TOO_MANY_REQUESTS (429), which happen when Elasticsearch’s thread pool queues are full, and make sure you retry with exponential backoff when they occur.
Response Filtering
Elasticsearch supports a filter_path parameter that can be used to reduce the size of the response it returns. This parameter takes a comma-separated list of filters expressed in dot notation:
POST "es-endpoint/index-name/_bulk?filter_path=-took,-items.index._index,-items.index._type"
Aggregate Before Saving
This is especially important if you’re updating your documents through “random” operations (like a user updating their username) instead of running batch jobs that already have aggregated information.
Let's say you use Elasticsearch for full-text search on your social network project and you need to update your user document each time a user gets a new follower.
You will soon start seeing multiple timeout issues and lag across the board.
You may be just updating a simple field, but Elasticsearch is forced to delete the current document and create an entirely new one!
A simple solution here is to “buffer” those user changes using Redis, Kafka, or any other means and then aggregate those multiple document edits into a single one before saving to Elasticsearch. Better yet, group by “key” for a window of time and then use bulk operations to send those docs to ES.
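As a rough sketch (the index, ids, and fields are made up, and the typeless _update/_bulk endpoints assume a recent Elasticsearch version), instead of sending one update per follower event:

POST es-endpoint/index-name/_update/user-42
{ "doc": { "followers_count": 101 } }

POST es-endpoint/index-name/_update/user-42
{ "doc": { "followers_count": 102 } }

…you would buffer those events for a window of time and only ship the final state, ideally as part of a bulk request:

POST es-endpoint/index-name/_bulk
{ "update": { "_id": "user-42" } }
{ "doc": { "followers_count": 102 } }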
Document Strategies
Scripting
It can be tempting to use some tricks like these:
"script" : {
"inline": "ctx._source.counter = ctx._source.value ? ctx._source.counter += 1 : 1"
}
Although at first glance it might seem like a good idea to use scripting to automatically increment fields during updates (or to implement any other update-time logic), it does not scale well and you could even compromise your entire cluster with strategies like these. I must confess I made this mistake a few years ago…
This is especially true because, even if you set a query timeout, there is no guarantee it will be enforced server-side, as described in this GitHub issue.
Elastic has a great article with general rules of thumb for documents; read it here.
Index Strategies
Refresh Interval
This is probably one of the configurations at the index level that will make the most difference.
The refresh operation makes new data available for search. When a refresh happens, the recently indexed data that only exists in memory is transferred to a new Lucene segment. This operation does not flush/commit those segments to disk, but the data is safe thanks to the translog (more on this later).
Elasticsearch will execute a refresh every second if the index received one search request or more in the last 30 seconds.
If you’re using Elasticsearch, your system is probably already prepared for eventual consistency, and increasing the refresh interval could be a good option for you.
You can set a bigger refresh_interval for your index during a batch ingest job and then set it back to your standard value after the job finishes.
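A minimal sketch of that workflow (the values are just examples; “-1” disables refresh entirely, while something like “30s” is a milder option):

PUT es-endpoint/index-name/_settings
{ "index": { "refresh_interval": "-1" } }

Run the bulk ingest job, then restore your usual value:

PUT es-endpoint/index-name/_settings
{ "index": { "refresh_interval": "1s" } }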
If you have a lot of different concurrent jobs happening, it could be worth figuring out how big a refresh interval you can live with.
More info here.
Auto-Generated IDs
Allowing Elasticsearch to generate ids on your behalf should increase document creation speed since Elasticsearch won’t need to check for uniqueness.
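In practice, that means letting Elasticsearch pick the id by POSTing without one (the index name is a placeholder, and the typeless _doc endpoint assumes a recent version):

POST es-endpoint/index-name/_doc
{ "title": "some document" }

instead of PUTting a self-chosen id:

PUT es-endpoint/index-name/_doc/my-own-id
{ "title": "some document" }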
The performance increase of using auto-generated ids was much more evident before version 1.4. Since then, Elasticsearch id lookup speed has greatly improved, reducing the penalty when you’re using your own ids.
But that is only true if you use a “Lucene friendly” format. A great article about good ids can be found here.
Examples include zero-padded sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random, compress poorly, and slow Lucene down.
There is also the option of using a “normal” field to store your “self-id”, but bear in mind that you will lose some advantages like:
- fast routing, i.e. ES knows what shard contains that document
- uniqueness guarantee
- delete by id
Disable Replicas
If you can disable your replicas during a bulk index job, it will also greatly increase indexing speed.
When you index a document, it will be indexed by each replica, effectively multiplying the work by the number of replicas you have.
On the other hand, if you disable them before the bulk job and only enable them afterward, the new data is copied over to the replicas in a serialized binary format (segment files), without the need to analyze documents or merge segments again.
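A sketch of that flow, assuming your index normally runs with one replica (the values are placeholders):

PUT es-endpoint/index-name/_settings
{ "index": { "number_of_replicas": 0 } }

Run the bulk job, then bring the replicas back:

PUT es-endpoint/index-name/_settings
{ "index": { "number_of_replicas": 1 } }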
Node Strategies
Indexing Buffer Size
This setting will control how much memory Elasticsearch will reserve for indexing. The default is 10% of the heap.
Since this buffer is shared among all the actively indexing shards on your node, make sure the busiest shards can get up to about 512MB each; beyond that, indexing speed usually doesn’t improve.
If you increase your translog size, you’ll also need to increase this setting to avoid flushing prematurely.
More info about those settings here.
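This is a static node-level setting, so it goes into elasticsearch.yml and requires a restart; the value below is only an example, not a recommendation:

indices.memory.index_buffer_size: 20%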
Translog
The translog is the Elasticsearch transaction log. There is one per shard, it is updated alongside the in-memory buffer, and in case of a crash it can be used to recover data that had not yet been committed to Lucene on disk.
When it reaches a certain size (512MB is the default), it will trigger a flush operation that will commit all in-memory Lucene segments to disk.
This flush operation takes the form of a Lucene commit: the segments that so far only existed in memory are fsynced to disk and the translog is truncated.
As you can imagine, this is an expensive operation. You can increase the translog size threshold to create bigger segments on disk (better performance overall) while also reducing how often this expensive operation runs.
The setting responsible for that value is flush_threshold_size. Read more about it here.
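For example, raising the threshold to 1GB on a specific index (the value is just an illustration) is a dynamic settings change:

PUT es-endpoint/index-name/_settings
{ "index.translog.flush_threshold_size": "1gb" }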
Segment Merge Throttling (deprecated with ES 6)
You will see misleading information about this setting in multiple articles, so I decided to make a special note here:
From Elasticsearch 2, the often mentioned default value of 20MB/s throttle that happens when merges start falling behind is no longer used.
Instead, Elasticsearch now uses Lucene’s adaptive merge IO throttling that changes IO limits according to the current load and server performance.
With Elasticsearch 6, settings like indices.store.throttle.max_bytes_per_sec and indices.store.throttle.type no longer exist, as you can see here.
Cross-Cluster Replication
Cross-cluster replication (CCR) lets a “follower” index in one cluster continuously replicate a “leader” index in another. With this setup, you can direct all your reads to the follower index and write only to the leader index. This way you don’t have reads competing with writes.
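Creating a follower index looks roughly like this (the cluster and index names are placeholders, and CCR requires the appropriate license and a configured remote cluster):

PUT es-endpoint/follower-index/_ccr/follow
{
  "remote_cluster": "leader-cluster",
  "leader_index": "leader-index"
}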
Operating System and Server Strategies
Disable Swapping
Swapping will kill your performance. Elasticsearch uses a lot of memory so if you’re not careful, there’s a good chance it might start swapping.
You have multiple ways of disabling swap.
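For example, you can turn swap off at the operating system level (this assumes a typical Linux box):

sudo swapoff -a

or let Elasticsearch lock its own memory by adding this to elasticsearch.yml:

bootstrap.memory_lock: true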
Filesystem Cache
Linux automatically uses free RAM to cache files. Elastic recommends having at least half the memory of the machine running Elasticsearch available for the filesystem cache.
Make sure you don’t set the heap (ES_HEAP_SIZE on older versions) to more than 50% of your machine’s memory, so the rest stays free for the filesystem cache.
You should also avoid having more than 32GB of heap, since past that point the JVM stops using compressed object pointers, which hinders performance and wastes memory.
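On newer versions the heap is configured through jvm.options (or a file under jvm.options.d) instead of ES_HEAP_SIZE. For example, on a 64GB machine you might pin the heap to 26GB, keeping it under the 32GB compressed-pointers threshold and leaving the rest to the filesystem cache (the exact number is only an illustration):

-Xms26g
-Xmx26g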
Storage Type
Avoid network disks like NFS or EFS and try to always use SSD (NVMe if possible).
Hardware like AWS EBS can be a good option but directly connected disks will always be faster, especially if you have a RAID setup.
How does this all sound? Is there anything I missed or that you’d like me to expand on? Let me know your thoughts in the comments section below (and hit the clap if this was useful)!
Stay tuned for the next post. Follow so you won’t miss it!
References
https://aws.amazon.com/premiumsupport/knowledge-center/elasticsearch-indexing-performance/
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
https://medium.com/@alikzlda/elasticsearch-cluster-sizing-and-performance-tuning-42c7dd54de3c
https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1
https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-2/
https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-3
https://www.datadoghq.com/blog/elasticsearch-performance-scaling-problems/
https://logz.io/blog/elasticsearch-performance-tuning/
https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/
https://blog.opstree.com/2019/10/01/tuning-of-elasticsearch-cluster/
https://www.outcoldman.com/en/archive/2017/07/13/elasticsearch-explaining-merge-settings/
https://www.elastic.co/blog/elasticsearch-performance-indexing-2-0
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html