Scaling Elasticsearch

Elasticsearch, “You Know, for Search”, is a well-known search engine with a RESTful HTTP interface. It’s fast, scalable, and easy to use. Elasticsearch is built to be always available and to scale with your needs.

It’s hard to design the perfect Elasticsearch setup on the first try. People usually start with the simplest solution that fits their needs, and that’s normal. As your documents and traffic grow, you may need to scale your Elasticsearch cluster, especially if you run into performance issues. Here are some tips you can apply to scale Elasticsearch.


Documents in an index can grow to massive proportions. To keep it manageable, we can split an index into a number of shards. A shard contains a subset of the index’s data and is fully functional and independent.

Let’s say you have an index with a single 1 TB shard and 4 nodes, each with a 300 GB disk. You won’t be able to put that index on any of the nodes, since the index is bigger than any single node’s disk. You can split the index into 4 shards instead, so each shard holds around 256 GB and fits on a node.
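The arithmetic in this example can be sketched in a few lines of Python. The helper names are made up for this illustration, and treating 1 TB as 1024 GB is an assumption; a real plan would also leave free disk headroom on every node.

```python
import math

def min_shards_to_fit(index_size_gb: float, node_disk_gb: float) -> int:
    """Smallest shard count such that a single shard fits on one node's disk."""
    return math.ceil(index_size_gb / node_disk_gb)

def shard_size_gb(index_size_gb: float, shard_count: int) -> float:
    """Approximate size of each shard if documents are spread evenly."""
    return index_size_gb / shard_count

shards = min_shards_to_fit(1024, 300)       # 1 TB index, 300 GB nodes
print(shards, shard_size_gb(1024, shards))  # 4 256.0
```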

Having extra shards is helpful when you want to add more nodes. For example, if you have 3 shards on 1 node, you can add 2 more nodes so that each node holds 1 shard. If you need more than 3 nodes, though, the extra nodes won’t help, because you would have more nodes than shards and a shard cannot be divided across nodes.
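To see why nodes beyond the shard count stay idle, here is a small simulation. It places shards round-robin, which is a deliberate simplification and not how the Elasticsearch allocator actually decides; the function name is hypothetical.

```python
def spread_shards(shard_count: int, node_count: int) -> dict[int, list[int]]:
    """Assign shard numbers to nodes round-robin (simplified allocation)."""
    placement = {node: [] for node in range(node_count)}
    for shard in range(shard_count):
        placement[shard % node_count].append(shard)
    return placement

print(spread_shards(3, 1))  # {0: [0, 1, 2]}
print(spread_shards(3, 3))  # {0: [0], 1: [1], 2: [2]}
print(spread_shards(3, 5))  # {0: [0], 1: [1], 2: [2], 3: [], 4: []}
```

With 5 nodes and 3 shards, nodes 3 and 4 hold nothing from this index.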

Now you may be thinking of creating as many shards as possible. That is not right either. There’s a hidden cost to each shard Elasticsearch has to manage. Each shard is a complete Lucene index, and it requires a number of file descriptors for each segment of the index, as well as memory overhead.

There is no perfect number for all use cases. Older Elasticsearch versions created 5 primary shards per index by default; since version 7.0 the default is 1.


There are two types of shards: primary shards and replicas. A replica is a copy of a primary shard. If you have multiple nodes, Elasticsearch allocates a primary and its replicas on different nodes, which ensures access to your data in the event of a node failure.

Replicas can serve search requests, so adding replicas on new nodes can be a quick way to handle extra search traffic. While replicas help search performance, they come with a trade-off for writing and indexing. Indexing requests won’t complete until the data exists on the primary shard as well as all replicas, so be mindful of adding replicas for write-heavy applications.
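Both counts are index settings, `number_of_shards` and `number_of_replicas`. As a rough sketch, the snippet below builds the JSON body you might send when creating an index and counts how many shard copies the cluster then has to store. The helper names and the numbers are illustrative assumptions.

```python
import json

def create_index_body(primary_shards: int, replicas: int) -> str:
    """JSON body for creating an index with the given shard layout."""
    return json.dumps({
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    })

def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each replica adds one full copy of every primary shard."""
    return primary_shards * (1 + replicas)

print(create_index_body(4, 1))
print(total_shard_copies(4, 1))  # 8
```

Every write has to reach all of those copies, which is where the indexing cost of replicas comes from.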


When you have multiple shards, your data is distributed across them. By default, documents are spread over the shards based on a hash of their ID, so at search time Elasticsearch has no idea which shards hold the documents you are looking for. It has no choice but to broadcast the request to all of the shards. This can be unnecessary work and can hurt performance.

Routing is the process of determining which shard a document will reside in. You can control it with a routing key: provide the routing key when indexing, and use the same routing key again when searching. Elasticsearch will then look at only one shard instead of blindly broadcasting to all shards.

$ curl -XPOST 'http://localhost:9200/store/order?routing=user123' -H 'Content-Type: application/json' -d '{"productName":"sample","customerID":"user123"}'
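Conceptually, routing hashes the key and takes the result modulo the number of primary shards. The sketch below illustrates that idea only: `zlib.crc32` is a stand-in hash chosen for this example, not the hash Elasticsearch uses, so the shard numbers it produces will not match a real cluster.

```python
import zlib

def pick_shard(routing_key: str, primary_shards: int) -> int:
    """Map a routing key to a shard number: hash(key) modulo the shard count.

    zlib.crc32 is an illustrative stand-in, not Elasticsearch's hash.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards

# The same key always lands on the same shard, which is why a search that
# passes the same routing value only needs to visit one shard.
shard_at_index_time = pick_shard("user123", 4)
shard_at_search_time = pick_shard("user123", 4)
print(shard_at_index_time == shard_at_search_time)  # True
```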

Custom routing can reduce the load on the nodes and improve performance noticeably. You can have a lot of shards for your index, but only the shards holding your data are queried.

Choosing a good routing key is very important. With a bad routing key, you can end up with unbalanced sizes or traffic across your shards. For example, say you have item documents with the city as the routing key, and Jakarta is the city with the most items and the most traffic. The shard that holds Jakarta will then be the biggest and have the highest throughput. Using the city as the routing key is inefficient in this case, so be mindful when deciding on the routing key.
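The skew is easy to reproduce with a toy simulation. Everything here is hypothetical: the city counts and user IDs are invented, and `zlib.crc32` again stands in for the real routing hash. The point is only that a key with few, uneven values pins most documents to one shard, while a key with many distinct values tends to spread them out.

```python
import zlib
from collections import Counter

def pick_shard(routing_key: str, primary_shards: int) -> int:
    """Stand-in routing function: crc32(key) modulo the shard count."""
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards

def shard_load(routing_keys: list[str], primary_shards: int) -> Counter:
    """Count how many documents each shard receives."""
    return Counter(pick_shard(key, primary_shards) for key in routing_keys)

# Made-up data set: 70 items in Jakarta, 10 in each of three other cities.
by_city = ["Jakarta"] * 70 + ["Bandung"] * 10 + ["Surabaya"] * 10 + ["Medan"] * 10
# Alternative key with many distinct values, e.g. an invented user ID.
by_user = [f"user{i}" for i in range(100)]

# Every Jakarta item shares one shard, so the busiest shard holds at least 70.
print(max(shard_load(by_city, 4).values()))
print(max(shard_load(by_user, 4).values()))
```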

Filesystem cache

Elasticsearch relies on the filesystem cache to cache I/O operations and keep search fast. You should leave at least half of the available memory to the filesystem cache so that Elasticsearch can keep hot regions of the index in physical memory.
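As a back-of-the-envelope sketch, the split could be computed like this. It assumes the memory not left to the filesystem cache goes to the JVM heap, and `heap_cap_gb` is an assumed upper bound on the heap added for this example, not a value from this article.

```python
def split_memory(total_ram_gb: float, heap_cap_gb: float = 31.0) -> tuple[float, float]:
    """Return (jvm_heap_gb, left_for_filesystem_cache_gb).

    The heap gets at most half of RAM; heap_cap_gb is an assumed ceiling.
    """
    heap = min(total_ram_gb / 2, heap_cap_gb)
    return heap, total_ram_gb - heap

print(split_memory(16))   # (8.0, 8.0)
print(split_memory(128))  # (31.0, 97.0)
```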

Check your query

Bulk Request

The tips above mostly focus on optimizing search, but indexing can be heavy too. A bulk request performs multiple index, create, update, or delete operations in a single API call. This can increase indexing speed significantly.

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
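The bulk body is newline-delimited JSON: one action line, optionally followed by a source line, with a trailing newline at the end. A minimal sketch of assembling such a body in Python might look like this. The helper name is hypothetical, and sending the body to the cluster is left out.

```python
import json
from typing import Optional

def build_bulk_body(operations: list[tuple[dict, Optional[dict]]]) -> str:
    """Serialize (action, source) pairs as newline-delimited JSON.

    Each action goes on its own line, followed by its source document when
    there is one (delete has none). The body ends with a newline.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"update": {"_index": "test", "_id": "1"}}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```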

Increase refresh interval

Refresh is the operation that makes changes to documents visible to search. By default, Elasticsearch refreshes indices every second, and only those indices that have received a search request in the last 30 seconds. This process is costly and can slow down indexing. If you want to optimize indexing, consider increasing index.refresh_interval to a larger value.
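The interval is changed through the index settings API. As a sketch, the helper below only builds the JSON body for such an update; the function name and the "30s" value are assumptions for illustration, and passing `None` produces a JSON `null`, which resets the setting to its default.

```python
import json
from typing import Optional

def refresh_interval_body(interval: Optional[str]) -> str:
    """JSON body for an index settings update; None resets to the default."""
    return json.dumps({"index": {"refresh_interval": interval}})

print(refresh_interval_body("30s"))  # {"index": {"refresh_interval": "30s"}}
print(refresh_interval_body(None))   # {"index": {"refresh_interval": null}}
```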

Disable replicas temporarily

As mentioned above, replicas help with search traffic but not with indexing. Indexing requests won’t complete until the data exists on the primary shard as well as all replicas. When you have a surge of indexing requests and indexing performance suffers, you may want to disable replicas for some time and re-enable them once the surge is over.
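One way to picture this is as two settings updates wrapped around the heavy indexing job. The sketch below only builds the two requests as (method, path, body) tuples; the function name and the index name are hypothetical. Keep in mind that with zero replicas a node failure during the load can lose data.

```python
import json

def replica_toggle_plan(index: str, normal_replicas: int) -> list[tuple[str, str, str]]:
    """Return the settings requests to send before and after a bulk load."""
    path = f"/{index}/_settings"

    def body(replicas: int) -> str:
        return json.dumps({"index": {"number_of_replicas": replicas}})

    return [
        ("PUT", path, body(0)),                # before the load: drop replicas
        ("PUT", path, body(normal_replicas)),  # after the load: restore them
    ]

for method, path, payload in replica_toggle_plan("store", 1):
    print(method, path, payload)
```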

Auto generated ID

During indexing, Elasticsearch needs to check whether the ID already exists on the same shard. This check can be skipped by using an auto-generated ID, which makes indexing faster.
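In request terms, the difference is whether you supply the ID yourself. The sketch below assumes a recent Elasticsearch version where documents are indexed through the `_doc` endpoint; the helper name is made up for this example.

```python
from typing import Optional

def index_request_line(index: str, doc_id: Optional[str] = None) -> str:
    """Request line for indexing one document.

    Without an ID, POST to /<index>/_doc lets Elasticsearch generate one;
    with an ID, PUT to /<index>/_doc/<id> uses the caller's value.
    """
    if doc_id is None:
        return f"POST /{index}/_doc"
    return f"PUT /{index}/_doc/{doc_id}"

print(index_request_line("test"))       # POST /test/_doc
print(index_request_line("test", "1"))  # PUT /test/_doc/1
```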
