Using Event Sourcing to Increase Elasticsearch Performance

A constant flow of document updates can bring an Elasticsearch cluster to its knees. Fortunately, there are ways to avoid that scenario.

As we’ve seen in my previous article, Elasticsearch doesn’t really support updates. Under the hood, an update always means delete + create.

In a previous project, we were using Elasticsearch for full-text search and needed to save some signals, like new followers, along with the user document.

That represented a big issue: thousands of new signals for a single user could be generated in seconds, which meant thousands of sequential updates to the same document.

Going for the naive solution of just issuing those updates is a good way to set an Elasticsearch cluster on fire :)

This system could tolerate eventual consistency, so we could live with those signals taking a while to arrive in Elasticsearch.

With that flexibility, the final solution consisted of converting every change that needed to be saved into an event, following the Event Sourcing design pattern. In this case, the application state is stored in Elasticsearch.

The second part of the solution is how we save that state.

We’ll need to group all changes to each document into a single operation. For some fields that means accumulating a list of values (like new followers); for others, only the last change matters (like a username change).
To keep that state consistent, we also need to guarantee ordering for events that target the same document, or we risk ending up with a state that doesn’t match our source of truth.
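As a minimal sketch of that grouping step, here is a pure function that folds an ordered batch of events into one update per document. The event shape (`doc_id`, `type`, `value`) and the two signal types are assumptions for illustration, not the exact schema from the project:

```python
from collections import defaultdict

# Assumed event types for this sketch:
ACCUMULATE = {"new_follower"}      # keep every value in a list
LAST_WINS = {"username_changed"}   # keep only the latest value

def aggregate(events):
    """Fold an ordered batch of events into one update per document."""
    updates = defaultdict(dict)
    for event in events:  # events must already be ordered per document
        doc = updates[event["doc_id"]]
        if event["type"] in ACCUMULATE:
            doc.setdefault("new_followers", []).append(event["value"])
        elif event["type"] in LAST_WINS:
            doc["username"] = event["value"]
    return dict(updates)
```

Whatever the exact schema, the point is the same: a batch of N events becomes at most one Elasticsearch write per document instead of N writes.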

This solution is based on the concept of streams. I’m using Kafka, but Redis Streams or AWS Kinesis would work just as well.

The idea is to store all the new changes (like a new follower, a username change, and so on) in a partitioned topic. Make sure your partition key is derived consistently from the document id to guarantee ordering, but don’t create one partition per user id or you will kill your Kafka cluster.

Order is important for events that override the previous field value, like a username change: we want to persist the last version, not an intermediate one.
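To illustrate why a consistent key preserves order: Kafka’s default partitioner hashes the message key (with murmur2) to pick a partition, so using the document id as the key sends every event for a given document to the same partition, where order is guaranteed. The sketch below uses `crc32` and a made-up partition count just to show the routing idea:

```python
import zlib

NUM_PARTITIONS = 12  # fixed and small; NOT one partition per user

def partition_for(doc_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same document id -> same partition, always.
    # (Kafka itself uses murmur2; crc32 just illustrates the idea.)
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions
```

With a client like kafka-python, you get this behavior broker-side simply by producing with `producer.send("user-events", key=doc_id.encode(), value=payload)`.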

To process these messages, we’ll need a stream processing solution. For this example, I’ll be using Faust, but it’s such a simple pattern that I recommend you use whichever tool suits you best.

Using Kafka and Faust to aggregate user events before saving in Elasticsearch
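A minimal Faust sketch of that pattern follows. The topic name, broker address, window sizes, event fields, and the `bulk_write_to_elasticsearch` helper are all assumptions for illustration; the batching call (`stream.take`) is Faust’s built-in way of collecting up to N events or whatever arrives within a time window:

```python
import faust

app = faust.App("user-signals", broker="kafka://localhost:9092")

# With the default JSON serializer, messages arrive as plain dicts.
events_topic = app.topic("user-events")

@app.agent(events_topic)
async def aggregate_events(stream):
    # Collect up to 1000 events, or whatever arrives within 10 seconds,
    # so each document gets at most one write per window.
    async for batch in stream.take(1000, within=10.0):
        updates = {}
        for event in batch:
            # Last-wins merge per document; list-type signals
            # (like new followers) would append instead.
            updates.setdefault(event["doc_id"], {}).update(event["fields"])
        bulk_write_to_elasticsearch(updates)  # hypothetical helper
```

Because the topic is partitioned by document id, each agent instance sees a document’s events in order, and the fold above stays consistent with the source of truth.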

Using streams can be a barrier if you haven’t adopted them yet, so I decided to include an example with plain Redis too (not Redis Streams).

Redis offers many data structures, as we’ve seen in this article, so there are plenty of valid ways to get this job done.

One way is to use the atomic operations RPUSH and LPOP to build a simple queue.

Even though Redis is single-threaded, which ensures the order of operations on the server, that doesn’t guarantee your workers will respect that order. There’s a good chance of running into race conditions if multiple workers read from the same queue.

For example, if the first and second batches both contain a username change for the same user and they are handled by two different workers, you lose any ordering guarantee.

To overcome that, we need to follow a partitioning pattern similar to the one we used for our Kafka solution but, in this case, using multiple lists (Redis keys) and assigning one single worker to each list.
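A sketch of that routing with redis-py follows. The list-name scheme, queue count, and event payload are assumptions; the key point is that the producer picks the list deterministically from the document id, and exactly one worker consumes each list (here with BLPOP, the blocking variant of LPOP):

```python
import zlib

NUM_QUEUES = 8  # one dedicated worker per queue

def queue_key(doc_id: str, num_queues: int = NUM_QUEUES) -> str:
    # Consistently route all events for a document to the same list,
    # mirroring the Kafka partition-key approach.
    return f"user-events:{zlib.crc32(doc_id.encode('utf-8')) % num_queues}"

def worker(queue: str) -> None:
    """Hypothetical worker loop: sole consumer of one list."""
    import json
    import redis
    r = redis.Redis()
    while True:
        _, raw = r.blpop(queue)   # blocks until an item arrives
        event = json.loads(raw)
        ...  # fold events and write the aggregated update to Elasticsearch
```

On the producer side this pairs with `r.rpush(queue_key(doc_id), json.dumps(event))`; since only one worker ever pops a given list, per-document order is preserved without any extra locking.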


How does this all sound? Is there anything you’d like me to expand on?

Let me know your thoughts in the comments section below (and hit the clap if this was useful)!

Stay tuned for the next post. Follow so you won’t miss it!

Principal Engineer @ Farfetch