Benchmark Elasticsearch sorting and ranking methods

Learn how to rank and sort your Elasticsearch documents efficiently.

Luis Sena
Sep 20, 2022

Sort vs Ranking

Ranking in Elasticsearch can mean different things, but in this post I mean the ability to change the “score” of each matched document. The “score” is a floating-point value that depends on multiple factors, some of which I will discuss below.

Sorting is the ability to order the query results by arbitrary fields. Those fields can be the score I’ve just mentioned, which is the default, or a document attribute, for example.

Sorting methods

Elasticsearch is a very flexible system that gives us several ways to reach the same final result. It’s up to us to know and choose the best method for our use case.

In this article, I’m mainly comparing the features. Other tuning options are out of the scope of this article.

Sort

By default, Elasticsearch sorts by _score in descending order. We can change that by specifying sort explicitly.

Pros

  • Simple to use, easy to read. Even if you’re not used to Elasticsearch, you’ll have a mental model of how sorting behaves.
  • Good performance (when done with doc_values). It can be even faster with features like index sorting.

Cons

  • The rescore functionality only works if we use the default sorting (_score desc).
  • It can only be used to break ties between relevance scores, not to contribute to the overall score.
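
To make the explicit sort concrete, here is a minimal sketch of the request body. The field names popularity and created_at are hypothetical, and the helper only builds the JSON body; it does not talk to a cluster.

```python
import json


def build_sort_query(primary_field, tie_breaker_field=None):
    """Build a match_all search body sorted by one or two fields (descending)."""
    sort = [{primary_field: {"order": "desc"}}]
    if tie_breaker_field is not None:
        # The second sort key is only consulted when the first one ties.
        sort.append({tie_breaker_field: {"order": "desc"}})
    return {"query": {"match_all": {}}, "sort": sort}


# "popularity" and "created_at" are hypothetical field names.
sort_body = build_sort_query("popularity", "created_at")
print(json.dumps(sort_body, indent=2))
```

The resulting dictionary can be sent as the body of a search request with whatever client you already use.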

Rank Feature

Pros

  • Good performance
  • Can be used with rescore

Cons

  • Needs an explicit field mapping to be used
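  • Can use more resources if the explicit mapping for the rank feature is not compatible with your other uses. For example, a rank_feature field cannot be used for sorting or aggregations, so you may need to index the same value a second time under another type.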
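
As a rough sketch of what the explicit mapping and the query could look like, assuming a hypothetical popularity field (this is not necessarily the exact setup I benchmarked):

```python
import json

# Hypothetical index mapping: "popularity" is stored as a rank_feature field.
rank_feature_mapping = {
    "mappings": {"properties": {"popularity": {"type": "rank_feature"}}}
}


def build_rank_feature_query(field, text_field=None, text=None):
    """Build a search body that scores documents with a rank_feature query."""
    feature_clause = {"rank_feature": {"field": field}}
    if text_field is None:
        # Score purely by the feature value.
        return {"query": feature_clause}
    # Combine a text match with the feature: the should clause adds to _score.
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                "should": [feature_clause],
            }
        }
    }


# "title" is a hypothetical text field.
rank_body = build_rank_feature_query("popularity", "title", "elasticsearch")
print(json.dumps(rank_body, indent=2))
```

Because the feature ends up in _score, this shape stays compatible with rescore.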

Script Score

Pros

  • Very flexible
  • Can be used with rescore

Cons

  • Usually results in a performance hit
  • The “painless” script can be very painful to write for more complex use cases
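
For reference, a function_score query with a script could be sketched like this. The popularity field and the weight parameter are assumptions for illustration, and the script expects every matched document to have a non-negative value in that field.

```python
import json


def build_script_score_query(field, weight=1.0):
    """Build a function_score body whose score comes from a Painless script."""
    # The field name is embedded in the script; the weight is passed through
    # params so the script source stays the same when the weight changes.
    source = "doc['%s'].value * params.weight" % field
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {"weight": weight},
                    }
                },
                # Replace the query score with the script result.
                "boost_mode": "replace",
            }
        }
    }


# "popularity" is a hypothetical numeric field.
script_body = build_script_score_query("popularity", weight=2.0)
print(json.dumps(script_body, indent=2))
```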

Field Value Factor

Pros

  • Better performance than scripting
  • Can be used with rescore

Cons

  • Not as flexible and a bit more troublesome to use than a simple sort
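
A minimal sketch of the same idea with field_value_factor, again assuming a hypothetical popularity field. The factor, modifier and missing values here are illustrative defaults, not the ones from my benchmark.

```python
import json


def build_field_value_factor_query(field, factor=1.0, modifier="none", missing=0):
    """Build a function_score body that uses a field's value as the score."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "field_value_factor": {
                    "field": field,
                    "factor": factor,      # multiplied with the field value
                    "modifier": modifier,  # e.g. "none", "log1p", "sqrt"
                    "missing": missing,    # value used when the field is absent
                },
                # Replace the query score with the field-based value.
                "boost_mode": "replace",
            }
        }
    }


# "popularity" is a hypothetical numeric field.
fvf_body = build_field_value_factor_query("popularity", modifier="log1p")
print(json.dumps(fvf_body, indent=2))
```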

Benchmark

Machine: 8 cores, 16GB RAM
Index: 600k docs, 1 Primary Shard
The latency is the average of 10,000 queries and is displayed in milliseconds.
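
If you want to reproduce this kind of measurement, the averaging can be sketched as below. This is not the tool I used; run_query is a hypothetical zero-argument callable that you would wire to your own client, and the timing is client-side wall-clock time.

```python
import time
from statistics import mean


def average_latency_ms(run_query, n_queries=10_000):
    """Call run_query n_queries times and return the mean latency in ms."""
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    timings_ms = []
    for _ in range(n_queries):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)


# Stand-in workload so the sketch runs without a cluster.
def fake_query():
    return sum(range(1000))


print(f"{average_latency_ms(fake_query, n_queries=100):.3f} ms")
```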

The following benchmarks focus mainly on comparing the performance when matching the entire index of 600k docs.

If you have a small index, or you filter/match first so that the final set of documents ES needs to sort is much smaller, you won’t see much difference in latency (at least, I didn’t).

I tested with one field and with two fields (the second field being used to break ties).

Another option worth noting here is track_total_hits, which controls whether Elasticsearch counts the matching documents accurately.
Since its use depends a lot on the use case, it seemed sensible to show the results with and without it.
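
As a small sketch, toggling the option on an existing request body could look like this. The helper name is hypothetical; track_total_hits itself accepts true, false, or an integer threshold.

```python
import copy
import json


def with_track_total_hits(body, value):
    """Return a copy of a search body with track_total_hits set.

    value may be True (count all hits accurately), False (skip counting),
    or an integer (count accurately up to that many hits).
    """
    updated = copy.deepcopy(body)
    updated["track_total_hits"] = value
    return updated


base_body = {"query": {"match_all": {}}}
exact_count_body = with_track_total_hits(base_body, True)
no_count_body = with_track_total_hits(base_body, False)
print(json.dumps(exact_count_body))
print(json.dumps(no_count_body))
```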

Benchmark with one sort field and without track_total_hits
Benchmark with one sort field and with track_total_hits

This might be an anomaly (although I ran it multiple times, purging caches between runs), but rank_feature, when used on a single field with track_total_hits, returned in 1 ms.

If it fits your use case, it might be worth a try to see if it improves your performance.

Aside from that surprise, the results are as expected. Sort is the fastest and function_score with scripting is the slowest.

Notable mention:

  • field_value_factor can be a very good compromise in performance if you need to use the rescore functionality of Elasticsearch while doing a sort by a specific field of your document.
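
One way to picture that combination, as a sketch under my own assumptions: the main query orders documents by a hypothetical popularity field through field_value_factor, and a rescore phase then re-ranks the top window with a phrase match on a hypothetical title field.

```python
import json


def build_rescored_query(sort_field, phrase_field, phrase, window_size=100):
    """Order by a field via field_value_factor, then rescore the top window."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "field_value_factor": {"field": sort_field, "missing": 0},
                "boost_mode": "replace",
            }
        },
        "rescore": {
            # Only the top window_size hits per shard are rescored.
            "window_size": window_size,
            "query": {
                "rescore_query": {"match_phrase": {phrase_field: phrase}},
                "query_weight": 1.0,
                "rescore_query_weight": 2.0,
            },
        },
    }


rescored_body = build_rescored_query("popularity", "title", "sorting methods")
print(json.dumps(rescored_body, indent=2))
```

The weights are placeholders; tune them to decide how much the second phase is allowed to reorder the first.
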

Benchmark with two sort fields and without track_total_hits
Benchmark with two sort fields and with track_total_hits

When using two fields, rank_feature is still the fastest, although by a much smaller margin than in the previous test when track_total_hits is not used.

Aside from that, the results remained the same.

Conclusions

  • rank_feature: Depending on your use case, it might be the fastest way to rank/sort your documents. Worth a try!
  • field_value_factor: If you need to use the “rescore” functionality and use a specific field to change the order of the documents, this can be a very good candidate with a minimal performance penalty.
  • function_score with scripting: No surprises here. The most flexible way to change your sort/ranking comes at a performance cost that you need to account for.
