Implementing the MessagePack Protocol in Java

Something that started as a simple dependency removal turned into a 2x performance improvement

Luis Sena
Apr 11, 2022

Recently, I was experimenting with adding ranking capabilities to my Elasticsearch plugin. For that, I just needed an efficient way to transmit a list of IDs and their respective scores.

JSON would be the easy choice, but as we’ve seen before, it performs poorly for this kind of payload.

Because of that, I decided to use MessagePack as the protocol to transmit that list of key-value pairs (id -> score) and to keep JSON as the performance baseline for the plugin.

The only maintained Java library that I could find was msgpack-java, so I went with that.
As I was integrating it, I started to suspect it might be a bit too bloated for what I needed, and when I saw that it relied on reflection to work… some alarm bells started to ring.

For those reasons, once everything was working correctly, I decided to implement the MessagePack protocol myself: the protocol is very simple, and I only needed a very small subset of it.

The final implementation can be distilled down to this code snippet:

Custom MessagePack java implementation to decode arrays of integers
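
To make the idea concrete, here is a minimal sketch of what such a decoder could look like. It is an illustration written against the public MessagePack format, not the exact code from the plugin: the class name `MsgPackIntArrayDecoder`, the `decode` method, and the choice of returning `long[]` are all assumptions. It handles only the subset discussed here, namely one array header followed by integer elements.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (name and API are assumptions for illustration).
final class MsgPackIntArrayDecoder {

    private MsgPackIntArrayDecoder() {}

    /**
     * Decodes a MessagePack array whose elements are all integers.
     * Truncated input surfaces as java.nio.BufferUnderflowException.
     */
    static long[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // big-endian by default, as MessagePack requires
        int size = readArrayHeader(buf);
        long[] values = new long[size];
        for (int i = 0; i < size; i++) {
            values[i] = readInteger(buf);
        }
        return values;
    }

    private static int readArrayHeader(ByteBuffer buf) {
        int tag = buf.get() & 0xff;
        int size;
        if ((tag & 0xf0) == 0x90) {
            size = tag & 0x0f;                 // fixarray: length in the low 4 bits
        } else if (tag == 0xdc) {
            size = buf.getShort() & 0xffff;    // array 16: unsigned 16-bit length
        } else if (tag == 0xdd) {
            size = buf.getInt();               // array 32: unsigned 32-bit length
        } else {
            throw new IllegalArgumentException("not an array header: 0x" + Integer.toHexString(tag));
        }
        // Every element takes at least one byte, so a larger size is invalid.
        if (size < 0 || size > buf.remaining()) {
            throw new IllegalArgumentException("array length exceeds payload size");
        }
        return size;
    }

    private static long readInteger(ByteBuffer buf) {
        int tag = buf.get() & 0xff;
        if (tag <= 0x7f) return tag;           // positive fixint
        if (tag >= 0xe0) return (byte) tag;    // negative fixint (-32..-1)
        switch (tag) {
            case 0xcc: return buf.get() & 0xffL;            // uint 8
            case 0xcd: return buf.getShort() & 0xffffL;     // uint 16
            case 0xce: return buf.getInt() & 0xffffffffL;   // uint 32
            case 0xcf: {                                    // uint 64
                long v = buf.getLong();
                if (v < 0) throw new IllegalArgumentException("uint 64 value does not fit in a long");
                return v;
            }
            case 0xd0: return buf.get();       // int 8
            case 0xd1: return buf.getShort();  // int 16
            case 0xd2: return buf.getInt();    // int 32
            case 0xd3: return buf.getLong();   // int 64
            default:
                throw new IllegalArgumentException("not an integer: 0x" + Integer.toHexString(tag));
        }
    }
}
```

For example, the six bytes `93 01 ff cd 01 00` are a three-element fixarray holding a positive fixint, a negative fixint, and a uint 16, so `decode` returns `[1, -1, 256]`. Everything else in the format (maps, strings, floats, extensions) is deliberately rejected, which is what keeps the code this small.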

Pretty simple and concise! It was time for some benchmarks to make sure I wasn’t giving up performance in exchange for that simplicity.

Some benchmarks

The following numbers are averages over 10,000 runs, each using a fresh random list of numbers. Within a run, both implementations decoded the same list.
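
The methodology above can be sketched roughly as follows. This is an assumed harness, not the one used for the numbers in this post: `DecodeBenchmark`, `compare`, and `randomPayload` are hypothetical names, and the two decoders are passed in as plain functions so that any pair of implementations can be plugged in.

```java
import java.util.Random;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical harness (names are assumptions for illustration).
final class DecodeBenchmark {

    // Results are stored here so the JIT cannot discard the decoding work.
    static volatile Object sink;

    private DecodeBenchmark() {}

    /** Returns the average nanoseconds per run for {first, second}. */
    static double[] compare(int runs, Supplier<byte[]> payloads,
                            Function<byte[], ?> first, Function<byte[], ?> second) {
        long firstTotal = 0;
        long secondTotal = 0;
        for (int i = 0; i < runs; i++) {
            byte[] payload = payloads.get();   // fresh input for each run
            long t0 = System.nanoTime();
            sink = first.apply(payload);
            long t1 = System.nanoTime();
            sink = second.apply(payload);      // same input for both decoders
            long t2 = System.nanoTime();
            firstTotal += t1 - t0;
            secondTotal += t2 - t1;
        }
        return new double[] {(double) firstTotal / runs, (double) secondTotal / runs};
    }

    /** Builds a MessagePack "array 16" of random positive fixints (0..127). */
    static byte[] randomPayload(Random rnd, int count) {
        if (count < 0 || count > 0xffff) {
            throw new IllegalArgumentException("array 16 holds at most 65535 elements");
        }
        byte[] out = new byte[3 + count];
        out[0] = (byte) 0xdc;                  // array 16 header
        out[1] = (byte) (count >>> 8);         // length, big-endian
        out[2] = (byte) count;
        for (int i = 0; i < count; i++) {
            out[3 + i] = (byte) rnd.nextInt(0x80); // one byte per positive fixint
        }
        return out;
    }
}
```

A call such as `DecodeBenchmark.compare(10_000, () -> DecodeBenchmark.randomPayload(rnd, 1_000), custom, library)` would mirror the setup described above. A loop like this ignores JIT warm-up and always runs the decoders in the same order, so for numbers you intend to rely on, a dedicated harness such as JMH is probably the safer choice.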

Benchmarks: custom implementation vs. msgpack-java

The msgpack-java package is, on average, 2x slower, which was a pleasant surprise since the initial objective was only to remove a big dependency from my plugin!

Another interesting finding was the variability in how long the msgpack-java package took on individual runs: it sometimes needed as much as 100x more time to complete than my custom implementation.

Conclusions

I still think that packages and libraries can save you a lot of time and trouble and that, in most cases, they are far better implemented and optimized than any solution you could come up with in a realistic timeframe.

But we need to be careful not to reach for them carelessly, especially when handing something very specific and “small” over to a library, since we might actually be buying extra complexity without any real benefit in return.

How does this all sound? Is there anything you’d like me to expand on? Let me know your thoughts in the comments section below (and hit the clap if this was useful)!

Stay tuned for the next post. Follow so you won’t miss it!
