Updates on the vino runtime


Authors
The Vino Team

November 3, 2021

Vino Runtime Progress: 95%

A slight detour

If you’ve been following Vino, you know that we do our research and then double down and commit. Sometimes those positions don’t play out the way we hoped. It’s critical to recognize that quickly and cut losses before it’s too late.

Several months ago, one of Vino’s major dependencies took a direction that irreversibly broke compatibility. These are the risks of being an early adopter and it left us needing to reflect and re-commit. The decision was either: drop Rust and re-consider Erlang, or re-implement the foundation ourselves. While we love Erlang, we feel that Rust has a more promising future as a no-compromise, low-level language so we started down the path of reworking the runtime.

During this detour we revisited other dependencies and removed the ones that weren’t scaling with us. The whole process left us with more of what we wanted, and we’re nearly back to where we want to be.

Vino stats

The new runtime is fast! It actually surprised us, because we still have a long list of TODOs where we know we can improve performance. That means Vino’s only going to get faster. Comparing and benchmarking pure algorithms isn’t valuable because they stress-test the WebAssembly runtime (wasmtime), which we don’t control. Business isn’t built off algorithms alone, anyway. To get a good sense of what Vino can do, we benchmarked a common web flow against node.js.

The tests made a series of non-cacheable requests that stored the result of a commodity algorithm (bcrypt) in a database (four calls to redis, three parallelizable).

Bcrypt cost of 12, simulating moderate CPU work. (lower is better)

node.js: 186.89 ms/run
vino: 37.67 ms/run

Now that looks great for vino but it’s not fair. The node.js implementation is a naive first-pass with asynchronous work await-ed in series. My first attempt at improving performance was to defer the waits until every request had been made, then await the results all together.
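The two shapes being compared can be sketched roughly like this (all names are hypothetical, and a fake async “redis call” stands in for a real client so the example is self-contained):

```javascript
// A fake I/O call standing in for a redis client request.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));
const redisCall = (key) => delay(10, `value:${key}`);

// Naive first pass: asynchronous work await-ed in series,
// so each call only starts after the previous one finishes.
async function handleRequestSerial() {
  const a = await redisCall("a");
  const b = await redisCall("b");
  const c = await redisCall("c");
  return [a, b, c];
}

// The attempted improvement: kick off every request first,
// then await all of the results together.
async function handleRequestDeferred() {
  const pending = [redisCall("a"), redisCall("b"), redisCall("c")];
  return Promise.all(pending);
}
```

With three 10 ms calls, the serial version takes roughly 30 ms and the deferred version roughly 10 ms; as the results below show, though, it made no difference here because the real bottleneck was elsewhere.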

Iteration 1

node.js: 186.564 ms/run
vino: 37.67 ms/run

This had no effect so I backed out the changes and started looking at the CPU bottleneck. The bcrypt package had an asynchronous API that I initially ignored but decided to revisit. CPU-bound work in node doesn’t usually benefit from async APIs because node.js is single-threaded. I gave it a shot anyway.

Iteration 2

node.js: 191.942 ms/run
vino: 37.67 ms/run

It’s what I expected. While implementing the changes I noticed the bcrypt docs mention that the async API delegates to a worker pool. Offloading the work to another thread should mean we’d get improvements if we combine the above two approaches.

Iteration 3

node.js: 48.238 ms/run
vino: 37.67 ms/run

Much better! I hadn’t wanted to deal with worker threads in node.js, but if there was an opportunity to see gains like this then I couldn’t pass up trying.

Iteration 4

node.js: 195.454 ms/run
vino: 37.67 ms/run

Just kidding.

I accidentally routed around one of the initial improvements so the first results were the worst of the bunch. Extra work spawning threads for no gain. The real results were:

Iteration 5

node.js: 49.025 ms/run
vino: 37.67 ms/run

It was just a different angle on the same CPU bottleneck. The bcrypt package maxed out the gains we could get without more serious changes in architecture. I’m positive I could have kept working on this to get a node.js implementation that wins, but that’s not the point.

The point is: I started with a working implementation and only added complexity. No new business logic. Nothing novel at all. Just bespoke implementations of generic best practices. I even included the results from Iteration 4 to highlight that I made an error which required its own troubleshooting and added no additional value.

Compare that to the Vino component implementation, which started basic and never changed. Not only that, the exact same binary works on the command line, as a microservice, as an HTTP service, and more, while the node.js version became more restricted with each iteration.

To be fair, node.js wins when the work is low-CPU coordination of blocking requests. With bcrypt removed and a program that simply makes the database calls, the results are clear.

Comparison with no cpu work

node.js: 1.42 ms/run
node.js (workers): 0.23 ms/run
vino: 2.94 ms/run

The node workers implementation simply screams! And it should! Highly tailored solutions optimized for specific chunks of work should beat anything generic. The problem is: they aren’t free. Optimizations like this require experienced engineers, are prone to error, increase maintenance cost, and at the end of the day still don’t involve anything novel. With Vino, it is free. We’ll be releasing the runtime in the coming days and will be open-sourcing it for everyone around the same time. Stay tuned!