Benchmarking Large Riak Data Types: A Potential Fix

Russell Brown has a potential fix for the Data Type growth


A potential fix

As I mentioned in my last post, Russell Brown is working on a feature branch of riak_dt which may have a fix for some of this. Today, I ran my benchmark against a locally-compiled Riak, and then I built it with the potential fix.

My environment

I'm on Mac OS X 10.9.5, and I ran the benchmark against a single Riak 2.0.2 node running locally, installed from source. I have OTP R17, and therefore had to make a few changes in the Riak build system:

To include the patched version, I changed the riakdt branch that riakkv depends on to bug/rdb/faster-merge-orswot.

Results

Map writes has improved significantly. Not only does it no longer seem exponential, but it is much faster! Notice the time scales:

Unpatched Map writes (data-type API)

Patched Map writes (data-type API)

The same applies for Map reads:

Unpatched Map reads (data-type API)

Patched Map reads (data-type API)

The stripes in the Patched Map reads chart is likely due to the precision the benchmark uses when printing the time. They can be ignored.

However, there was a regression (tradeoff) with regards to size. It appears that the patched version grows at a faster rate, though it is still linear.

Unpatched Map byte size and number of items

Patched Map byte size and number of items

The blip around 1,000 items may be due to the change in the size of the elements from 3 to 4 bytes.

There are no visually significant changes for Map reads using the key-value API.

Unpatched Map reads (key-value API)

Patched Map reads (key-value API)

20k Elements

Testing the unpatched version with 20,000 elements would be unbearable, but did test the patched version to give a better idea of how it would behave at that size. It's not quite what I was expecting, but it is still much faster than the unpatched version with 3,000 elements.

Patched Map writes (20k elements) (data-type API)

Patched Map reads (20k elements) (data-type API)

Raw data

Here you go!

Future work

Once again, big thanks to Russell Brown for the quick turn-around on this! We're looking forward to using this change once it is officially released. We are also eagerly anticipating delta-mutation, which should bring the linear growth (for writes) down to amortized constant time, if I'm understanding the change correctly.