Big Data/Analytics Zone is brought to you in partnership with:

Ayende Rahien is working for Hibernating Rhinos LTD, a Israeli based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler(l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB and is not an employee of DZone and has posted 459 posts at DZone. You can read more from them at their website. View Full User Profile

The Cost of Getting Data from LevelDB

03.12.2013
| 2814 views |
  • submit to reddit

We are currently investigating the usage of LevelDB as a storage engine in RavenDB. Some of the things that we feel very strongly about is transactions (LevelDB doesn’t have it) and performance (for a different definition of the one usually bandied about).

LevelDB does have atomicity, and the rest of CID can be built atop of that without too much complexity (already done, in fact). But we run into an issue when looking at the performance of reading. I am not sure if that is unique or not, but in our scenario, we typically deal with relatively large values. Documents of several MB are quite common. That means that we are pretty sensitive to memory allocations. It doesn’t help that we have very little control on the Large Object Heap, so it was with great interest that we looked at how LevelDB did things.

Reading the actual code make a lot of sense (more on that later, I will probably go through a big review of that). But there was one story that really didn’t make any sense to us, reading a value by key.

We started out using LevelDB Sharp:

Database.Get("users/1");

This in turn result in the following getting called:

image

A few things to note here. All from the point of view of someone who deals with very large values.

  • valuePtr is not released, even though it was allocated by us.
  • We copy the value from valuePtr into a string, resulting in two copies of the data and twice the memory usage.
  • There is no way to get just partial data.
  • There is no way to get binary data (for example, encrypted)
  • This is going to be putting a lot of pressure on the Large Object Heap.

But wait, it actually gets better. Let us look at the LevelDB method that get called:

image

So we are actually copying the data multiple times now. For fun, the db->rep->Get() call also copy the data. And that is pretty much where we stopped looking.

We are actually going to need to write a new C API and export that to be able to make use of that in our C# code. Fun, or not.

Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)