Performance Zone is brought to you in partnership with:

Ayende Rahien is working for Hibernating Rhinos LTD, a Israeli based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler(l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB and is not an employee of DZone and has posted 457 posts at DZone. You can read more from them at their website. View Full User Profile

Some thoughts about compression & storage

06.24.2013
| 1021 views |
  • submit to reddit

One of the advantages that keeps showing up with levelDB is the notion that it compresses the data on disk by default. Since reading data from disk is way more expensive than the CPU cost of compression & decompression, that is a net benefit.

Or is it? In the managed implementation we are currently working on, we chose to avoid this for now. For a very simple reason. By storing the compressed data on disk, it means that you cannot just give a user the memory mapped buffer and be done with it, you actually have to decompress the data yourself, then hand the user the buffer to the decompressed memory. In other words, instead of having a single read only buffer that the OS will manage / page / optimize for you, you are going to have to allocate memory over & over again, and you’ll pay the cost of decompressing again and again.

I think that it would be better to have the client make that decision. They can send us data that is already compressed, so we won’t need to do anything else, and we would still be able to just hand them a buffer of data. Sure, it sounds like we are just moving the cost around, isn’t it? But what actually happens is that you have a better chance to do optimizations. For example, if I am storing the data compressing via gzip. And I’m exposing the data over the wire, I can just stream the results from the storage directly to the HTTP stream, without having to do anything about it. It can be decompressed on the client.

On the other hand, if I have storage level decompression, I am paying for the cost of reading the compressed data from disk, then allocating new buffer, decompressing the data, then going right ahead and compressing it again for sending over the wire.

Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)