
Ayende Rahien works for Hibernating Rhinos LTD, an Israeli-based company producing developer productivity tools for OLTP applications, such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler (l2sprof.com), Entity Framework Profiler (efprof.com), and more.

World’s Smallest NoSQL Database: Persistent Data

09.30.2013
The second item to go over with the World’s Smallest NoSQL Database is persistence, and following that, I/O. Right now, there is no persistence: if you restart the process, all the data is gone. There are actually quite a few real-world databases that behave in this fashion, but they are a rarity. For the most part, if I put data inside a database, I expect it to be there until I do something about it.
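For context, the database in question is essentially a dictionary behind a get/put interface. The series' own code is C#; here is a minimal Python sketch of the same idea (the `TinyDb` name is illustrative, not from the original):

```python
# A sketch of the "world's smallest NoSQL database": an in-memory
# key/value store. Everything lives in process memory, so a restart
# loses all data -- exactly the problem this article is about.
class TinyDb:
    def __init__(self):
        self._data = {}  # key -> value, held only in RAM

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None for missing keys -- and for *every* key
        # once the process restarts.
        return self._data.get(key)
```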

And at that point, you are in for quite a bit of complexity. How are you going to persist the data? The easiest way to do it – just creating a file for every value in the database – is going to be problematic on most systems. So you need to put a lot of data in a small set of files. Which means that you have to decide how you are going to put the data together.

In general, there is the fixed-size option, in which you divide the file(s) into pages and work in those units. The good thing about this is that it gives you the ability to reuse space in the file after deletes / updates. The bad thing is that it is quite complex. Alternatively, you can just append the data as needed, but then you can’t really update written data in place and would need to run compactions.

And we haven’t talked about searching yet. Some DBs, like Bitcask / Munin, would actually store the keys in memory, along with the position of the value on disk for retrieving it. But for the most part, both keys and values tend to be on disk in some form. In CouchDB, they are held inside an append-only B+Tree. In LevelDB, they are held in Sorted String Tables. LMDB uses a copy-on-write B+Tree. Esent uses a B+Tree with a log file.

In each of those cases, the actual semantics for persistent data involve at least two concerns. You need to be able to search the data (that usually means more than just O(1) access; you want to be able to iterate back and forth over the keys), and you need to be able to do a transactional save, so that you can recover in case of a crash.

But there is actually a lot more that goes into the selection of the proper persistence format. To start with, how you store the data on disk will have a big effect on your performance. If you store the data as a linked list, for example, you might as well kiss your performance goodbye. Beyond that, there is the question of how the format will scale when you have concurrent readers. For example, if you have something that does a lot of seeks and relies on those seeks always going forward to ensure performance, it is going to be hit hard the moment concurrent readers start reading from different parts of the file, because you would be forcing the system to do random seeks.

There are other considerations, as well. For example, if you are using something like a B+Tree, it is likely that you’ll be overwriting the same section of the disk multiple times. That is fine with an HDD, but an SSD would just give up and die on you at some point. And we haven’t started talking about secondary indexes yet…

Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)