Why does LevelDB need more than two levels?

I think only two levels (level-0 and level-1) would be OK. Why does LevelDB need level-2, level-3, and more?

There are 3 answers

Jesse

I'll point you in the direction of some articles on LevelDB and its underlying storage structure.

The documentation for LevelDB discusses merges among levels:

These merges have the effect of gradually migrating new updates from the young level to the largest level using only bulk reads and writes (i.e., minimizing expensive seeks).

LevelDB is similar in structure to a Log-Structured Merge Tree. The LSM-tree paper discusses the different levels, if you're interested in the analysis; if you can get through the mathematics, it seems to be your best bet for understanding the data structure.

A much easier-to-read analysis of LevelDB discusses the datastore's relation to LSM-trees, but in terms of your question about the levels, all it says is:

Finally, having hundreds of on-disk SSTables is also not a great idea, hence periodically we will run a process to merge the on-disk SSTables.

The LevelDB documentation probably provides the best answer: the levels exist to maximize the size of reads and writes, since LevelDB is on-disk (slow-seek) data storage.
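To make "bulk reads and writes" concrete, here is a minimal sketch (plain Python, not LevelDB's actual C++ code) of one sequential merge pass over several sorted runs, keeping the newest value for each key:

```python
import heapq

def compact(runs):
    """Merge sorted runs (lists of (key, value) pairs, newest run first)
    into one sorted run in a single sequential pass, keeping only the
    newest value for each key."""
    merged = []
    # heapq.merge streams its inputs in key order without loading
    # everything into memory; ties go to the earlier (newer) run.
    for key, value in heapq.merge(*runs, key=lambda kv: kv[0]):
        if merged and merged[-1][0] == key:
            continue  # a newer run already supplied this key
        merged.append((key, value))
    return merged

young = [("a", 2), ("c", 9)]            # newer data
old   = [("a", 1), ("b", 5), ("d", 7)]  # older data
print(compact([young, old]))  # [('a', 2), ('b', 5), ('c', 9), ('d', 7)]
```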

Good Luck!

jayadev

I think it mostly has to do with easy and quick merging of levels.

In LevelDB, level-(i+1) holds roughly 10 times the data of level-i. It is analogous to a multi-level cache structure: if the database has 1000 records between keys x1 and x2, then 10 of the most frequently accessed ones in that range would be in level-1, 100 in the same range in level-2, and the rest in level-3 (this is not exact, just an intuitive picture of the levels). In this setup, merging a file from level-i requires looking at no more than about 10 files in level-(i+1); they can all be brought into memory, merged quickly, and written back. Each compaction/merge operation therefore reads only a relatively small chunk of data.
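A back-of-the-envelope sketch of that fan-in (my own illustrative numbers in Python; the 2 MB file size and 10x growth factor are in the ballpark of LevelDB's defaults):

```python
# Hypothetical numbers: fixed 2 MB files, each level ~10x the previous,
# level-1 capped at 10 MB.
FILE_MB, GROWTH, LEVEL1_MB = 2, 10, 10

def files_in_level(i):
    return (LEVEL1_MB * GROWTH ** (i - 1)) // FILE_MB

for i in range(1, 5):
    # Files within a level hold disjoint key ranges, so one level-i file
    # covers ~1/files_in_level(i) of the key space; level-(i+1) has
    # GROWTH times as many files, hence ~GROWTH of them overlap it.
    print(f"level-{i}: {files_in_level(i)} files, "
          f"one file overlaps ~{GROWTH} files in level-{i+1}")
```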

On the other hand, if you had just two levels, the key range of a single level-0 file could overlap thousands of files in level-1, and all of them would need to be opened for merging, which would be pretty slow. Note an important assumption here: files have a fixed size (say, 2 MB). With variable-length files in level-1 your idea could still work, and I think a variant of that is used in systems like HBase and Cassandra.
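For contrast, a rough count (again, illustrative numbers, not measurements) of how many files one merge can touch in a two-level layout versus a leveled one:

```python
# Rough file counts per merge; the numbers are made up for illustration.
FILE_MB, GROWTH, TOTAL_MB = 2, 10, 20_000   # ~20 GB of data on disk

# Two levels: all 20 GB sits in level-1, so a level-0 file spanning a
# wide key range can overlap nearly every level-1 file.
print("two levels: up to", TOTAL_MB // FILE_MB, "files per merge")

# Leveled (10 MB, 100 MB, 1 GB, 10 GB, ...): each merge takes one file
# plus its ~GROWTH overlapping files in the next level down.
print("many levels: about", 1 + GROWTH, "files per merge")
```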

Now, if your concern is lookup delay with many levels: again, this is like a multi-level cache structure. The most recently written data sits in the younger levels, which helps with the typical locality of reference.

ren

Level-0 is the data in memory; the other levels are disk data. The important part is that the data within each level is sorted. If level-1 consists of three 2 MB files, then, as an example, file1 holds keys 0..50 (sorted), file2 holds 150..200, and file3 holds 300..400. When the memory level is full we need to push its data to disk in the most efficient manner possible, which is sequential writing (using as few disk seeks as possible). Imagine the keys 60..120 are in memory: great, we just write them out sequentially as a new level-1 file that slots in between file1 and file2. Very efficient!

But now imagine that level-1 is much larger than level-0 (which is reasonable, as level-0 is memory). In that case there are many files in level-1, and our in-memory keys (60..120) may belong to many of them, because the key ranges in level-1 are very fine-grained. To merge level-0 into level-1 we would need to read many files, make a lot of random seeks, build the new files in memory, and write them out. This is where the many-levels idea kicks in: we keep many layers, each somewhat larger than the previous one (about 10x), but not much larger, so that when we have to migrate data from layer i-1 to layer i we have a good chance of reading the smallest possible number of files.
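A toy model of that overlap question (my own Python sketch, not LevelDB internals), representing a level as a list of disjoint key ranges:

```python
def overlapping_files(level, lo, hi):
    """Return the files in `level` (a list of disjoint, sorted
    (min_key, max_key) ranges) whose key range intersects [lo, hi]."""
    return [(a, b) for (a, b) in level if a <= hi and lo <= b]

level1 = [(0, 50), (150, 200), (300, 400)]  # the three files above
# Flushing keys 60..120: nothing overlaps, so the data is written
# sequentially as a brand-new file.
print(overlapping_files(level1, 60, 120))   # []
# Flushing keys 40..160 instead would force a merge with two files.
print(overlapping_files(level1, 40, 160))   # [(0, 50), (150, 200)]
```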

Now, since data may change, there may be no need to propagate it to the deeper, more expensive layers at all (it might be overwritten or deleted first), so we avoid expensive merges altogether. The data that does end up in the last level is statistically the least likely to change, so it is the best fit for the most-expensive-to-merge-with last layer.