Why DJ Bernstein CDB (constant database) uses 256 hashtables?

978 views Asked by At

Why DJB CDB (constant database) was designed to use 256 hashtables?

Why not single bigger 252 * 256 hashtable?

Is it only to save space or there are some other reason?

1

There are 1 answers

3
gstrauss On BEST ANSWER

DJB CDB uses two levels of hash tables. The first table is fixed size 2K at the beginning of the file. The second set of tables are at the end of the file, and is built in memory as data is being streamed into the cdb. Once all data has been streamed into the cdb, the second set of hash tables are streamed out to disk, and then the first table (at the beginning of the file) is populated with the offsets to each of the tables in the second set.

In other words, the multi-level hash tables allow streaming creation of the cdb with the simple exception of writing the beginning 2K of the file at the end of the cdb creation.

Access to the cdb is fast, hitting the first table (2K at beginning of the file) to find the offset of the second table (among the second set of tables) at the end of the cdb file, which provides the location of the data in the cdb.

Further information can be found in the NOTES at https://github.com/gstrauss/mcdb/ which is a rewrite of DJB's venerable cdb. mcdb is faster than cdb and removes the 4GB cdb limitation, among other benefits.