How does RocksDB cache writes?

I have a few column families that reference each other: to construct a "full object" I need to join data across them. The upstream system that feeds me data often sends updates to cross-referenced items in several column families at around the same time, but there is no guarantee about ordering. When I get an update to one of them, I need to look up the referenced values to construct the full object for my application to use. Ideally, I want these reads of recently written data to hit a cache, so that I don't end up doing one or more read IOs per item that I write.
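
To make the access pattern concrete, here is a minimal sketch in which plain Python dicts stand in for column families. The column family names (orders, customers, products) and the reference fields (customer_id, product_id) are hypothetical; the point is only that every incoming write triggers one point lookup per referenced column family, and a reference may not have arrived yet because updates are unordered.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    # Each dict stands in for one column family: key -> record (hypothetical schema).
    orders: dict = field(default_factory=dict)
    customers: dict = field(default_factory=dict)
    products: dict = field(default_factory=dict)
    lookups: int = 0  # number of point reads issued while building full objects

    def get(self, cf: dict, key):
        self.lookups += 1
        return cf.get(key)

    def full_order(self, order_id):
        """Join an order with the records it references; missing references stay None."""
        order = self.get(self.orders, order_id)
        if order is None:
            return None
        return {
            "order": order,
            "customer": self.get(self.customers, order["customer_id"]),
            "product": self.get(self.products, order["product_id"]),
        }

    def put_order(self, order_id, order):
        # Write, then immediately rebuild the full object: one write costs three reads here.
        self.orders[order_id] = order
        return self.full_order(order_id)
```

For example, after s.customers["c1"] = {"name": "Ada"} and s.products["p1"] = {"sku": "X"}, calling s.put_order("o1", {"customer_id": "c1", "product_id": "p1"}) returns the joined object and leaves s.lookups at 3. Whether those three reads are served from memory or from disk is exactly what the question is about.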

I know RocksDB keeps writes in RAM in a MemTable before flushing data into an SST file on disk, but I couldn't find an answer in the documentation about whether writes which have been flushed ever enter the LRU cache. Is allowing the MemTables to get really large the best / only way for me to tune the write caching behavior?

Answer by Dan Carfas (best answer)

From the Speedb Hive:

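There is a parameter in the table options called prepopulate_block_cache. Its default is disabled, but you can set it to flush-only, so that blocks written during a flush are also inserted into the block cache and reads of recently flushed data can be served from memory.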
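
The difference between the two settings can be illustrated with a toy model. This is not RocksDB code: one key stands in for one block, there is a single "SST" dict, and the OS page cache is ignored. In RocksDB's C++ API the option is, as far as I know, BlockBasedTableOptions::prepopulate_block_cache with the values kDisable (the default) and kFlushOnly; check the release you run. Here prepopulate_on_flush=True loosely mimics flush-only and False mimics the default.

```python
from collections import OrderedDict


class ToyCF:
    """Toy column family whose flush can optionally warm an LRU block cache."""

    def __init__(self, prepopulate_on_flush: bool, cache_capacity: int = 1024):
        self.prepopulate_on_flush = prepopulate_on_flush
        self.cache_capacity = cache_capacity
        self.memtable, self.sst = {}, {}
        self.cache = OrderedDict()
        self.storage_reads = 0

    def _cache_insert(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def put(self, key, value):
        self.memtable[key] = value  # writes only touch the memtable

    def flush(self):
        for key, value in self.memtable.items():
            self.sst[key] = value
            self.cache.pop(key, None)  # drop entries that belonged to an older "file"
            if self.prepopulate_on_flush:
                self._cache_insert(key, value)  # freshly written data goes straight into the cache
        self.memtable.clear()

    def get(self, key):
        if key in self.memtable:  # 1. memtable
            return self.memtable[key]
        if key in self.cache:  # 2. block cache
            self.cache.move_to_end(key)
            return self.cache[key]
        if key in self.sst:  # 3. storage read, then cache the result
            self.storage_reads += 1
            self._cache_insert(key, self.sst[key])
            return self.sst[key]
        return None


def reads_after_flush(prepopulate: bool) -> int:
    """Write 10 keys, flush, read them all back, and count storage reads."""
    cf = ToyCF(prepopulate_on_flush=prepopulate)
    for i in range(10):
        cf.put(f"k{i}", i)
    cf.flush()
    for i in range(10):
        assert cf.get(f"k{i}") == i
    return cf.storage_reads
```

In this model reads_after_flush(False) issues one storage read per key, while reads_after_flush(True) issues none. The trade-off is that warming the cache at flush time spends cache capacity on data that might never be read, which can evict entries other readers were using.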

You can find the Speedb Hive here and, once you've registered, the thread with your question here, if you have more questions or need additional info.

Answer by jaykorean

In most scenarios, recently written data still resides in the memtable, so reads of it are served directly from memory without going through the LRU block cache. With default settings, the write path does not involve the block cache at all. Is your concern that, once the part of your "full object" stored in one column family has been flushed, a later write to another column family will require reading that part back from storage, because it is no longer in the memtable?
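
The scenario described above can be sketched with a minimal toy model. It assumes that each column family has its own memtable and flushes on its own schedule, and it deliberately models no block cache, i.e. the flushed record has not been read since the flush. The names customers, orders, and customer_id are hypothetical.

```python
class ToyCF:
    """Minimal toy column family: its own memtable, flushed independently of other CFs."""

    def __init__(self, name):
        self.name = name
        self.memtable, self.sst = {}, {}
        self.storage_reads = 0

    def put(self, key, value):
        self.memtable[key] = value

    def flush(self):
        self.sst.update(self.memtable)
        self.memtable.clear()

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]  # served from memory
        if key in self.sst:
            self.storage_reads += 1  # served from an SST file
            return self.sst[key]
        return None


customers, orders = ToyCF("customers"), ToyCF("orders")
customers.put("c1", {"name": "Ada"})
customers.flush()  # e.g. the customers memtable happened to fill up first
orders.put("o1", {"customer_id": "c1"})  # the related update arrives afterwards

# Rebuilding the full object for o1 now needs the already-flushed customer record.
order = orders.get("o1")
full = {"order": order, "customer": customers.get(order["customer_id"])}
```

Here the order comes from its memtable, but the customer lookup costs one storage read even though both updates arrived at around the same time.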

Features like the row cache can help if you already know which frequently accessed objects will be updated, but if the concern applies to your data more broadly, the row cache may not be an effective solution for your use case.
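
Why a row cache helps for known hot objects but not for the general case can be illustrated with another toy model. It assumes a row cache that holds whole key/value pairs and is filled only by point reads, never by writes or flushes; in RocksDB the real row cache is configured through the row_cache DB option, and its exact behavior is not reproduced here.

```python
from collections import OrderedDict


class ToyRowCacheCF:
    """Toy model: memtable -> row cache -> SST, with the row cache filled only by reads."""

    def __init__(self, row_cache_capacity: int = 2):
        self.memtable, self.sst = {}, {}
        self.row_cache = OrderedDict()
        self.capacity = row_cache_capacity
        self.storage_reads = 0

    def put(self, key, value):
        self.memtable[key] = value

    def flush(self):
        for key in self.memtable:
            self.row_cache.pop(key, None)  # a cached row would belong to an older file
        self.sst.update(self.memtable)
        self.memtable.clear()

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if key in self.row_cache:
            self.row_cache.move_to_end(key)
            return self.row_cache[key]
        if key not in self.sst:
            return None
        self.storage_reads += 1  # first read after a flush always goes to storage
        self.row_cache[key] = self.sst[key]
        if len(self.row_cache) > self.capacity:
            self.row_cache.popitem(last=False)  # evict the least recently used row
        return self.sst[key]
```

Reading one hot key three times after a flush costs a single storage read, but reading many distinct keys once each still costs one storage read per key, which matches the pattern in the question where each flushed record is typically looked up only a few times.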