Issue in code structure when trying to divide components into logical elements


My team writes a URL db. I am restructuring it to optimize it. Since I'm also going to be its maintainer, I thought of adding readability refactors. However, I have a situation where my OOP intuition says I should make a change that doesn't make sense in practice. I wonder how to resolve this dissonance.

Following is a basic description of the db, and the problem.

It uses a set of on-disk sstables of URLs, URL features, and metadata. The sstables are divided into categories. The db supports load requests, which tell it to load a category into RAM; it keeps only the URL key and its features in memory. (When it runs out of memory, it evicts the least recently used category.) Lookup requests ask for URLs whose features pass some linear threshold (a weighted sum over the feature scores, as specified by the lookup). The db finds the keys whose features meet the lookup request, then returns the full query from disk.
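To make the lookup criterion concrete, here is a minimal sketch of the weighted-sum check. The names `LookupRequest`, `FeatureCache`, `score`, and `matching_keys` are made up for illustration, and treating "passes the threshold" as `>=` is an assumption; the real code may differ.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shape of a lookup request: one weight per feature plus a threshold.
struct LookupRequest {
    std::vector<double> weights;
    double threshold;
};

// Hypothetical in-RAM view of one loaded category: URL key -> feature scores.
using FeatureCache = std::unordered_map<std::string, std::vector<double>>;

// Weighted sum of the feature scores. Assumes weights and features align by index.
double score(const std::vector<double>& features, const LookupRequest& req) {
    assert(features.size() == req.weights.size());
    return std::inner_product(features.begin(), features.end(),
                              req.weights.begin(), 0.0);
}

// Returns the keys whose weighted sum meets the threshold (assumed to be >=).
std::vector<std::string> matching_keys(const FeatureCache& cache,
                                       const LookupRequest& req) {
    std::vector<std::string> keys;
    for (const auto& entry : cache) {
        if (score(entry.second, req) >= req.threshold) {
            keys.push_back(entry.first);
        }
    }
    return keys;
}
```

Only the keys come out of this step; fetching the full entries from disk is a separate step.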

In the current implementation, the lookup asks for a list of DataShard structs, each of which contains the data and the sstable path. It looks for relevant keys, and then fetches the full queries by opening the path.

The sstables themselves are not managed by my team. I understand that the load functionality must know how the sstables are structured in order to load them, but it seems to me that in this implementation the lookup class knows too much about the cache structure.

So it made sense to me to make DataShard a class with its own logic. It would expose a lookup function that fetches full queries from a single sstable. This class would know about the division into features and metadata, and that the underlying system is implemented with sstables. The lookup itself would then only prioritize in-shard requests and unify results from different shards.
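A rough sketch of the division I have in mind, under heavy assumptions: `RecordReader` is a made-up stand-in for whatever actually opens an sstable and reads a full entry, and `FeatureMap`, `LookupRequest`, and `lookup_all` are illustrative names, not the real code.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct LookupRequest {            // hypothetical request shape
    std::vector<double> weights;
    double threshold;
};

using FeatureMap = std::unordered_map<std::string, std::vector<double>>;

// Assumed stand-in for "open the sstable at `path` and read the entry for `key`".
using RecordReader =
    std::function<std::string(const std::string& path, const std::string& key)>;

class DataShard {
public:
    DataShard(std::string path, FeatureMap features, RecordReader reader)
        : path_(std::move(path)),
          features_(std::move(features)),
          reader_(std::move(reader)) {}

    // Scans the cached features, then fetches the full entry for each match.
    // The scan is over a finite map, so it terminates.
    std::vector<std::string> lookup(const LookupRequest& req) const {
        std::vector<std::string> results;
        for (const auto& entry : features_) {
            assert(entry.second.size() == req.weights.size());
            double s = std::inner_product(entry.second.begin(), entry.second.end(),
                                          req.weights.begin(), 0.0);
            if (s >= req.threshold) {
                results.push_back(reader_(path_, entry.first));
            }
        }
        return results;
    }

private:
    std::string path_;      // sstable location, hidden from the caller
    FeatureMap features_;   // in-RAM key -> features
    RecordReader reader_;   // how full entries are read from disk
};

// Top-level lookup: no sstable knowledge, only unifies per-shard results.
std::vector<std::string> lookup_all(const std::vector<DataShard>& shards,
                                    const LookupRequest& req) {
    std::vector<std::string> merged;
    for (const auto& shard : shards) {
        std::vector<std::string> part = shard.lookup(req);
        merged.insert(merged.end(), part.begin(), part.end());
    }
    return merged;
}
```

With this split, the top-level lookup sees only the signature of `DataShard::lookup`; the sstable path and the reader stay private to the shard.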

However, when I look at this implementation in practice, it's worse. The fact that I know how shard lookups are implemented means I don't need to test for impossible edge conditions in shard lookups. For example, I know lookups go over a finite set of data, so they can't get stuck forever. If I made the division, lookup could only "know" what the signature of DataShard.lookup tells it, which would mean more testing. Also, assuming the sstables do not change, I fail to see the practical advantages of this division.

So I am asking the readability experts at Stack Overflow: how should lookup responsibilities be divided, and why?

P.S. The code is C++, but I don't think that's relevant.
