Search list for objects valid in a time range

163 views Asked by At

I have the following data structure which describes an object and the time period which it's valid. Pretend the numbers below are unix timestamps.

{
  "id": 1234,
  "valid_from": 2000
  "valid_to": 4000
},
{
 "id": 1235,
 "valid_from": 1000,
 "valid_to": 2200,
}
...

I want to quickly be able to store these items in JavaScript and then query for items which are valid at a certain time.

For example if I were to query for objects valid at 2100 I would get [1234, 1235]. If I were to query for objects valid at 3999 I would get [1234], and at 4999 nothing.

I will have a size of about 50-100k items in the structure and I'd like fast lookup speeds but inserts, and deletes could be slower.

Items will have duplicate valid_from and valid_to values so it needs to support duplicates. Items will overlap.

I will need to be continually inserting data into the structure (probably by bulk for initial load, and then one off updates as data changes). I will also be periodically modifying records so likely a remove and insert.

I am not sure what the best approach to this is in a highly efficient manner?

Algorithms are not my strong suit but if I just know the correct approach I can research the algorithms themselves.

My Idea:

I was originally thinking a modified binary search tree to support duplicate keys and closest lookup, but this would only allow me to query objects that are > valid_from or < valid_to.

This would involve me bisecting the array or tree to find all items > valid_from and then manually checking each one for valid_to.

I suppose I could have two search trees, one for valid_to and valid_from, then I could check which id's from the results overlap and return those id's?

This still seems kind of hacky to me? Is there a better approach someone can recommend or is this how it's done.

1

There are 1 answers

0
Tyler Durden On

Imagine you have two lists: initiation/begin and expiration/end. Both are sorted by TIME.

Given a particular time, you can find where in each list the first item is that meets the criteria by binary search. You can also do inserts by binary search into each list.

For example, if there are 1000 items and begin location is 342, then items 1-342 are possible, and if end location is 901, then items 901-1000 in the termination list are possible. You now need to intersect both sublists.

Take item IDs from 1-342 in begin and 901-1000 in end, and put them in a separate array (allocated ahead of time). Sort the array. Traverse the array. Whenever the same ID appears twice in a row, it is a hit, a valid match.