I have a bunch of users, with a given start and end time, e.g.:
{ Name = "Peter", StartTime = "10:30", EndTime = "11:00" },
{ Name = "Dana", StartTime = "11:00", EndTime = "12:30" },
{ Name = "Raymond", StartTime = "10:30", EndTime = "14:00" },
{ Name = "Egon", StartTime = "12:00", EndTime = "13:00" },
{ Name = "Winston", StartTime = "10:00", EndTime = "12:00" }
I want to put them into buckets, based on times that they overlap (based on a configurable threshold, e.g., they need to overlap at least half an hour). I want buckets to be ideally 4 items big, but any range from 2-5 is acceptable.
In the example above, no 4 people match, so I'd have a bucket of 3 (Peter, Raymond, Winston) and one of 2 (Dana, Egon).
I've prototyped an algorithm that seems to rely on chance rather than science:
- Order the List by StartTime
- Create an empty bucket
- Pick the first user from the List
- Check that user against all users in the bucket
- If that user overlaps with everyone in the bucket, put that person in it and remove it from the list
- If the bucket has the ideal size (4) or if I'm looping and checking the same user more than three times, close the bucket and create a new, empty one
This works well for the first few buckets, but leads to buckets with only 2 people that could be combined better.
I could change the algorithm to remove all ideal buckets from the list and reshuffle and try some more, but I feel that this should be a common problem - it's like shift assignments for workers, or maybe the knapsack problem.
Does anyone know a standard algorithm for this type of problem?
(Tagged combinatorics because I think this is the area of math it applies, correct me if wrong)
tl;dr: dynamic programming for the win (O(sort(n)) time).
First, note that bucketing contiguously in start-time order is fine.
Proposition (defragging): Let
a, b, c, d
be distinct users such thatStartTime(a) ≤ StartTime(b) ≤ StartTime(c) ≤ StartTime(d)
. IfX
andY
are valid buckets such thata, c ∈ X
andb, d ∈ Y
, thenX - {c} ∪ {b}
andY - {a} ∪ {d}
are valid buckets as well.I only know how to prove this by tedious case analysis (omitted).
The upshot is, you can pretend as though you're breaking a paragraph into lines, where the “paragraph“ is the list of users in start-time order, and each “line” is a bucket. There is an algorithm due to Knuth and Plass for line breaking optimally, where the penalty for a given line is a more or less arbitrary function. You could make buckets of 4 cost 0, buckets of 3 cost 1, buckets of 2 cost 2, and buckets of 1 cost 100, for example.