Maximizing a combination of a series of values

243 views Asked by At

This is a complicated one, but I suspect there's some principle I can apply to make it simple - I just don't know what it is.

I need to parcel out presentation slots to a class full of students for the semester. There are multiple possible dates, and multiple presentation types. I conducted a survey where students could rank their interest in the different topics. What I'd like to do is get the best (or at least a good) distribution of presentation slots to students.

So, what I have:

  • List of 12 dates
  • List of 18 students
  • CSV file where each student (row) has a rating 1-5 for each date

What I'd like to get:

  • Each student should have one of presentation type A (intro), one of presentation type B (figures) and 3 of presentation type C (aims)
  • Each date should have at least 1 of each type of presentation
  • Each date should have no more than 2 of type A or type B
  • Try to give students presentations that they rated highly (4 or 5)

I should note that I realize this looks like a homework problem, but it's real life :-). I was thinking that I might make a Student class for each student that contains the dates for each presentation type, but I wasn't sure what the best way to populate it would be. Actually, I'm not even sure where to start.

1

There are 1 answers

7
Junuxx On BEST ANSWER

TL;DR: I think you're giving your students too much choice :D

But I had a shot at this problem anyway. Pretty fun exercise actually, although some of the constraints were a little vague. Most of all, I had to guess what the actual students' preference distribution would look like. I went with uniformly distributed, independent variables, although that's probably not very realistic. Still I think it should work just as well on real data as it does on my randomly generated data.

I considered brute forcing it, but a rough analysis gave me an estimate of over 10^65 possible configurations. That's kind of a lot. And since we don't have a trillion trillion years to consider all of them, we'll need a heuristic approach.

Because of the size of the problem, I tried to avoid doing any backtracking. But this meant that you could get stuck; there might not be a solution where everyone only gets dates they gave 4's and 5's.

I ended up implementing a double-edged Iterative Deepening-like search, where both the best case we're still holding out hope for (i.e., assign students to a date they gave a 5) and the worst case scenario we're willing to accept (some student might have to live with a 3) are gradually lowered until a solution is found. If we get stuck, reset, lower expectations, and try again. Tasks A and B are assigned first, and C is done only after A and B are complete, because the constraints on C are far less stringent.

I also used a weighting factor to model the trade off between maximizing students happiness with satisfying the types-of-presentations-per-day limits.

Currently it seems to find a solution for pretty much every random generated set of preferences. I included an evaluation metric; the ratio between the sum of the preference values of all assigned student/date combos, and the sum of all student ideal/top 3 preference values. For example, if student X had two fives, one four and the rest threes on his list, and is assigned to one of his fives and two threes, he gets 5+3+3=11 but could ideally have gotten 5+5+4=14; he is 11/14 = 78.6% satisfied.

After some testing, it seems that my implementation tends to produce an average student satisfaction of around 95%, at lot better than I expected :) But again, that is with fake data. Real preferences are probably more clumped, and harder to satisfy.

Below is the core of the algorihtm. The full script is ~250 lines and a bit too long for here I think. Check it out at Github.

...

# Assign a date for a given task to each student, 
# preferring a date that they like and is still free.
def fill(task, lowest_acceptable, spread_weight=0.1, tasks_to_spread="ABC"):
        random_order = range(nStudents) # randomize student order, so everyone
        random.shuffle(random_order)    # has an equal chance to get their first pick
        for i in random_order:
            student = students[i]
            if student.dates[task]: # student is already assigned for this task?
                continue

            # get available dates ordered by preference and how fully booked they are
            preferred = get_favorite_day(student, lowest_acceptable,  
                                         spread_weight, tasks_to_spread)
            for date_nr in preferred:
                date = dates[date_nr]
                if date.is_available(task, student.count, lowest_acceptable == 1):
                    date.set_student(task, student.count)
                    student.dates[task] = date
                    break

# attempt to "fill()" the schedule while gradually lowering expectations
start_at = 5
while start_at > 1:
    lowest_acceptable = start_at
    while lowest_acceptable > 0:
        fill("A", lowest_acceptable, spread_weight, "AAB")
        fill("B", lowest_acceptable, spread_weight, "ABB")
        if lowest_acceptable == 1:
            fill("C", lowest_acceptable, spread_weight_C, "C")
        lowest_acceptable -= 1

And here is an example result as printed by the script:

                                      Date                                      
================================================================================
Student |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10 |  11 |  12 |
================================================================================
   1    |     |  A  |  B  |     |  C  |     |     |     |     |     |     |     |
   2    |     |     |     |     |  A  |     |     |     |     |  B  |  C  |     |
   3    |     |     |     |     |  B  |     |     |  C  |     |  A  |     |     |
   4    |     |     |     |  A  |     |  C  |     |     |     |     |     |  B  |
   5    |     |     |  C  |     |     |     |  A  |  B  |     |     |     |     |
   6    |     |  C  |     |     |     |     |     |     |  A  |  B  |     |     |
   7    |     |     |  C  |     |     |     |     |  B  |     |     |     |  A  |
   8    |     |     |  A  |     |  C  |     |  B  |     |     |     |     |     |
   9    |  C  |     |     |     |     |     |     |     |  A  |     |     |  B  |
   10   |  A  |  B  |     |     |     |     |     |     |  C  |     |     |     |
   11   |  B  |     |     |  A  |     |  C  |     |     |     |     |     |     |
   12   |     |     |     |     |     |  A  |  C  |     |     |     |  B  |     |
   13   |  A  |     |     |  B  |     |     |     |     |     |     |     |  C  |
   14   |     |     |     |     |  B  |     |     |     |  C  |     |  A  |     |
   15   |     |     |  A  |  C  |     |  B  |     |     |     |     |     |     |
   16   |     |     |     |     |     |  A  |     |     |     |  C  |  B  |     |
   17   |     |  A  |     |  C  |     |     |  B  |     |     |     |     |     |
   18   |     |     |     |     |     |     |  C  |  A  |  B  |     |     |     |
================================================================================
Total student satisfaction: 250/261 = 95.00%