Matching Algorithm (Java)

1.8k views Asked by At

I am writing an algorithm to match students with different groups. Each group has a limited number of spots. Each student provides their top 5 choices of groups. The students are then placed into groups in a predetermined order (older students and students with perfect attendance are given higher priority). There is no requirement for groups to be filled entirely, but they cannot be filled passed capacity.

I've looked into similar marriage problems such as the Gale-Shapely stable marriage algorithm, but the problem I am having is that there far fewer groups than students and each group can accept multiple students.

What is the best way to implement such an algorithm to find a solution that has been optimized entirely such that there is no better arrangement of students in groups? In terms of algorithm complexity, I'm placing roughly 600 students into 10-20 groups.

4

There are 4 answers

1
Gene On BEST ANSWER

NB The close votes are terribly misplaced. Algorithm choice and design to solve an ambiguous problem is absolutely part of programming.

I think you'll get farther with Minimum Weight Bipartite Matching than Stable Marriage (also called the Hungarian method or algorithm or Maximum Weight Matching, which can give you a min weight matching just by negating the weights).

You are out to match positions with students. So the two node types in the bipartite graph are these.

The simplest statement of the algorithm requires a complete weighed bipartite graph with equal numbers of nodes in each set. You can think of this as a square matrix. The weights are the elements. The rows are students. The columns are positions.

The algorithm will pick a single element from each row/column such that the sum is minimized.

@nava's proposal is basically a greedy version of MWBM that's not optimal. The true Hungarian algorithm will give you an optimal answer.

To handle the fact that you have fewer positions than students is easy. To the "real" positions add as many "dummy" positions as needed. Connect all these to all the students with super high-weight edges. The algorithm will only pick them after all the real positions are matched.

The trick is to pick the edge weights. Let's call the ordinal where a student would be considered for a position O_i for the i'th student. Then let R_ip be the ranking that the same student places on the p'th position. Finally, W_ip is the weight of the edge connecting the i'th student to the p'th position. You'll want something like:

W_ip = A * R_ip + B * O_i

You get to pick A and B to specify the relative importance of the students' preferences and the order they're ranked. It sounds like order is quite important. So in that case you want B to be big enough to completely override students' rankings.

A = 1, B = N^2, where N is the number of students.

Once you get an implementation working, it's actually fun to tweak the parameters to see how many students get what preference, etc. You might want to tweak the parameters a bit to give up a little on the order.

At the time I was working on this (late 90's), the only open source MWBM I could find was an ancient FORTRAN lib. It was O(N^3). It handled 1,000 students (selecting core academic program courses) in a few seconds. I spent a lot of time coding a fancy O(N^2 log N) version that turned out to be about 3x slower for N=1000. It only started "winning" at about 5,000.

These days there are probably better options.

1
Nadir On

For each group:

  • create an ordered set and add all the students (you must design the heuristic which will order the students inside the set, which could be attendance level multiplied by 1 if the group is within their choice, 0 otherwise).

  • Fill the group with the first nth students

But there are some details that you didn't explain. For example, what happen if there are students that couldn't enter any of their 5 choices because they got full with other students with higher priority?

1
randombee On

I would modify the Knapsack problem (Knapsack problem, wikipedia) to work with K numbers of groups (knapsacks) instead of just one. You can assign "value" to the preferences they have and the number of spots would be the maximum "weight" of the Knapsack. With this, you can backtrack to check what is the optimal solution of the problem.

I am not sure how efficient you need the problem to be, but I think this will work.

0
nafas On

the most mathematically perfect

is very opinion based.

simplicity (almost) always wins. here

Here is a pseudocode:

students  <-- sorted by attendance

for i=0 to n in students:
   groups <-- sorted by ith student's preference
   for  j=0 to m in groups:
     if group j has space then add student i to group j; studentAssigned=true; break;

   if studentAssigned=false;
     add i to unallocated

for i=0 to k in unallocated 
  allocate i to a random group that is not filled