Efficiently ensuring that an activity occurs after another during PulP Optimization

58 views Asked by At

I have a scheduling optimisation problem where I need to optimise Activities to be performed in a given timeframe based on priority and dependencies.

The issue is that when creating dependencies, I have to create (and iterate) an insane amount of constraints with the current implementation, which results in issues when solving for high values of time_horizon.

Both dependent_activity and prerequisite activity are a list of binary variables Activity_A_0 which dictates whether an Activity_A is occurring at time t=0.

The current implementation makes sure that a 'prerequisite_activity' has occurred at least as many times as its 'duration' before the 'dependent_activity' can start by checking that the 'dependent_activity' at any point in time is smaller than or equal to the sum of it's prerequisite activities divided by it's duration (ie, it will only be 1 when the whole activity has completed).

import pulp

# Define activities with durations, resource requirements, priorities, and parallel constraints
# Resources are per timeframe (ie, if every hour you need 1 machine, the value is 1 not 1*hours)
# priority is total
activities = {
    "Activity A": {
        "duration": 2,
        "resources": {"resource1": 1, "resource2": 1},
        "priority": 3,
    },
    "Activity B": {
        "duration": 3,
        "resources": {"resource1": 2, "resource2": 1},
        "priority": 2,
    },
    "Activity C": {
        "duration": 1,
        "resources": {"resource1": 1, "resource2": 2},
        "priority": 1,
    },
    "Activity D": {
        "duration": 2,
        "resources": {"resource1": 1, "resource2": 1},
        "priority": 4,
    },
}

# Define activity dependencies
activity_dependencies = [
    ("Activity B", "Activity A"),
    ("Activity C", "Activity B"),
]

for name, value in activities.items():
    activities[name]["priority"] = (
        activities[name]["priority"] / activities[name]["duration"]
    )

# Define the time horizon
time_horizon = 10

# Create a LP problem
problem = pulp.LpProblem("ScheduleOptimization", pulp.LpMaximize)

# Create binary variables for each activity and time slot
activity_vars = {}
for activity in activities:
    for t in range(time_horizon):
        activity_vars[(activity, t)] = pulp.LpVariable(
            f"{activity}_{t}", 0, 1, pulp.LpInteger
        )

# Create a variable to represent the total priority
total_priority = pulp.LpVariable("TotalPriority", cat=pulp.LpContinuous)

# Objective: Maximize the total priority
problem += total_priority

# Constraints
## Activity Dependencies
for (dependent_activity, prerequisite_activity), t in itertools.product(
    activity_dependencies, range(time_horizon)
):
    problem += (
        activity_vars[(dependent_activity, t)]
        <= pulp.lpSum(activity_vars[(prerequisite_activity, tt)] for tt in range(0, t))
        / activities[prerequisite_activity]["duration"]
    ), f"Dependency ({dependent_activity},{prerequisite_activity}) t={t}"


# Update total_priority variable to reflect the actual total priority
problem += total_priority == pulp.lpSum(
    activity["priority"] * activity_vars[(activity_name, t)]
    for activity_name, activity in activities.items()
    for t in range(time_horizon)
)

# Solve the problem
problem.solve(pulp.PULP_CBC_CMD(msg=1))

# Print the schedule
schedule = {}
for activity_name in activities:
    for t in range(time_horizon):
        print(
            activity_vars[(activity_name, t)],
            "___",
            pulp.value(activity_vars[(activity_name, t)]),
        )
        if (
            pulp.value(activity_vars[(activity_name, t)]) == 1
            and schedule.get(activity_name) is None
        ):
            schedule[activity_name] = t

print("Optimal Schedule:")
for activity, start_time in schedule.items():
    print(f"{activity} starts at time {start_time}")
print(f"Total Priority: {pulp.value(total_priority)}")

Here I expect Activity A to complete before B which in turn needs to complete before C. So A>B>C

The number of these constraints is len(activity_dependencies) * time_horizon and the last of these constraints have as many as time_horizon variables which causes the optimiser to crash.

I need a solution which, ideally, reduces the number of constraints and the size of these constraints.

1

There are 1 answers

0
Ingwersen_erik On

You might consider defining the problem variables in terms of start and end times of each activity, instead of creating binary variables that represent each activity status at a given time.

Beware that this solution has a caveat that is detailed in the "Notes" section.

Here's the modified optimization problem code:

import pulp

# Define the time horizon
time_horizon = 10

# Define activities with durations, resource requirements, priorities,
# and parallel constraints
activities = {
    "Activity A": {"duration": 2, "resources": {"resource1": 1, "resource2": 1}, "priority": 3},
    "Activity B": {"duration": 3, "resources": {"resource1": 2, "resource2": 1}, "priority": 2},
    "Activity C": {"duration": 1, "resources": {"resource1": 1, "resource2": 2}, "priority": 1},
    "Activity D": {"duration": 2, "resources": {"resource1": 1, "resource2": 1}, "priority": 4},
}

# Define activity dependencies
activity_dependencies = [("Activity B", "Activity A"), ("Activity C", "Activity B")]

# Adjust priorities to account for duration
for activity in activities.values():
    activity["priority"] /= activity["duration"]

# == LP problem instance =======================================================
# Create the LP problem instance
problem = pulp.LpProblem("ScheduleOptimization", pulp.LpMaximize)

# == Problem Variables =========================================================
# Create start and end time variables for each activity
start_vars = {
    activity: pulp.LpVariable(f"start_{activity}", 0, time_horizon, pulp.LpInteger)
    for activity in activities
}
end_vars = {
    activity: pulp.LpVariable(f"end_{activity}", 1, time_horizon, pulp.LpInteger)
    for activity in activities
}

# == Objective Function ========================================================
# Create a variable to represent the total priority
total_priority = pulp.lpSum(
    activity["priority"] * (end_vars[activity_name] - start_vars[activity_name])
    for activity_name, activity in activities.items()
)

# Set the objective function
problem.setObjective(total_priority)

# == Constraints ===============================================================
# Constraint 1: Activity dependencies
for dependent_activity, prerequisite_activity in activity_dependencies:
    problem += (
        start_vars[dependent_activity]
        >= start_vars[prerequisite_activity]
        + activities[prerequisite_activity]["duration"],
        f"Dependency_{dependent_activity}_{prerequisite_activity}",
    )

# == Solve Problem =============================================================
problem.solve(pulp.PULP_CBC_CMD(msg=False))

# == Optimization Results ======================================================
# Print the schedule
if pulp.LpStatus[problem.status] == "Optimal":
    print("Optimal Schedule:")
    for activity in activities:
        start_time = start_vars[activity].value()
        end_time = end_vars[activity].value()
        print(f"{activity} starts at time {start_time:.0f} and ends at time {end_time:.0f}")
    print(f"Total Priority: {total_priority.value():.2f}")
else:
    print("No optimal solution found")

# Prints:
#
# Optimal Schedule:
# Activity A starts at time 0 and ends at time 10
# Activity B starts at time 2 and ends at time 10
# Activity C starts at time 5 and ends at time 10
# Activity D starts at time 0 and ends at time 10
# Total Priority: 45.33

Execution Time Comparison

The results shown in the table below represent the execution time for the new implementation versus the original problem formulation, using a time_horizon equal to 1000.

Implementation Execution Time (s)
Original Implementation 8.99
New Implementation 1.13

As you can see, the new implementation is much more efficient, since its optimization problem defines fewer variables and constraints, and the constraints being added are much smaller. Additionally, it scales more efficiently as larger time_horizon values are set.

Notes

Here's the catch of this modified problem formulation: it assumes that for every existing dependent activity, the prerequisite activity start time plus duration never exceeds the defined time_horizon value.

For example, say you have the following activities dictionary and activity_dependencies list:

time_horizon = 10
activities = {
    "Activity A": {"duration": 2, "resources": {"resource1": 1, "resource2": 1}, "priority": 3},
    "Activity B": {"duration": 3, "resources": {"resource1": 2, "resource2": 1}, "priority": 2},
    "Activity C": {"duration": 1, "resources": {"resource1": 1, "resource2": 2}, "priority": 1},
    "Activity D": {"duration": 2, "resources": {"resource1": 1, "resource2": 1}, "priority": 4},
    "Activity E": {"duration": 9, "resources": {"resource1": 1, "resource2": 1}, "priority": 8},
}

# Define activity dependencies
activity_dependencies = [("Activity B", "Activity A"), ("Activity C", "Activity B"), ("Activity E", "Activity A")]

For the above example, the modified optimization problem returns the following activities start and end times:

Optimal Schedule:
Activity A starts at time 0 and ends at time 10
Activity B starts at time 2 and ends at time 10
Activity C starts at time 5 and ends at time 10
Activity D starts at time 0 and ends at time 10
Activity E starts at time 2 and ends at time 10
Total Priority: 155.00

However, since "Activity E" duration plus its start time exceeds the time_horizon value, the "Activity E" start and end times should in fact be both equal to 0 instead of 2 and 10.