I'm trying to solve an optimization problem similar to the toy example described below. As noted in the comment, my current implementation with scipy is too slow and doesn't seem to converge. How do I get a decent solution? You can use scipy, mystic, or whatever library you see fit.
Note that I don't need a global minimum, and the search can stop as soon as loss(X) <= 1. The real-world objective is mostly written in SQL and thus absurdly slow, so I also want the optimization to terminate when loss has been evaluated ~200 times. (This is not a hard requirement; you can also terminate after the optimization has been running for, say, 5 minutes.)
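For concreteness, here is the kind of early-stopping logic I have in mind, expressed against the loss defined below (just a sketch; the wrapper and callback names are made up, and it relies on the fact that scipy's differential_evolution stops when its callback returns True):
n_evals = 0
best_loss = float("inf")

def tracked_loss(coef_):
    # count evaluations of the (expensive) objective and remember the best value
    global n_evals, best_loss
    n_evals += 1
    value = loss(coef_)
    best_loss = min(best_loss, value)
    return value

def good_enough(xk, convergence=None):
    # tell the solver to stop once the target or the evaluation budget is hit
    return best_loss <= 1 or n_evals >= 200
(One would pass tracked_loss as the objective and callback=good_enough to the solver.)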
While this question resembles Minimizing non-convex function with linear constraint and bound in mystic, it's definitely not a duplicate: the two questions aren't even minimizing the same objective.
import numpy as np
from scipy.optimize import differential_evolution, LinearConstraint
# Inhabitants in a rural city are voting for the name of a newborn. The city houses 40 dwarves
# and 10 humans in total, and there are 100 candidate names to vote for.
dwarf_population, human_population = 40, 10
population = dwarf_population + human_population
candidate_count = 100
# Each inhabitant has a different number of votes in their hand.
scores_per_citizen = np.random.randint(15, 20, population)
# Let's say the original result looks like this. (Yes, one can cast a fraction of their votes)
alpha = np.abs(np.random.normal(size=candidate_count))
scores = np.diag(scores_per_citizen).dot(np.random.dirichlet(alpha, population))
assert np.allclose(scores.sum(1), scores_per_citizen)
# Here is how the votes are counted.
def count(scores_: np.ndarray) -> np.ndarray:
# Dwarves have a weird tradition: the top 10 popular names chosen by dwarves will get all their votes.
# (I guess this is what makes our objective non-convex.)
scores_by_dwarves = scores_[0:40, :]
    score_per_candidate_by_dwarves_raw = scores_by_dwarves.sum(0)  # per-candidate totals, summed over dwarves
top_10_popular_name_indices = np.argsort(-score_per_candidate_by_dwarves_raw)[:10]
score_per_candidate_by_dwarves = np.zeros(candidate_count)
score_per_candidate_by_dwarves[top_10_popular_name_indices] = score_per_candidate_by_dwarves_raw[
top_10_popular_name_indices]
score_per_candidate_by_dwarves = scores_by_dwarves.sum() \
* score_per_candidate_by_dwarves \
/ score_per_candidate_by_dwarves.sum()
assert np.allclose(score_per_candidate_by_dwarves.sum(), score_per_candidate_by_dwarves_raw.sum())
    # Humans, on the other hand, just add everyone's votes together.
scores_by_humans = scores_[40:, :]
scores_per_candidate_by_humans = scores_by_humans.sum(0)
# The final result is the sum of the scores by dwarves and humans.
return score_per_candidate_by_dwarves + scores_per_candidate_by_humans
# So this is the legitimate result.
scores_per_candidate = count(scores)
# Now, you want to cheat a bit and make your proposal (assuming it's the first one) the most popular one.
target_scores_per_candidate = scores_per_candidate.copy()
argmax = scores_per_candidate.argmax()
target_scores_per_candidate[argmax] = scores_per_candidate[0]
target_scores_per_candidate[0] = scores_per_candidate[argmax]
assert np.allclose(scores_per_candidate.sum(), target_scores_per_candidate.sum())
# However, you cannot just manipulate the result, otherwise the auditors will find out!
# Instead, the raw scores must be manipulated such that your submission ranks top among others.
# You decide to solve for a multiplier to the original scores.
init_coef = np.ones_like(scores).reshape(-1)
# In other words, your goal is to find the minimum of the following objective.
def loss(coef_: np.ndarray) -> float:
scores_ = scores * coef_.reshape(scores.shape)
scores_per_candidate_ = count(scores_)
    return ((scores_per_candidate_ - target_scores_per_candidate) ** 2).sum()
# This is a helper constant matrix. Ignore it for now.
A = np.concatenate([np.tile(np.concatenate([np.repeat(1, candidate_count),
np.repeat(0, population * candidate_count)]),
population - 1),
np.repeat(1, candidate_count)])
A = A.reshape((population, population * candidate_count))
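# Equivalently (an easier-to-read construction of the same matrix): each row of A
# is an indicator that selects one citizen's block of candidate scores.
assert np.array_equal(A, np.kron(np.eye(population), np.ones(candidate_count)))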
# Meanwhile, some constraints must hold.
def constraints(coef_: np.ndarray):
# The total votes of each citizen must not change.
coef_reshaped = coef_.reshape((population, candidate_count))
assert np.allclose((coef_reshaped * scores).sum(1), scores_per_citizen)
# Another way to express the same thing with matrices.
assert np.allclose(A.dot(np.diag(scores.reshape(-1))).dot(coef_), scores_per_citizen)
    # Additionally, all scores must remain non-negative.
assert np.all(coef_reshaped * scores >= 0)
# Let's express the constraint with a LinearConstraint.
score_match_quota = LinearConstraint(A.dot(np.diag(scores.reshape(-1))), scores_per_citizen, scores_per_citizen)
# Run optimization (Spoiler: this is extremely slow, and doesn't seem to converge)
rv = differential_evolution(loss,
                            bounds=[(0, 1000)] * init_coef.size,  # the 1000 here is arbitrary and can be replaced by other large numbers
init=np.vstack((init_coef, init_coef, init_coef, init_coef, init_coef)),
constraints=score_match_quota)
# Sanity check
constraints(rv.x)
The answer is almost the same as in the question you referenced... however, there are some extra constraints here that I'd have to think a bit further about to handle properly. Let me rewrite your code -- just using some shorter names.
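Roughly, the renamed setup looks like this (a sketch rather than the exact rewrite; only A_, x0, and citizens are names the text below relies on, the rest are illustrative):
citizens, names = population, candidate_count  # shorter aliases
A_ = A.dot(np.diag(scores.reshape(-1)))  # A_.dot(coef) gives each citizen's vote total
b_ = scores_per_citizen                  # each citizen's (integer) vote budget
x0 = init_coef                           # start from the all-ones multiplier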
The code is identical with your question up to this point. Now, let's use some of mystic's tools.
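The tools used below live in mystic's symbolic, solver, and monitor modules:
import mystic.symbolic as ms
from mystic.solvers import fmin_powell
from mystic.monitors import VerboseMonitor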
First, let's put it in the context of the question you referenced.
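That is, convert the linear system into symbolic constraints and compile them into a constraints function, as in that question (a sketch; ms.linear_symbolic renders the equalities A_ x = b_ as strings):
# compile the equalities A_ x = b_ into a function cons_ that maps any
# candidate x onto the constraint surface
eqns = ms.linear_symbolic(A_, b_)
cons_ = ms.generate_constraint(ms.generate_solvers(ms.simplify(eqns)))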
Now, as in the referenced question, we can use a simple solver like fmin_powell with constraints=cons_ to solve it. However, I've found that mystic can struggle a bit here, as I'll demonstrate.
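The attempt looks roughly like this (a sketch; the monitor and bounds settings are assumptions):
# direct attempt: hand the compiled constraints to a simple local solver
mon = VerboseMonitor(1)  # log progress every iteration
rv = fmin_powell(loss, x0, bounds=[(0, 1000)] * x0.size,
                 constraints=cons_, itermon=mon, disp=True)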
I've killed the optimization here... when you see inf, it means that mystic is struggling to solve the constraints. The problem is that there should be additional constraints to help it out, namely that A_.dot(cons_(x0)) should be integers. Additionally, it'd be nice to have better control over the presence of negatives; as a matter of fact, every citizens-th element is negative.

So, I tried using penalties instead.
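That is, charge violations of the equalities to the objective instead of solving them exactly at every step (again a sketch; the penalty strength k and the stopping settings are assumptions, chosen to match the note below):
pens = ms.generate_penalty(ms.generate_conditions(eqns), k=1e4)
mon = VerboseMonitor(10)
rv = fmin_powell(loss, x0, bounds=[(0, 1000)] * x0.size,
                 penalty=pens, itermon=mon, disp=True,
                 maxfun=20000, ftol=500)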
and that seems to work just fine (note I used 20000 evals instead of 200, and an ftol that stops when the loss reaches 500 instead of 1). rv is pretty close, and the optimization was fairly fast. Here, A_.dot(rv) isn't as accurate (it only rounds to 2 decimal places instead of 10)... and would again benefit from a constraint making A_.dot(rv) integers. I'll leave that to a future example.
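A quick way to check those claims (not the original code, just a sanity check):
print(loss(rv))                 # fairly small, even if not yet <= 1
print(np.round(A_.dot(rv), 2))  # matches scores_per_citizen to about 2 decimals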