I want to generate a vector of a given length, e.g., n = 5. Each value in the vector should be a proportion (i.e., a value between 0 and 1) so that across n elements they sum up to 1.
Unfortunately, I have two vectors: one (mymins) defines the allowed lower boundaries of each proportion and the other (mymaxs) defines the allowed top boundaries of each proportion.
In my example below the desired proportion for the first element is allowed to fall anywhere between 0.3 and 0.9. And for the last element, the desired proportion is allowed to fall between 0.05 and 0.7.
mymins <- c(0.3, 0.1, 0, 0.2, 0.05)
mymaxs <- c(0.9, 1, 1, 1, 0.7)
Let's assume that mymins are always 'legitimate' (i.e., their sum is never larger than 1).
How could I find a set of 5 proportions such that they all sum to 1 but lie within the boundaries?
Here is what I tried:
n = 5
mydif <- mymaxs - mymins # possible range for each proportion
myorder <- rank(mydif) # order those differences from smallest to largest
mytarget <- sum(mydif) # sum up the 5 ranges
x <- sort(runif(n))[myorder] # generate 5 random values an sort them in the order of mydif
x2 <- mymins + x / sum(x) * mytarget # rescale random values to sum up to mytarget and add them to mymins
x3 <- x2/sum(x2) # rescale x2 to sum up to 1
As you can see, I am not very far - because after rescaling some values are outside of their allowed boundaries.
I should probably also mention that I need this operation to be fast - because I am using it in an optimization loop.
I also tried to find a solution using optim, however the problem is that it always finds the same solution - and I need to generate a DIFFERENT solutions every time I find the proporotion:
myfun <- function(x) {
x <- round(x, 4)
abovemins <- x - mymins
n_belowmins <- sum(abovemins < 0)
if (n_belowmins > 0) return(100000)
belowmax <- x - mymaxs
n_abovemax <- sum(belowmax > 0)
if (n_abovemax > 0) return(100000)
mydist <- abs(sum(x) - 1)
return(mydist)
}
myopt <- optim(par = mymins + 0.01, fn = myfun)
myopt$par
sum(round(myopt$par, 4))
Thank you very much for your suggestions!
Because you need 5 random numbers to sum to 1, you really only have 4 independent numbers and one dependent number.
The reason I use
iter
there is to make sure you don't take an "infinite" amount of time to find something. If yourmymins
andmymaxs
combination make this mathematically infeasible (as your first example was), then you don't need to spin forever. If it is mathematically improbable to find it in a reasonable amount of time, you need to weigh how long you want to do this.One reason this takes so long is that we are iteratively pulling entropy. If you expect this to go for a long time, then it is generally better to pre-calculate as much as you think you'll need (overall) and run things as a matrix.
This time it took 10000 attempts to make 19 valid combinations. It might take more or fewer attempts based on randomness, so ymmv with regards to how much you need to pre-generate.