How to generate a random stochastic matrix or ndarray?

589 views Asked by At

I was looking for a crate that would allow me to easily and randomly generate probability vectors, stochastic matrices or, in general, ndarrays that are stochastic. For people not familiar with these concepts, a probability vector v is defined as follows

  1. 0 <= v[i] <= 1, for all i
  2. sum(v[i]) = 1

Similarly, a stochastic matrix is a matrix where each column (or row) satisfies the conditions above.

More generally, a ndarray A would be stochastic if

  1. 0 <= A[i, j, k, ..., h] <= 1, for all indices
  2. sum(A[i, j, k, ..., :]) = 1, for all indices

Here, ... just means other indices between k and the last index h. : is a notation to indicate all elements of that dimension.

Is there a crate that does this easily (i.e. you just need to call a function or something like that)? If not, how would you do it? I suppose one could just generate a random ndarray and then change the array by dividing the last dimension by the sum of the elements in that dimension, so, for a 1d array (a probability vector), we could do something like this

use ndarray::Array1;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

fn main() {
    let mut a = Array1::random(10, Uniform::new(0.0, 1.0));
    a = &a / a.sum();
    println!("The sum is {:?}", a.sum());
}

But how would you do it for higher dimensional arrays? We could use a for loop an iterate over all indices, but that doesn't look like it would be efficient. I suppose there must be a way to do this operation in a vectorized form. Is there a function (in the standard library, in the ndarray crate or some other crate) that does this for us? Could we use ndarray-rand to do this without having to divide by the sum?

Requirements

  • Efficiency is not strictly necessary, but it would be nice.
  • I am more looking for a simple solution (no more complicated than what I wrote above).
  • Numerical stability would also be great (e.g. generating random integers and then dividing by the sum may be a better idea than generating random floats and then do the same thing).
  • I would like to use ndarrays and the related crate, but it's ok if you share also other solutions (which may be useful to others that don't use ndarrays)
1

There are 1 answers

4
Severin Pappadeux On

I would argue that sampling with whatever distribution you have on hands (U(0,1), Exponential, abs Normal, ...) and then dividing by sum is the wrong way to go.

Start with distribution which has property values being in the [0...1] range and sum of values being equal to 1.

Fortunately, there is such distribution - Dirichlet distribution.

And, apparently, there is a Rust lib to do Dirichlet sampling. Cannot say anything about lib quality.

https://docs.rs/rand_distr/latest/rand_distr/struct.Dirichlet.html

UPDATE

Wrt sampling and then normalizing, problem is, noone knows what would be distribution of the RVs

U(0,1)/(U(0,1) + U(0,1) + ... + U(0,1))

Mean value? Median? Variance? Anything to say at all?

You could even construct it like

[U(0,1);Exp(2);|N(0,1)|;U(0,88);Exp(4.5);...] and as soon as you divide it by sum, values in the vector would be between 0 and 1 and summed to 1. Even less to say about properties of such RVs.

I assume you want to generate random vector/matrices for some purpose, like Monte Carlo etc. Dealing with known distribution with well-defined properties, mean values, variance looks like right way to go.

If I understand correctly, the Dirichlet distribution allows you to generate a probability vector, where the probabilities depend on the initial parameters that you pass, but you would still need to pass these parameters (manually)

Yes, concentration parameters. By default all ones, which makes RVs uniformly distributed in the simplex.

So, are you suggesting the Dirichlet distribution because it was designed on purpose to generate probability vectors?

I'm suggesting Dirichlet because by default it will produce uniformly in-the-simplex distributed RVs, summed to 1 and with well-known statistical properties, starting with PDF, CDF, mean, median, variance, ...

UPDATE II

For Dirichlet

PDF=Prod(xiai-1)/B(a)

So for the case where all ai=1

PDF = 1/B(a)

so given the constrains defining simplex Sum(xi)=1 this is as uniform as it gets.