How to generate a Rank 5 matrix with entries Uniform?

503 views Asked by At

I want to generate a rank 5 100x600 matrix in numpy with all the entries sampled from np.random.uniform(0, 20), so that all the entries will be uniformly distributed between [0, 20). What will be the best way to do so in python?

I see there is an SVD-inspired way to do so here (https://math.stackexchange.com/questions/3567510/how-to-generate-a-rank-r-matrix-with-entries-uniform), but I am not sure how to code it up. I am looking for a working example of this SVD-inspired way to get uniformly distributed entries.

I have actually managed to code up a rank 5 100x100 matrix by vertically stacking five 20x100 rank 1 matrices, then shuffling the vertical indices. However, the resulting 100x100 matrix does not have uniformly distributed entries [0, 20).

Here is my code (my best attempt):

import numpy as np
def randomMatrix(m, n, p, q):
    # creates an m x n matrix with lower bound p and upper bound q, randomly.
    count = np.random.uniform(p, q, size=(m, n))
    return count

Qs = []
my_rank = 5
for i in range(my_rank):
  L = randomMatrix(20, 1, 0, np.sqrt(20))
  # L is tall
  R = randomMatrix(1, 100, 0, np.sqrt(20)) 
  # R is long
  Q = np.outer(L, R)
  Qs.append(Q)

Q = np.vstack(Qs)
#shuffle (preserves rank 5 [confirmed])
np.random.shuffle(Q)

2

There are 2 answers

2
yann ziselman On BEST ANSWER

I just couldn't take the fact the my previous solution (the "selection" method) did not really produce strictly uniformly distributed entries, but only close enough to fool a statistical test sometimes. The asymptotical case however, will almost surely not be distributed uniformly. But I did dream up another crazy idea that's just as bad, but in another manner - it's not really random.
In this solution, I do smth similar to OP's method of forming R matrices with rank 1 and then concatenating them but a little differently. I create each matrix by stacking a base vector on top of itself multiplied by 0.5 and then I stack those on the same base vector shifted by half the dynamic range of the uniform distribution. This process continues with multiplication by a third, two thirds and 1 and then shifting and so on until i have the number of required vectors in that part of the matrix.
I know it sounds incomprehensible. But, unfortunately, I couldn't find a way to explain it better. Hopefully, reading the code would shed some more light.
I hope this "staircase" method will be more reliable and useful.

import numpy as np 
from matplotlib import pyplot as plt

'''
params:
    N    - base dimention
    M    - matrix length
    R    - matrix rank
    high - max value of matrix
    low  - min value of the matrix
'''
N    = 100
M    = 600
R    = 5
high = 20
low  = 0

# base vectors of the matrix
base = low+np.random.rand(R-1, N)*(high-low)

def build_staircase(base, num_stairs, low, high):
    '''
    create a uniformly distributed matrix with rank 2 'num_stairs' different 
    vectors whose elements are all uniformly distributed like the values of 
    'base'.
    '''
    l = levels(num_stairs)
    vectors = []
    for l_i in l:
        for i in range(l_i):
            vector_dynamic = (base-low)/l_i
            vector_bias    = low+np.ones_like(base)*i*((high-low)/l_i)
            vectors.append(vector_dynamic+vector_bias)
    return np.array(vectors)


def levels(total):
    '''
    create a sequence of stritcly increasing numbers summing up to the total.
    '''
    l = []
    sum_l = 0
    i = 1
    while sum_l < total:
        l.append(i)
        i +=1
        sum_l = sum(l)
    i = 0
    while sum_l > total:
        l[i] -= 1
        if l[i] == 0:
            l.pop(i)
        else:
            i += 1
        if i == len(l):
            i = 0
        sum_l = sum(l)
    return l
        
n_rm = R-1 # number of matrix subsections
m_rm = M//n_rm
len_rms = [ M//n_rm for i in range(n_rm)]
len_rms[-1] += M%n_rm
rm_list = []
for len_rm in len_rms:
    # create a matrix with uniform entries with rank 2
    # out of the vector 'base[i]' and a ones vector.
    rm_list.append(build_staircase(
        base = base[i], 
        num_stairs = len_rms[i], 
        low = low,
        high = high,
    ))

rm = np.concatenate(rm_list)
plt.hist(rm.flatten(), bins = 100)

A few examples:
enter image description here enter image description here enter image description here

and now with N = 1000, M = 6000 to empirically demonstrate the nearly asymptotic behavior: enter image description here enter image description here enter image description here

12
yann ziselman On

Not a perfect solution, I must admit. But it's simple and comes pretty close.
I create 5 vectors that are gonna span the space of the matrix and create random linear combinations to fill the rest of the matrix. My initial thought was that a trivial solution will be to copy those vectors 20 times.
To improve that, I created linear combinations of them with weights drawn from a uniform distribution, but then the distribution of the entries in the matrix becomes normal because the weighted mean basically causes the central limit theorm to take effect.
A middle point between the trivial approach and the second approach that doesn't work is to use sets of weights that favor one of the vectors over the others. And you can generate these sorts of weight vectors by passing any vector through the softmax function with an appropriately high temperature parameter.
The distribution is almost uniform, but the vectors are still very close to the base vectors. You can play with the temperature parameter to find a sweet spot that suits your purpose.

from scipy.stats import ortho_group
from scipy.special import softmax
import numpy as np
from matplotlib import pyplot as plt
N    = 100
R    = 5
low  = 0
high = 20
sm_temperature = 100

p       = np.random.uniform(low, high, (1, R, N))
weights = np.random.uniform(0, 1, (N-R, R, 1))
weights = softmax(weights*sm_temperature, axis = 1)
p_lc    = (weights*p).sum(1)

rand_mat = np.concatenate([p[0], p_lc])

plt.hist(rand_mat.flatten())

enter image description here