I want to define some function on a matrix `X`, for example `mean(pow(X - X0, 2))`, where `X0` is another matrix (`X0` is fixed / constant). To make it more specific, let's assume that both `X` and `X0` are `10 x 10` matrices. The result of the operation is a real number.
Now I have a big matrix (let's say `500 x 500`). I want to apply the operation defined above to all `10 x 10` sub-matrices of the "big" matrix. In other words, I want to slide a `10 x 10` window over the "big" matrix. For each location of the window, I should get a real number. So, as a final result, I need a real-valued matrix (or 2D tensor) of shape `491 x 491`.
What I want to have is close to a convolutional layer but not exactly the same because I want to use a mean squared deviation instead of a linear function represented by a neuron.
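For reference, a brute-force version of the operation I have in mind would be something like this (array names are illustrative):

```python
import numpy as np

n, m = 500, 10
X_big = np.random.rand(n, n)    # the "big" matrix
X0 = np.random.rand(m, m)       # the fixed reference matrix

result = np.zeros((n - m + 1, n - m + 1))   # -> shape (491, 491)
for i in range(n - m + 1):
    for j in range(n - m + 1):
        result[i, j] = np.mean((X_big[i:i + m, j:j + m] - X0) ** 2)
```

This double loop is what I want to express more directly, e.g. as some kind of layer or vectorized operation.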
This is only a NumPy solution; I hope it suffices. I assume that your function is made up of an element-wise operation on the matrix elements and of a mean, i.e. a scaled sum. Hence, it is sufficient to look at the element-wise `Y = (X - X0)**2`; we then only need to deal with determining a windowed mean of `Y`. Note that for the 1D case, the mean can be calculated as a matrix product with an appropriate vector of ones, e.g.:
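A minimal sketch of the 1D case (the array and the sizes are illustrative):

```python
import numpy as np

m = 3
z = np.arange(8.0)            # some 1D data

h = np.ones(m) / m            # scaled vector of ones
print(h @ z[:m])              # windowed mean of the first m elements
print(z[:m].mean())           # same value, for comparison
```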
The 2D case is analogous, but with two matrix multiplications, one for the rows and one for the columns, e.g.:
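For a single `(m, m)` block, a scaled ones vector from the left averages over the rows and one from the right over the columns (values illustrative):

```python
import numpy as np

m = 3
Z = np.arange(36.0).reshape(6, 6)   # some 2D data

h = np.ones(m) / m
print(h @ Z[:m, :m] @ h)            # windowed mean of the top-left block
print(Z[:m, :m].mean())             # same value, for comparison
```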
To construct a sliding window, we need to build a matrix `H` of shifted windows to calculate all the sums at once, e.g.:
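Each row of `H` holds one shifted copy of the scaled window of ones (sizes illustrative):

```python
import numpy as np

n, m = 6, 3
k = n - m + 1                 # number of window positions per axis
Z = np.arange(36.0).reshape(n, n)

H = np.zeros((k, n))
for i in range(k):
    H[i, i:i + m] = 1.0 / m   # shifted, scaled window of ones

Y = H @ Z @ H.T               # (k, k) matrix of all windowed means
print(np.isclose(Y[1, 2], Z[1:1 + m, 2:2 + m].mean()))
```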
A full example for an `(n, n)` matrix with an `(m, m)` window would be:
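A sketch with the sizes from the question (the function name `windowed_mean` is illustrative):

```python
import numpy as np

def windowed_mean(Z, m):
    # All (m, m) windowed means of the (n, n) matrix Z via two matrix products.
    n = Z.shape[0]
    k = n - m + 1
    H = np.zeros((k, n))
    for i in range(k):
        H[i, i:i + m] = 1.0 / m
    return H @ Z @ H.T

n, m = 500, 10
Z = np.random.rand(n, n)
Y = windowed_mean(Z, m)
print(Y.shape)                                    # (491, 491)
print(np.isclose(Y[0, 0], Z[:m, :m].mean()))      # True
```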
An alternative but probably slightly less efficient way of building `H` is:
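For instance, summing shifted off-diagonal identity matrices (this builds `m` temporary arrays, hence the likely small efficiency loss):

```python
import numpy as np

n, m = 6, 3
k = n - m + 1

# np.eye(k, n, j) has ones on the j-th upper off-diagonal; summing the
# first m of them places a window of m ones in every row.
H = sum(np.eye(k, n, j) for j in range(m)) / m
```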
**Update**
Calculating the mean squared deviation is also possible with that approach. For that, we generalize the vector identity

    |x - x0|^2 = (x - x0).T (x - x0) = x.T x - 2 x0.T x + x0.T x0

(a space denotes a scalar or matrix multiplication and `.T` a transposed vector) to the matrix case:

    |X - X0|^2 = Tr((X - X0).T (X - X0)) = Tr(X.T X) - 2 Tr(X0.T X) + Tr(X0.T X0)

with `Tr` denoting the trace. We assume `W` is an `(m, n)` matrix containing a block `(m, m)` identity matrix, which is able to extract the `(k0, k1)`-th `(m, m)` sub-matrix by `Y = W Z W.T`, where `Z` is the `(n, n)` matrix containing the data. Calculating the difference

    D = W Z W.T - X0

is straightforward, where `X0` and `D` are `(m, m)` matrices. The square root of the squared sum of the elements is called the Frobenius norm, written `|.|` above. Based on those identities, we can write the squared sum as

    |W Z W.T - X0|^2 = |W Z W.T|^2 - 2 Tr(X0.T (W Z W.T)) + |X0|^2
                     = Y0 - 2 Y1 + Y2

The term `Y0` can be interpreted as `H Z H.T` from the method above, applied to the element-wise square of `Z`. The term `Y1` can be interpreted as a weighted mean on `Z`, and `Y2` is a constant, which only needs to be determined once. Thus, a possible implementation would be:
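A sketch, assuming square `Z` and `X0` (the function name and the final brute-force check are illustrative):

```python
import numpy as np

def windowed_msd(Z, X0):
    # Mean squared deviation of every (m, m) window of Z from X0,
    # returned as an (n - m + 1, n - m + 1) matrix.
    n = Z.shape[0]
    m = X0.shape[0]
    k = n - m + 1

    # Unscaled window matrix: H @ A @ H.T yields all windowed sums of A.
    H = np.zeros((k, n))
    for i in range(k):
        H[i, i:i + m] = 1.0

    Y0 = H @ (Z * Z) @ H.T            # windowed sums of Z**2
    Y1 = np.zeros((k, k))             # windowed sums of Z weighted by X0
    for i in range(m):
        for j in range(m):
            Y1 += X0[i, j] * Z[i:i + k, j:j + k]
    Y2 = np.sum(X0 * X0)              # constant term

    return (Y0 - 2.0 * Y1 + Y2) / (m * m)

n, m = 500, 10
Z = np.random.rand(n, n)
X0 = np.random.rand(m, m)
R = windowed_msd(Z, X0)
print(R.shape)                        # (491, 491)
print(np.isclose(R[2, 3], np.mean((Z[2:2 + m, 3:3 + m] - X0) ** 2)))
```

Note that the `Y1` loop runs over the `m * m` window entries rather than over all `(n - m + 1)**2` window positions, so it stays cheap compared to the naive double loop.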