The various LDx
arguments to BLAS
?gemm
functions are there to make it possible to operate on slices of larger arrays. For example, this small C program does a matrix multiplication of the top left and top right (100,100) submatrices of a (200,200) matrix, and stores the result in the bottom left (100,100) submatrix.
#include <stdlib.h>
#include <stdio.h>
#include <cblas.h>
int main()
{
double * a = malloc(200*200*sizeof(double));
for(int i = 0; i < 200*200; i++) a[i] = 1;
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 100, 100, 100, 1, a, 200, a+100*200, 200, 0, a+100, 200);
printf("%f\n", a[100]);
return 0;
}
$ gcc -o cblas_test{,.c} -lcblas
$ ./cblas_test
100.000000
The output is 100, as expected. Since BLAS
is originally a fortran library, I expect this to be possible there too. But I can't see how to specify the array offsets there. Using slices does not work, resulting in a segmentation fault for the following program:
program fblas_example
implicit none
real(8), allocatable :: a(:,:)
allocate(a(200,200))
a = 1
call dgemm('n','n',100,100,100,1d0,a,200,a(1:100,101:200),200,0d0,a(101:200,1:100),200)
write(*,*) a(101,1)
end program
$ gfortran -o fblas_example{,.f90} -lblas
$ ./fblas_example
*** Error in `./fblas_example': double free or corruption (out): 0x00000000011351c0 ***
What am I doing wrong here?
Edit: This works if I massage a
into a 1d array first:
program fblas_example2
use, intrinsic :: iso_c_binding
implicit none
real(8), target, allocatable :: a(:,:)
real(8), pointer :: fa(:)
allocate(a(200,200))
a = 1
call c_f_pointer(c_loc(a), fa, [size(a)])
call dgemm('n','n',100,100,100,1d0,fa,200,fa(100*200+1:),200,0d0,fa(101:),200)
write(*,*) fa(101)
end program
$ gfortran -o fblas_example2{,.f90} -lblas
$ ./fblas_example2
100.00000000000000
But it is pretty strange that one should have to go via ISO C bindings to be able to call a fortran library from fortran.
Just to be clear: I'm aware that one can just copy out the blocks, work on them, and copy them back. But what I'm asking about here is how to use BLAS own support for working on subsets of arrays.
When you give the
LDx
you declare the actual size (leading dimension) of the matrix. ThenBLAS
uses this value to skip the not used data from the multiplication.I think you should just use
dgemm
same way as inC
. I mean you should not pass the submatrixa(1:100,101:200)
but the position of the first valuea(1,101)
of the submatrix.I tested the following and seems to work fine.