Using BLAS ?gemm on a subset of an array in fortran

1.3k views Asked by At

The various LDx arguments to BLAS ?gemm functions are there to make it possible to operate on slices of larger arrays. For example, this small C program does a matrix multiplication of the top left and top right (100,100) submatrices of a (200,200) matrix, and stores the result in the bottom left (100,100) submatrix.

#include <stdlib.h>
#include <stdio.h>
#include <cblas.h>
int main()
{
    double * a = malloc(200*200*sizeof(double));
    for(int i = 0; i < 200*200; i++) a[i] = 1;
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 100, 100, 100, 1, a, 200, a+100*200, 200, 0, a+100, 200);
    printf("%f\n", a[100]);
    return 0;
}

$ gcc -o cblas_test{,.c} -lcblas
$ ./cblas_test
100.000000

The output is 100, as expected. Since BLAS is originally a fortran library, I expect this to be possible there too. But I can't see how to specify the array offsets there. Using slices does not work, resulting in a segmentation fault for the following program:

program fblas_example
    implicit none
    real(8), allocatable :: a(:,:)
    allocate(a(200,200))
    a = 1
    call dgemm('n','n',100,100,100,1d0,a,200,a(1:100,101:200),200,0d0,a(101:200,1:100),200)
    write(*,*) a(101,1)
end program

$ gfortran -o fblas_example{,.f90} -lblas
$ ./fblas_example
*** Error in `./fblas_example': double free or corruption (out): 0x00000000011351c0 ***

What am I doing wrong here?

Edit: This works if I massage a into a 1d array first:

program fblas_example2
    use, intrinsic :: iso_c_binding
    implicit none
    real(8), target, allocatable :: a(:,:)
    real(8), pointer :: fa(:)
    allocate(a(200,200))
    a = 1
    call c_f_pointer(c_loc(a), fa, [size(a)])
    call dgemm('n','n',100,100,100,1d0,fa,200,fa(100*200+1:),200,0d0,fa(101:),200)
    write(*,*) fa(101)
end program

$ gfortran -o fblas_example2{,.f90} -lblas
$ ./fblas_example2
   100.00000000000000

But it is pretty strange that one should have to go via ISO C bindings to be able to call a fortran library from fortran.

Just to be clear: I'm aware that one can just copy out the blocks, work on them, and copy them back. But what I'm asking about here is how to use BLAS own support for working on subsets of arrays.

2

There are 2 answers

2
ztik On BEST ANSWER

When you give the LDx you declare the actual size (leading dimension) of the matrix. Then BLAS uses this value to skip the not used data from the multiplication.

I think you should just use dgemm same way as in C. I mean you should not pass the submatrix a(1:100,101:200) but the position of the first value a(1,101) of the submatrix.

I tested the following and seems to work fine.

program fblas_example
  implicit none
  real(8), allocatable :: a(:,:)
  allocate(a(200,200))
  a = 1
  call dgemm('n','n',100,100,100,1d0,a,200,a(1,101),200,0d0,a(101,1),200)
  write(*,*) a(101,1)
end program

$ gfortran dgemm.f -lblas
$ ./a.out 
  100.00000000000000   
2
roygvib On

If array slices in Fortran >=90 are to be passed (rather than the address of some array element), I think the following line in the Fortran code

call dgemm('n','n', 100,100,100, 1d0, a,200, a(1:100,101:200),200, 0d0, a(101:200,1:100),200 )

should probably be modified as

call dgemm('n','n', 100,100,100, 1d0, a,200, a(1:100,101:200),100, 0d0, a(101:200,1:100),100 )

where the leading dimensions of the latter two array slices (or subarrays) have been changed from 200 to 100.

More specifically, a(1:100,101:200) and a(101:200,1:100) are passed to dgemm with implicit interface, where the array slices are discontiguous in memory. In this case, the compiler will first prepare two array temporaries of size 100x100 for each subarray, copy the matrix elements of the original array a to the array temporaries, and pass the address of the first element of the temporaries. Once the calculation by dgemm is finished, the contents of the temporaries are copied back to the original array a appropriately. If this is what happens, dgemm receives two arrays with leading dimension 100 (rather than 200).

One can check the generation of array temporaries by -fcheck-array-temporaries (in gfortran) and -check arg_temp_created (in ifort) options. For example, the latter gives

forrtl: warning (402): fort: (1): In call to DGEMM, an array temporary was created for argument #9
forrtl: warning (402): fort: (1): In call to DGEMM, an array temporary was created for argument #12
100.000000000000