I'm trying to convert an algorithm, which is written in Fortran and uses column major ordering to C using row major ordering. The algorithm uses gemv blas calls.
I altered the calls for row major layout as in the cblas interface:
- toggle transpose flag
- swap of M and N
- alter leading dimensions
But the algorithm doesn't behave equal. I'm getting different results. I created a minimal sample which shows the behaviour.
#include <stdio.h>
void dgemv_( const char * t, const int * m, const int * n, const double * alpha, const double * A, const int *lda, const double * X, const int * incx,
const double * beta, double * Y, const int *incy );
int main()
{
const int M = 2, N = 2;
const int one = 1;
const double alpha = -1.0, beta = 1.0;
const char trans = 'T';
const char noTrans = 'N';
double Yc[4] = { 0x1.42c7bd3b6266cp+4, 0x1.6c6ff393729dp+4, 0x1.acee1f3938c0bp-2, 0x1.b0cd5ba440d93p+0 };
double Yr[4] = { 0x1.42c7bd3b6266cp+4, 0x1.acee1f3938c0bp-2, 0x1.6c6ff393729dp+4, 0x1.b0cd5ba440d93p+0 };
double A[2] = { 0x1.11acee560242ap-2, 0x1p+0 };
double Bc[2] = { 0x1.8p+2, 0x1.cp+2 };
double Br[2] = { 0x1.8p+2, 0x1.cp+2 };
dgemv_( &noTrans, &M, &N, &alpha, Yc, &M, A, &one, &beta, Bc, &one );
printf("Result Column Major\n");
printf("%a %a\n", Bc[0], Bc[1]);
dgemv_( &trans, &N, &M, &alpha, Yr, &N, A, &one, &beta, Br, &one );
printf("Result Row Major\n");
printf("%a %a\n", Br[0], Br[1]);
}
I used the format string %a to get the hexadecimal representation of the values to compare them. The resulting vector using column major version looks like:
0x1.8402515a17beap-3 -0x1.8e67415bce3aep-1
While to one in row major looks like these:
0x1.8402515a17bep-3 -0x1.8e67415bce3bp-1
How is this explainable and what can be done, to make the algorithms works equal?
If the results are compared with decimal representation
they agree up to 15 significant figures
so the difference seems sufficiently small for double-precision calculation with
double
. As for-0x1.8e67415bce3aep-1
and-0x1.8e67415bce3bp-1
, the difference is also below 1.0e-15.To obtain even better agreement, quadruple (or higher) precision is probably necessary.