Fast access to matrix as jagged array in C#

812 views Asked by At

I've created a lower triangular distance matrix (because of size issues) as jagged array Note: Distances between objects are symmetric

var dm = new double[size][]
for (var i = 0; i < size; i++)
{
   dm[i] = new double[i+1];
   for (var j = 0; j < i+1; j++)
   {
      dm[i][j] = distance(data[i], data[j]);
   }
 }

I need to access this matrix very often so I made the following method for it

private double GetValueOfDM(int row, int column, double[][] dm)
{
    return column <= row ? distanceMatrix[row][column] : distanceMatrix[column][row];
}

With the Visual Studio performance analysis one sees the major speed issue lies in the only row of the GetValueOfDM method.

Has someone a good idea how to speed this up?

3

There are 3 answers

4
DLeh On BEST ANSWER

You could remove the conditional in the method and increase memory usage to increase access performance like so:

var dm = new double[size][];
for (var i = 0; i < size; i++)
{
   dm[i] = new double[size];
   for (var j = 0; j < i+1; j++)
   {
      dm[i][j] = distance(data[i], data[j]);
      dm[j][i] = dm[i][j];
   }
 }

private double GetValueOfDM(int row, int column, double[][] dm)
{
    return dm[row][column];
}

Now that you don't have a conditional, the compiler can remove a branch prediction. Also, you should run tests with your actual use cases to ensure that it is actually going to be a problem. Analysis would probably reveal that a branching conditional will be the slowest part of your code, but it doesn't necessarily mean that it's actually going to slow anything down noticeably. In addition, you could try running it in Release mode (with compiler optimizations) to see how it affects performance.

If you are on a system where you don't have the memory available to double the size of the array, then the code you have is probably close to optimal for accessing a jagged array.

7
Dai On

I'm guessing you're using this in a tight-loop? Arrays in .NET aren't /that/ fast because of automatic bounds-checking. If you need fast array perf use a pointer with a buffer:

sealed unsafe class DistanceData : IDisposable {
    private Double* buffer;
    private IntPtr  bufferLength; // .NET uses IntPtr as a size_t equivalent.
    private Int32   dim0Length;

    public DistanceData(Int32 size, Double[] data) {
        this.buffer       = (Double*)Marshal.AllocHGlobal( size * size );
        this.bufferLength = size * size;
        this.dim0Length   = size;

        for(int y = 0; y < size; y++) {
            for(int x = 0; x < y + 1; x++) {
                this.buffer[ y * this.dim0Length + x ] = Distance( data[y], data[x] );
            }
        }
    }

    public void Dispose() {
        Marshal.FreeHGlobal( this.buffer );
    }

    public Double GetValueOfDM(Int32 row, Int32 column) {
        // WARNING: Without validation or your own bounds-checking, invalid values of `row` and `column` will cause access-violation errors and crash your program. Ensure that code that calls `GetValueOfDM` is correct and will never submit invalid values.
        return this.buffer[ row * this.dim0Length  + column];
    }
}
1
Olivier Jacot-Descombes On

You could use a one-dimensional array and calculate the index like this

i = (r * r + r) / 2 + c;

But you still have to check for r <= c and do the flipping. (r=row, c=column)

But will this really be faster?