Faster Method for Multiple Bilinear Interpolation?

2.6k views Asked by At

I am writing a program in C++ to reconstruct a 3D object from a set of projected 2D images, the most computation-intensive part of which involves magnifying and shifting each image via bilinear interpolation. I currently have a pair of functions for this task; "blnSetup" defines a handful of parameters outside the loop, then "bilinear" applies the interpolation point-by-point within the loop:

(NOTE: 'I' is a 1D array containing ordered rows of image data)

//Pre-definition structure (in header)
struct blnData{
    float* X;
    float* Y;
    int* I;
    float X0;
    float Y0;
    float delX;
    float delY;
};

//Pre-definition function (outside the FOR loop)
extern inline blnData blnSetup(float* X, float* Y, int* I)
{
    blnData bln;
    //Create pointers to X, Y, and I vectors
    bln.X = X;
    bln.Y = Y;
    bln.I = I;

    //Store offset and step values for X and Y
    bln.X0 = X[0];
    bln.delX = X[1] - X[0];
    bln.Y0 = Y[0];
    bln.delY = Y[1] - Y[0];

    return bln;
}

//Main interpolation function (inside the FOR loop)
extern inline float bilinear(float x, float y, blnData bln)
{
    float Ixy;

    //Return -1 if the target point is outside the image matrix
    if (x < bln.X[0] || x > bln.X[-1] || y < bln.Y[0] || y > bln.Y[-1])
        Ixy = 0;
    //Otherwise, apply bilinear interpolation
    else
    {
        //Define known image width
        int W = 200;

        //Find nearest indices for interpolation
        int i = floor((x - bln.X0) / bln.delX);
        int j = floor((y - bln.Y0) / bln.delY);

        //Interpolate I at (xi, yj)
        Ixy = 1 / ((bln.X[i + 1] - bln.X[i])*(bln.Y[j + 1] - bln.Y[j])) *
            (
            bln.I[W*j + i] * (bln.X[i + 1] - x) * (bln.Y[j + 1] - y) +
            bln.I[W*j + i + 1] * (x - bln.X[i]) * (bln.Y[j + 1] - y) +
            bln.I[W*(j + 1) + i] * (bln.X[i + 1] - x) * (y - bln.Y[j]) +
            bln.I[W*(j + 1) + i + 1] * (x - bln.X[i]) * (y - bln.Y[j])
            );
    }

    return Ixy;
}

EDIT: The function calls are below. 'flat.imgdata' is a std::vector containing the input image data and 'proj.imgdata' is a std::vector containing the transformed image.

int Xs = flat.dim[0];
int Ys = flat.dim[1];

int* Iarr = flat.imgdata.data();
float II, x, y;

bln = blnSetup(X, Y, Iarr);

for (int j = 0; j < flat.imgdata.size(); j++)
{
    x = 1.2*X[j % Xs];
    y = 1.2*Y[j / Xs];
    II = bilinear(x, y, bln);
    proj.imgdata[j] = (int)II;
}

Since I started optimizing, I have been able to reduce computation time by ~50x (!) by switching from std::vectors to C arrays within the interpolation function, and another 2x or so by cleaning up redundant computations/typecasting/etc, but assuming O(n) with n being the total number of processed pixels, the full reconstruction (~7e10 pixels) should still take 40min or so--about an order of magnitude longer than my goal of <5min.

According to Visual Studio's performance profiler, the interpolation function call ("II = bilinear(x, y, bln);") is unsurprisingly still the majority of my computation load. I haven't been able to find any linear algebraic methods for fast multiple interpolation, so my question is: is this basically as fast as my code will get, short of applying more or faster CPUs to the task? Or is there a different approach that might speed things up?

P.S. I've also only been coding in C++ for about a month now, so feel free to point out any beginner mistakes I might be making.

2

There are 2 answers

1
MSalters On

"vector 50x slower than array". That's a dead giveaway you're in debug mode, where vector::operator[] is not inlined. You will probably get the necessary speedup, and a lot more, simply by switching to release mode.

As a bonus, vector has a .back() method, so you have a proper replacement for that [-1]. Pointers to the begin of an array don't contain the array size, so you can't find the back of an array that way.

0
Zalman Stern On

I wrote up a long answer suggesting looking at OpenCV (opencv.org), or using Halide (http://halide-lang.org/), and getting into how image warping is optimized, but I think a shorter answer might serve better. If you are really just scaling and translating entire images, OpenCV has code to do that and we have an example for resizing in Halide as well (https://github.com/halide/Halide/blob/master/apps/resize/resize.cpp).

If you really have an algorithm that needs to index an image using floating-point coordinates which result from a computation that cannot be turned into a moderately simple function on integer coordinates, then you really want to be using filtered texture sampling on a GPU. Most techniques for optimizing on the CPU rely on exploiting some regular pattern of access in the algorithm and removing float to integer conversion from the addressing. (For resizing, one uses two integer variables, one which indexes the pixel coordinate of the image and the other which is the fractional part of the coordinate and it indexes a kernel of weights.) If this is not possible, the speedups are somewhat limited on CPUs. OpenCV does provide fairly general remapping support, but it likely isn't all that fast.

Two optimizations that may be applicable here are trying to move the boundary condition out the loop and using a two pass approach in which the horizontal and vertical dimensions are processed separately. The latter may or may not win and will require tiling the data to fit in cache if the images are very large. Tiling in general is pretty important for large images, but it isn't clear it is the first order performance problem here and depending on the values in the inputs, the cache behavior may not be regular enough anyway.