Halide: Demosaic algorithm bug for larger images. Seems to work for 16x16 images.

468 views Asked by At

I'm trying to implement demosaic algorithm for Bayer filter as shown in section 2.8 (page 8) of this pdf: http://www.arl.army.mil/arlreports/2010/ARL-TR-5061.pdf. I'm stuck on trying to realize a function over an RDom. When I use a 16x16 image, the traces actually finish, but when I use a larger image like 768x1280, the trace gets stuck on:

Store green.0(767, 1279);

Following is the simplified version of my code:

#include "Halide.h"
#include<stdio.h>
#include<stdlib.h>
#include "halide_image_io.h"

using namespace Halide;


int main(int argc, char **argv) {

    Buffer<uint8_t> input = Tools::load_image(argv[1]);

    RDom r(2, input.width() - 4, 2, input.height() - 4);
    r.where((r.x % 2 == 0 && r.y % 2 == 0) || (r.x % 2 == 1 && r.y % 2 == 1));

    Var x("x"), y("y");

    Func g_n, w_n, g_n_est, green("green");

    g_n(x, y) = cast<float> (0);
    w_n(x, y) = cast<float> (0);
    g_n_est(x, y) = cast<float> (0);
    green(x, y) = cast<uint8_t> (0);


    printf("width: %d\n", input.width());
    printf("height: %d\n", input.height());
    printf("channels: %d\n", input.channels());

    g_n(r.x, r.y) = abs(cast<float>(input(r.x, r.y + 1) - input(r.x, r.y - 1))) + abs(cast<float>(input(r.x, r.y) - input(r.x, r.y - 2)));
    w_n(r.x, r.y) = cast<float>(1 / (1 + g_n(r.x, r.y)));
    g_n_est(r.x, r.y) = cast<float>(input(r.x, r.y - 1) + (input(r.x, r.y) - input(r.x, r.y - 2))) / 2;

    green(r.x, r.y) = cast<uint8_t>(w_n(r.x, r.y) * g_n_est(r.x, r.y));
    green.trace_stores();



    Buffer<uint8_t> temp = green.realize(input.width(), input.height());

    Tools::save_image(temp, "result.png");

}

Is this a bug in Halide? In this instance, the code finishes executing and saves the output image for a 16x16 input but gets stuck in the trace for larger images.

1

There are 1 answers

0
jrk On BEST ANSWER

This is just a really, really inefficient schedule. Every stage each compute O(n) pixels in their update definitions any time they are realized (that's now big the RDom r is), but the stages are also each inlined into the next. As a result, every point in green is recursively computing a whole image of g_n_est and w_n, and then for each of their pixels it's recursively computing a whole image of g_n.

What you see as a stall at green.0(767, 1023) is actually right after it has finished computing the pure definition of green(x,y) = 0 for the last pixel, at which point it starts taking forever to actually compute all of the update stages because of the O(n^3) work it's doing.

This is a case where getting aggressive turning on more tracing would make the problem clearer. You can turn tracing of realizations or individual stores on globally when you configure the compile: https://github.com/halide/Halide/wiki/Debugging-Tips#tracing.

For this code, scheduling the earlier stages as compute_root may be what you want, though you may actually want the g_n_est and w_n definitions to be simple pure functions (no RDoms at all) that can be fused into green, scheduled in blocks, etc.