My dithering algorithm is super slow


So here's some context. I'm working on a game called ShiftOS, which takes place in an OS that starts off as a bare-bones, run-of-the-mill '80s operating system without many features.

I'm trying to add a mechanic where the user has to start out with 1-bit (2-color) color depth and can only display black and white on the screen. They then have to upgrade the color depth from 1-bit to 2-bit to 4-bit, all the way up to 24-bit. It's a really neat mechanic, but in practice it's proving extremely difficult.

Of course, older systems from that era did at least TRY to make images look nice, but they were limited by the color palettes the engineers gave them, so they had to dither images, arranging pixels in a way that made a picture look like it was using more colors when in reality it could only be using two.

So I looked up some good dithering algorithms, started learning the Floyd-Steinberg algorithm, and soon ported it to C# with System.Drawing.

Here's the code that I use.

var bmp = new Bitmap(source.Width, source.Height);
var sourceBmp = (Bitmap)source;
int error = 0;
for (int y = 0; y < bmp.Height; y++)
{
    for (int x = 0; x < bmp.Width; x++)
    {
        Color c = sourceBmp.GetPixel(x, y);
        int gray = ((c.R + c.G + c.B) / 3);
        if (gray >= 127)
        {
            error = gray - 255;
            bmp.SetPixel(x, y, Color.White);
        }
        else
        {
            error = gray;
            bmp.SetPixel(x, y, Color.Black);
        }
        /*
         * Pixel error diffusion map: Floyd-Steinberg. Thanks to Wikipedia.
         * 
         *  pixel[x + 1][y    ] := pixel[x + 1][y    ] + quant_error * 7 / 16
         *  pixel[x - 1][y + 1] := pixel[x - 1][y + 1] + quant_error * 3 / 16
         *  pixel[x    ][y + 1] := pixel[x    ][y + 1] + quant_error * 5 / 16
         *  pixel[x + 1][y + 1] := pixel[x + 1][y + 1] + quant_error * 1 / 16
         */

        if (x - 1 >= 0 && y + 1 != bmp.Height)
        {
            // Bottom-left neighbour (x - 1, y + 1): receives 3/16 of the error.
            var bottomLeftColor = sourceBmp.GetPixel(x - 1, y + 1);
            int bottomLeftGray = ((bottomLeftColor.R + bottomLeftColor.G + bottomLeftColor.B) / 3) + ((error * 3) / 16);
            if (bottomLeftGray < 0)
                bottomLeftGray = 0;
            if (bottomLeftGray > 255)
                bottomLeftGray = 255;
            sourceBmp.SetPixel(x - 1, y + 1, Color.FromArgb(bottomLeftGray, bottomLeftGray, bottomLeftGray));
        }
        if (x + 1 != sourceBmp.Width)
        {
            var rightColor = sourceBmp.GetPixel(x + 1, y);
            int rightGray = ((rightColor.R + rightColor.G + rightColor.B) / 3) + ((error * 7) / 16);
            if (rightGray < 0)
                rightGray = 0;
            if (rightGray > 255)
                rightGray = 255;
            sourceBmp.SetPixel(x + 1, y, Color.FromArgb(rightGray, rightGray, rightGray));
        }
        if (x + 1 != sourceBmp.Width && y + 1 != sourceBmp.Height)
        {
            var bottomRightColor = sourceBmp.GetPixel(x + 1, y + 1);
            int bottomRightGray = ((bottomRightColor.R + bottomRightColor.G + bottomRightColor.B) / 3) + ((error) / 16);
            if (bottomRightGray < 0)
                bottomRightGray = 0;
            if (bottomRightGray > 255)
                bottomRightGray = 255;
            sourceBmp.SetPixel(x + 1, y + 1, Color.FromArgb(bottomRightGray, bottomRightGray, bottomRightGray));
        }
        if (y + 1 != sourceBmp.Height)
        {
            var bottomColor = sourceBmp.GetPixel(x, y + 1);
            int bottomGray = ((bottomColor.R + bottomColor.G + bottomColor.B) / 3) + ((error * 5) / 16);
            if (bottomGray < 0)
                bottomGray = 0;
            if (bottomGray > 255)
                bottomGray = 255;
            sourceBmp.SetPixel(x, y + 1, Color.FromArgb(bottomGray, bottomGray, bottomGray));
        }
    }
}

Note that source is an Image that is passed to the function as an argument.

This code works pretty well. The dithering happens on a separate thread to minimize slowdowns/lag in the game, and while the dithering is running, the regular 24-bit colors/images of the operating system are shown. That would be fine if the dithering didn't take so long.

However, the algorithm is EXTREMELY slow, and depending on the size of the image I'm dithering, the process can take more than a minute!

I have applied all the optimizations I can think of, such as running the work on a separate thread from the game thread and invoking an Action that's given to the function when the thread finishes, but this shaves off only a tiny bit of time, if any.
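The threading wrapper looks roughly like this (simplified; the method names here are illustrative, not my exact code):

// Rough sketch of how the dither is kicked off away from the game thread.
// Requires System.Threading.Tasks; DitherImage is the loop shown above.
public void DitherAsync(Image source, Action<Bitmap> onFinished)
{
    Task.Run(() =>
    {
        Bitmap result = DitherImage(source); // the slow part
        onFinished(result);                  // hand the dithered bitmap back to the game
    });
}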

So I'm wondering if there are any further optimizations that would make this run faster, ideally just a few seconds total. I'd also like to note that while the dithering operation is running I get noticeable system lag; the mouse even jitters and jumps around at times. Not cool for the must-have-60-FPS PC master race crowd.


There are 3 answers

VMAtm

The first thing that comes to mind is to treat the Bitmap as if it were an array. By default that's not an option, as there is no interface for it, but with some hacks you can achieve it. A quick search led me to this answer. So, you must mark your method as unsafe, get at the pixel values with LockBits, and access them with pointer math (refer to the original answer for the full code):

var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
    bmp.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite,
    bmp.PixelFormat);
var pt = (byte*)bmpData.Scan0;
// inside the x/y loops:
var row = pt + (y * bmpData.Stride);
var pixel = row + x * bpp; // bpp = bytes per pixel for the bitmap's PixelFormat

pixel will then point at the color components of that pixel, coded as byte values. As you already saw, GetPixel and SetPixel are slow because each call effectively locks the bitmap to guarantee the operation. Working on a raw array removes the per-pixel read cost; SetPixel can still be a bottleneck if you keep calling it to update the bitmap, so if you can write everything back at the end, all at once, do that.
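Put together, a minimal sketch of that idea might look like the following. It is not your exact pipeline: it assumes a Format24bppRgb bitmap and dithers it in place, and it skips producing a separate output bitmap the way your code does now.

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Minimal sketch: copy the pixel data into a byte[] with LockBits,
// run the same Floyd-Steinberg loop on that array, then copy it back.
// Assumes Format24bppRgb (3 bytes per pixel, B-G-R order).
static void DitherInPlace(Bitmap bmp)
{
    var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    BitmapData data = bmp.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

    int bytes = data.Stride * data.Height;
    var buffer = new byte[bytes];
    Marshal.Copy(data.Scan0, buffer, 0, bytes);    // one bulk read instead of W*H GetPixel calls

    for (int y = 0; y < data.Height; y++)
    {
        int row = y * data.Stride;
        for (int x = 0; x < data.Width; x++)
        {
            int i = row + x * 3;
            int gray = (buffer[i] + buffer[i + 1] + buffer[i + 2]) / 3;
            byte newVal = (byte)(gray >= 127 ? 255 : 0);
            int error = gray - newVal;

            buffer[i] = buffer[i + 1] = buffer[i + 2] = newVal;

            // Classic Floyd-Steinberg error distribution.
            Diffuse(buffer, data, x + 1, y,     error * 7 / 16);
            Diffuse(buffer, data, x - 1, y + 1, error * 3 / 16);
            Diffuse(buffer, data, x,     y + 1, error * 5 / 16);
            Diffuse(buffer, data, x + 1, y + 1, error * 1 / 16);
        }
    }

    Marshal.Copy(buffer, 0, data.Scan0, bytes);    // one bulk write back
    bmp.UnlockBits(data);
}

static void Diffuse(byte[] buffer, BitmapData data, int x, int y, int add)
{
    if (x < 0 || x >= data.Width || y < 0 || y >= data.Height)
        return;
    int i = y * data.Stride + x * 3;
    int gray = (buffer[i] + buffer[i + 1] + buffer[i + 2]) / 3 + add;
    byte v = (byte)Math.Max(0, Math.Min(255, gray));
    buffer[i] = buffer[i + 1] = buffer[i + 2] = v;
}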

My second thought is to create some Task queue that updates your array step by step. As far as I can see, you always update the image from the same corner, so maybe you can set up a parallel version of the update. Maybe you can keep an immutable array of the current state with versioning, so that in the end you just merge the versions into the final bmp.

Svetlyo
  1. Obtain a buffer of the pixels instead of using the Get/SetPixel functions.
  2. Avoid computations inside the loops; pre-calculate whatever is possible.
  3. When you are done, put the pixels back into the image.
  4. You could use ordered dithering: it uses fewer computations and is far faster than Floyd-Steinberg. It produces good quality and was widely used back when monitors had fewer colors (see the C# sketch after the code below).

This code is in C, but you can rewrite it to C# easily.

#include <stdlib.h>
#include <string.h>

typedef unsigned char BYTE;

#define f7_16   112
#define f5_16    80
#define f3_16    48
#define f1_16    16

//  Black-white Floyd-Steinberg dither
void    makeDitherFS( BYTE* pixels, int width, int height )
{
    const int   size    = width * height;

    int*    error   = (int*)malloc( size * sizeof(int) );

    //  Clear the errors buffer.
    memset( error, 0, size * sizeof(int) );

    //~~~~~~~~

    int i   = 0;

    for( int y = 0; y < height; y++ )
    {
        BYTE*   prow   = pixels + ( y * width * 3 );

        for( int x = 0; x < width; x++,i++ )
        {
            const int   blue    = prow[x * 3 + 0];
            const int   green   = prow[x * 3 + 1];
            const int   red     = prow[x * 3 + 2];

            //  Get the pixel gray value.
            int newVal  = (red+green+blue)/3 + (error[i] >> 8); //  PixelGray + error correction

            int newc    = (newVal < 128 ? 0 : 255);
            prow[x * 3 + 0] = newc; //  blue
            prow[x * 3 + 1] = newc; //  green
            prow[x * 3 + 2] = newc; //  red

            //  Correction - the new error
            const int   cerror  = newVal - newc;

            int idx = i+1;
            if( x+1 < width )
                error[idx] += (cerror * f7_16);

            idx += width - 2;
            if( x > 0 && y+1 < height )
                error[idx] += (cerror * f3_16);

            idx++;
            if( y+1 < height )
                error[idx] += (cerror * f5_16);

            idx++;
            if( x+1 < width && y+1 < height )
                error[idx] += (cerror * f1_16);
        }
    }

    free( error );
}
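Point 4 above suggests ordered dithering; here is a minimal C# sketch of it using the classic 4x4 Bayer matrix. It assumes you already have the image as a plain grayscale byte array (one byte per pixel), so you would combine it with the buffer approach from points 1-3:

// Minimal ordered (Bayer 4x4) dither over a grayscale buffer, one byte per pixel.
// No error buffer and no dependency between pixels, which is why it is fast.
static readonly int[,] Bayer4 =
{
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

static void OrderedDither(byte[] gray, int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // The threshold varies per pixel according to the matrix: 16 levels spread over 0..255.
            int threshold = (Bayer4[y & 3, x & 3] * 255) / 16 + 8;
            int i = y * width + x;
            gray[i] = (byte)(gray[i] > threshold ? 255 : 0);
        }
    }
}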

For more dithering algorithms, see here

Surt

The answer by @VMAtm is likely the most important.

            if (bottomRightGray < 0)
                bottomRightGray = 0;
            if (bottomRightGray > 255)
                bottomRightGray = 255;

Might be refactored to

bottomRightGray = Clamp(bottomRightGray, 0, 255);

potentially improving performance if it's implemented with some ASM magic.

((error * X) / 16)

can be pre-calculated once, for each of the four values of X. Since error can only be in the range -255..255, that's a table of 511 * 4 values. This might or might not improve speed.
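A minimal sketch of what that precomputation could look like (the field and method names are just for illustration; the table is indexed by error + 255 so negative errors work):

// Precompute error * weight / 16 for every possible error value (-255..255)
// and each of the four Floyd-Steinberg weights. Indexed as [weight, error + 255].
static readonly int[] Weights = { 7, 3, 5, 1 };
static readonly int[,] ErrorTable = BuildErrorTable();

static int[,] BuildErrorTable()
{
    var table = new int[4, 511];
    for (int e = -255; e <= 255; e++)
        for (int w = 0; w < 4; w++)
            table[w, e + 255] = (e * Weights[w]) / 16;
    return table;
}

// Usage inside the loop, e.g. for the right neighbour (the 7/16 weight):
// rightGray += ErrorTable[0, error + 255];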