Hexadecimal radix to sort a list of 4-byte integers

52 views Asked by At
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 100

void radix_sort(int arr[], int n) {
    int buckets[16][MAX_SIZE];
    int bucket_count[16];
    int i, j, k, r, shift, pass, NOP = 8;
    int buffer[MAX_SIZE];

    for (pass = 0; pass < NOP; pass++) {
        for (i = 0; i < 16; i++)
            bucket_count[i] = 0;

        shift = pass * 4;
        for (i = 0; i < n; i++) {
            r = ((unsigned int)arr[i] >> shift) & 15;
            buckets[r][bucket_count[r]] = arr[i];
            bucket_count[r]++;
        }

        i = 0;
        for (k = 0; k < 16; k++) {
            for (j = 0; j < bucket_count[k]; j++) {
                buffer[i] = buckets[k][j];
                i++;
            }
        }

        for (i = 0; i < n; i++)
            arr[i] = buffer[i];
    }
}

int main() {
    int arr[MAX_SIZE];
    int n, i;

    scanf("%d", &n);

    for (i = 0; i < n; i++)
        scanf("%d", &arr[i]);

    radix_sort(arr, n);

    for (i = 0; i < n; i++)
        printf("%d\n", arr[i]);

    return 0;
}

The objective is a hexadecimal radix sort. This means the program should process 4 bits at a time and use 16 buckets for sorting. Ok, got that. My buffer array and number array will never exceed 100. I get it to sort but not in the correct order. This is the output I always get:

 - *14394
 - *-5905
 - *-12765
 -  *1193
 -  *-7496
 -  *================ execution result ================
 -  *1193
 -  *14394
 -  *-12765
 -  *-7496
 -  *-5905
 -  *====== differences from correct results ======
 -  *1,2d0
 -  *< 1193
 -  *< 14394
 -  *5a4,5
 -  *> 1193
 -  *> 14394

Where the negative numbers get put at the bottom of the list. Larger negative numbers should be at the top and the positive descending numbers below i.e.:

  • -100
  • -10
  • 0
  • 10
  • 100

I'm not exactly sure how to achieve this result. I saw a solution on SO for handling negatives in radix that said treat the sign as a special kind of digit but the poster did not specify. I feel I'm on the right track I just need it to sort properly but my attempts to change the logic have been causing huge bugs and all zero results.

2

There are 2 answers

0
John Bollinger On BEST ANSWER

The basic issue is that you are not sorting the input numbers by their own values. You are sorting them by the values resulting from conversion to unsigned int. That conversion even appears explicitly in your code. The result is that the signed values end up ordered from zero to most positive, followed by most negative to least negative.

If you want to sort numbers of signed type by their own numeric values then radix sort is not a clean fit. At minimum, you need to make some kind of special accommodation for the signs.

I saw a solution on SO for handling negatives in radix that said treat the sign as a special kind of digit but the poster did not specify.

One way to think about the sign bit is that it has a negative place value of larger magnitude than the (positive) place values of all of the other bits. For example, in 8-bit two's complement, the bits' place values can be construed as

\ bit number 7 6 5 4 3 2 1 0
place value -128 64 32 16 8 4 2 1

That makes any set of contiguous bits containing the sign bit qualitatively different from all those that do not contain the sign bit.

Consider these bit patterns:

number \ 7 6 5 4 3 2 1 0
-128 1 0 0 0 0 0 0 0
-125 1 0 0 0 0 0 1 1
-64 1 1 0 0 0 0 0 0
-1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1
7 0 0 0 0 0 1 1 1

Observe that a base-2 (bit-by-bit) radix sort would be straightforward -- normal except that the sense of the comparisons for the sign bits needs to be reversed on account of the negative place value of that bit.

You can use that in your sort by implementing the following changes:

  • in the main sort loop, sort by the 31 ordinary value bits only. That's an easy enough change:

            unsigned int magnitude_mask = ~0u >> 1;
            // ...
    
                r = ((magnitude_mask & (unsigned int) arr[i]) >> shift) & 0xf;
    
  • after the main sort loop, perform one extra pass in which you sort in reverse by the sign bit. There are multiple ways to think about and implement that, but one would be to use the same code as the main loop, except for calculating r like so:

                r = (~(unsigned int) arr[i]) >> 31;
    

    The bitwise negation effects the "in reverse" part in that case.

0
Fe2O3 On

"Zero" is a label that can be arbitrary. Why is 0°F positioned where it is? For convenience, both 0°C and 0 Kelvin are defined by physical phenomena. Temperatures expressed in one of the last two scales can be easily converted to the other merely by adding or subtracting a constant bias. Therein lies a solution.

Regarding your code:

  1. First get the implementation to work using a "compile time" array of values. Augment working code with the gathering (and validating) "user input" after the heart of the program is working. (eg: The code has a limit of 100 values, yet the user could easily enter many more than that, overwriting memory.)
  2. Limit the scope of variables rather than declaring all in a cluster at the start of a function.
  3. While 15 is the decimal representation of 0xF, convention suggests the hexadecimal notation makes the "4 bits" nature more obvious to the reader.
  4. You'll notice that buffer[] is, in fact, unnecessary.

The following is provided as a prototype for your careful consideration.

#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 100
#define BIAS 1000 // a silly value. Consider a sufficient 'bias' for your data

void radix_sort( unsigned arr[], const size_t n ) { // NB: "arr[]" is unsigned
    size_t i; // re-usable

    // "upshift" all values equally
    for( i = 0; i < n; i++ )
        arr[i] += BIAS;

    // notice the "scope" of additional variables
    for( unsigned pass = 0, NOP = 8; pass < NOP; pass++) {
        unsigned shift = pass * 4;
        unsigned buckets[16][ MAX_SIZE ];
        unsigned bucket_count[16] = { 0 }; // NB: codeless initialisation

        for( i = 0; i < n; i++ ) {
            size_t r = (arr[i] >> shift) & 0xF;
            buckets[r][bucket_count[r]] = arr[i];
            bucket_count[r]++;
        }

        i = 0;
        for( size_t k = 0; k < 16; k++ )
            for( size_t j = 0; j < bucket_count[k]; j++ )
                arr[i++] = buckets[k][j]; // copy back without intermediary "buffer[]"
    }

    // restore actual values by subtracting the bias
    for( i = 0; i < n; i++ )
        arr[i] -= BIAS;
}

int main( void ) {
    int i, arr[] = { 3, 5, 1, -5, -1, -3, 42, -42 };
    const size_t sz = sizeof arr/sizeof arr[0];

    for( i = 0; i < sz; i++ )
        printf( "%d ", arr[i] );
    puts( "" );

    radix_sort( (unsigned *)arr, sz );

    for( i = 0; i < sz; i++ )
        printf( "%d ", arr[i] );
    puts( "" );

    return 0;
}

Result:

3 5 1 -5 -1 -3 42 -42
-42 -5 -3 -1 1 3 5 42

Once you have the concept, you can chose a bias value that will give you correct results.

Notice that the 16 buckets on the stack, each providing storage for 100 int values is not insignificant. You might consider how to deal with the problem if there were, instead, several million values to be sorted.


Edit: Future development:
It is readily apparent that the inner loops perform 2 traversals of the array. The outer loop will make 8 iterations. (And there are two bias loops as well.) That's not insignificant.

Consider the sample data here, and the arbitrary (silly) value for the bias. If one more traversal were done to determine that the minimum required bias is only 42 (so the smallest data value - -42 - will be upshifted to 0 for processing), then the range of these biased data values would be 0 to 42+42. The larges of those "fits" into 0x00 to 0xFF (ie: two "nibbles").

The consequence is that the particular values of this array are sorted after only two iterations of the outer loop! The remaining six iterations of the outer loop do not achieve anything.

As an exercise, extend this code to prevent needless iterations of the outer loop (with its two inner traversals of the data.)