What would be a fast way to size-promote an array of unsigned integral elements?


I need to implement the function:

void promote(
    unsigned char*       __restrict__  dest,
    const unsigned char* __restrict__  src,
    size_t                             src_element_size,
    size_t                             dest_element_size,
    size_t                             n
);

which takes an array of n (unsigned) integers, each represented by src_element_size consecutive bytes, and writes as output an array of n integers, each represented by a different number of bytes, dest_element_size. Let's assume dest_element_size > src_element_size, and that both arrays are aligned appropriately for their own type.

What would be a fast/the fastest way to perform this conversion?

Notes:

  • This is not a memcpy(), nor memset() + memcpy(). For example, if the source element size is 1, the target element size is 2, and the source bytes are 123, 123, 123, then the destination bytes are 123, 0, 123, 0, 123, 0. You could think of it as a "strided memcpy()", I suppose.
  • The endianness is either the machine endianness or little-endianness, whichever you wish to assume.
  • src_element_size and dest_element_size are typically small powers of two (1, 2, 4, 8), but I'd rather get answers that are also relevant to somewhat larger sizes (say < 200) and to non-power-of-two sizes (3, 6, etc.).
  • The function itself may not be templated, but a suggestion for templating the implementation would be valid. The reason is that this will sit in a compiled library, and code using it will not be recompiled with the template definitions or with a header using _Generic.
  • No multi-threading, GPUs, Xeon Phis, or other exotic hardware.
  • src_element_size is less-than-or-equal-to dest_element_size.
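One possible "templating" approach that still keeps a non-templated entry point, sketched under the assumption of a little-endian destination layout: a macro stamps out loops for a few fixed (source, destination) size pairs, so the compiler sees constant-size memcpy/memset calls it can usually turn into plain loads and stores, and promote() picks one at run time, falling back to a generic loop for any other pair (e.g. 3 to 6). DEFINE_PROMOTE, promote_S_D and promote_generic are hypothetical names, and which pairs are worth specializing is an assumption that would need profiling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "template": one loop per fixed (S, D) size pair, S < D.
   Little-endian layout: the S source bytes go first, then D - S zeros. */
#define DEFINE_PROMOTE(S, D)                                          \
    static void promote_##S##_##D(unsigned char *restrict dest,       \
                                  const unsigned char *restrict src,  \
                                  size_t n)                           \
    {                                                                 \
        for (size_t i = 0; i < n; i++) {                              \
            memcpy(dest + i * (D), src + i * (S), (S));               \
            memset(dest + i * (D) + (S), 0, (D) - (S));               \
        }                                                             \
    }

DEFINE_PROMOTE(1, 2)
DEFINE_PROMOTE(1, 4)
DEFINE_PROMOTE(2, 4)
DEFINE_PROMOTE(4, 8)

/* Run-time-size fallback for every other pair, e.g. 3 -> 6. */
static void promote_generic(unsigned char *restrict dest,
                            const unsigned char *restrict src,
                            size_t s, size_t d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        memcpy(dest + i * d, src + i * s, s);
        memset(dest + i * d + s, 0, d - s);
    }
}

/* Non-templated library entry point with the parameter order from the question. */
void promote(unsigned char *restrict dest, const unsigned char *restrict src,
             size_t src_element_size, size_t dest_element_size, size_t n)
{
    size_t s = src_element_size, d = dest_element_size;
    if (s == 1 && d == 2)      promote_1_2(dest, src, n);
    else if (s == 1 && d == 4) promote_1_4(dest, src, n);
    else if (s == 2 && d == 4) promote_2_4(dest, src, n);
    else if (s == 4 && d == 8) promote_4_8(dest, src, n);
    else                       promote_generic(dest, src, s, d, n);
}
```

With this sketch, promote(dst, src, 1, 2, 3) on the source bytes 123, 123, 123 writes 123, 0, 123, 0, 123, 0, matching the example in the first note. The dispatch costs one branch chain per call, not per element.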

Answer by Lundin

If you can't use C _Generic or C++ templates, then you have to zero the destination with memset and copy each element with memcpy:

memset(dest, 0, n * dest_size);  /* zero all n destination elements, not just n bytes */
for(size_t i=0; i<n; i++)
{
  memcpy(dest, src, src_size);
  dest += dest_size;
  src  += src_size;
}

The result will be "left-aligned" (the source bytes occupy the lowest addresses of each destination element), so this only works for little endian.

An endianness-independent version will be a bit more intricate. You would have to replace memcpy with a loop that bit-shifts the individual bytes into place.
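A different technique from the byte-shifting loop, sketched only for the common sizes 1, 2, 4 and 8: load each element into a fixed-width integer of its own size with memcpy, then store it back through the wider type. Because the zero-extension happens on the value rather than on the byte layout, the result is correct in native byte order on either endianness. load_uint, store_uint and promote_native_pow2 are hypothetical names, and the sketch assumes no other sizes are ever passed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one native-order unsigned integer of `size` bytes (1, 2, 4 or 8).
   memcpy sidesteps alignment and strict-aliasing problems. */
static uint64_t load_uint(const unsigned char *p, size_t size)
{
    switch (size) {
    case 1:  { uint8_t  v; memcpy(&v, p, 1); return v; }
    case 2:  { uint16_t v; memcpy(&v, p, 2); return v; }
    case 4:  { uint32_t v; memcpy(&v, p, 4); return v; }
    default: { uint64_t v; memcpy(&v, p, 8); return v; } /* assumed size == 8 */
    }
}

/* Store `value` as a native-order unsigned integer of `size` bytes. */
static void store_uint(unsigned char *p, size_t size, uint64_t value)
{
    switch (size) {
    case 1:  { uint8_t  v = (uint8_t)value;  memcpy(p, &v, 1); break; }
    case 2:  { uint16_t v = (uint16_t)value; memcpy(p, &v, 2); break; }
    case 4:  { uint32_t v = (uint32_t)value; memcpy(p, &v, 4); break; }
    default: { uint64_t v = value;           memcpy(p, &v, 8); break; }
    }
}

/* Zero-extends each element by value; no endianness test needed. */
void promote_native_pow2(unsigned char *restrict dest,
                         const unsigned char *restrict src,
                         size_t src_size, size_t dest_size, size_t n)
{
    for (size_t i = 0; i < n; i++)
        store_uint(dest + i * dest_size, dest_size,
                   load_uint(src + i * src_size, src_size));
}
```

The per-element switch is only cheap if the compiler hoists it out of the loop (loop unswitching), which is likely but not guaranteed; combining this with a size-pair dispatch avoids relying on that.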

Or possibly something like this:

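#include <stdbool.h>  /* bool */
#include <stdint.h>   /* uint8_t, uint16_t */

const uint16_t endian=1;
const bool is_little_endian = *(const uint8_t*)&endian == 1;
/* On big endian the low-order bytes sit at the end of each element. */
size_t offset = is_little_endian ? 0 : dest_size-src_size;

memset(dest, 0, n * dest_size);  /* zero all n destination elements */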
for(size_t i=0; i<n; i++)
{
  memcpy(dest + offset, src, src_size);
  dest += dest_size;
  src  += src_size;
}

As someone pointed out in a comment, you should declare both character pointers as restrict to tell the compiler it can assume they don't point at overlapping memory, for a bit of micro-optimization.
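A minimal sketch of the offset version as a complete function, with both pointers marked restrict. The name promote_native and its parameter names are hypothetical; it assumes the source elements are already in native byte order and that the machine is plain little or big endian (no mixed-endian layouts).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: writes each native-order source element into the
   low-order end of its native-order destination element. */
void promote_native(unsigned char *restrict dest,
                    const unsigned char *restrict src,
                    size_t src_size, size_t dest_size, size_t n)
{
    const uint16_t endian = 1;
    const bool is_little_endian = *(const uint8_t *)&endian == 1;
    /* Low-order bytes live at the start (little endian) or end (big endian). */
    const size_t offset = is_little_endian ? 0 : dest_size - src_size;

    memset(dest, 0, n * dest_size);   /* zero every destination byte once */
    for (size_t i = 0; i < n; i++) {
        memcpy(dest + offset, src, src_size);
        dest += dest_size;
        src += src_size;
    }
}
```

Reading the output back as native integers (e.g. promoting a uint16_t array into a uint32_t array and comparing values) checks the result the same way on either endianness.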


A solution with _Generic might look something like this (not tested):

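/* No trailing commas are allowed in a _Generic association list. An inner
   _Generic with no matching type is a compile error even in an unselected
   branch, so the single-choice unsigned char* case is not nested. */
#define promote(dst, src, n)                                              \
  _Generic((dst),                                                         \
           unsigned char*: promote_uchar_uchar,                           \
           unsigned int*:                                                 \
             _Generic((src),                                              \
                      unsigned char*: promote_uint_uchar,                 \
                      unsigned int*:  promote_uint_uint                   \
             )                                                            \
          )(dst, src, n)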

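/* static inline avoids link errors from C's rules for plain inline functions */
static inline void promote_uchar_uchar (unsigned char* dest, const unsigned char* src, size_t n)
{
  for(size_t i=0; i<n; i++)
  {
    dest[i] = src[i];
  }
}

static inline void promote_uint_uchar (unsigned int* dest, const unsigned char* src, size_t n)
{
  for(size_t i=0; i<n; i++)
  {
    dest[i] = src[i];   /* implicit zero-extension to unsigned int */
  }
}

static inline void promote_uint_uint (unsigned int* dest, const unsigned int* src, size_t n)
{
  for(size_t i=0; i<n; i++)
  {
    dest[i] = src[i];
  }
}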

Better yet, you can adjust this for even better type safety:

static inline void promote_uint_uchar (size_t n, unsigned int dest[static n], const unsigned char src[static n])