Different methods to unpack CUDA half2 datatypes

Question


Asked by Martin Ueding on 03 January 2025 at 11:31 · 110 views

I have some CUDA code that uses the half2 datatype. It should simply be two 16-bit floating-point numbers packed together into 32 bits.

Apparently there are the functions __low2half and __high2half, which extract a single half number from a half2 vector. But the vector also has .x and .y members. Most of the time these agree, so the following assertions pass in most runs:

assert(q.x == __low2half(q));
assert(q.y == __high2half(q));

However, in some kernel runs this doesn't hold. These are the cases where they differ:

x=-57344,       y=-53376,   low=0.234497,   high=-0.17041
x=-inf,         y=nan,      low=0.00634766, high=0.473877
x=nan,          y=nan,      low=-0.0716553, high=0.540039
x=0,            y=0,        low=0,          high=0
x=3.8147e-05,   y=nan,      low=nan,        high=nan
x=-61440,       y=nan,      low=-0.999512,  high=0.31958

Which way of accessing them is correct, then?


**Robert Crovella** · Accepted Answer · 2023-12-12T17:32:34+00:00

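There should be no behavioral difference between the two if they are applied properly. .x holds the same value as __low2half(), and .y holds the same value as __high2half().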

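Per the IEEE 754 definition, a NaN compares unequal to everything, including itself. An assert(a == b) therefore fails whenever a is NaN, even if both sides hold identical bits.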

Apart from that, a simple exhaustive test over all 2^32 bit patterns passes. It prints no "fail" lines:

# cat t106.cu
#include <cuda_fp16.h>
#include <cstdio>
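__global__ void k(){
  // Grid-stride loop over all 2^32 bit patterns; each one is reinterpreted as a half2.
  for (size_t qs = blockIdx.x*blockDim.x+threadIdx.x; qs < 0x100000000ULL; qs += gridDim.x*blockDim.x){
    unsigned qu = (unsigned)qs;
    half2 q = *(reinterpret_cast<half2 *>(&qu));
    // NaN never compares equal, so NaN values are skipped before comparing.
    if (!__isnan(q.x)) if (q.x != __low2half(q))  printf("fail x: %u\n", qu);
    if (!__isnan(q.y)) if (q.y != __high2half(q)) printf("fail y: %u\n", qu);
  }
}

int main(){
  k<<<1024,256>>>();
  cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
}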

# nvcc -o t106 t106.cu
# ./t106
#

If you want to see, e.g., the NaN values printed out, omit the first if-test (the __isnan check) in each case.

I can't respond to a claim about a specific test case unless a complete example of that test case is given. However, I claim that if your test case produces a result suggesting inequality (after excluding NaNs as above), the test case itself is not valid.
