I am looking for a way to prevent overflow when using NEON intrinsics in C on ARM. Here is the logic, performed element by element:
    min = array[0];
    for (i = 1; i < 64; i++)
    {
        if (min > array[i])
        {
            min = array[i];
        }
    }
    for (i = 0; i < 64; i++)
    {
        array[i] -= min;
    }
I want an alternative that eliminates the element-by-element operations by performing them in a SIMD fashion. Thanks.
NOTE: In my case, I use four vectors of the uint8x16_t datatype (i.e., my 64-element array is segmented into four uint8x16_t vectors). I want to find a single minimum across all of them and then perform the normalization.
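Use vmin_u8 multiple times to accumulate the lane-wise minimum values in one vector (say, an 8x8)
Use vpmin_u8 repeatedly on that same vector to reduce it to a single minimum (for 8 lanes, 3 passes are enough, since each pass halves the number of candidates)
Use vdup_lane_u8(result, 0) (or vdup_n_u8(vget_lane_u8(result, 0))) to construct a vector of the target length
Use vsub_u8 to subtract, which normalizes the array.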
– ffox
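For the four uint8x16_t case from the question, the steps in the comment could be sketched roughly as below. This is only a sketch under some assumptions: the function name normalize64_u8 is hypothetical, the array is assumed to hold exactly 64 bytes, and the q-register variants (vminq_u8, vsubq_u8, vdupq_lane_u8) are used in place of the 64-bit ones so the data stays in four 128-bit vectors. A scalar fallback is included only so the snippet also builds on non-NEON targets. Since every element is greater than or equal to the minimum, the unsigned subtraction cannot wrap around.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: subtract the minimum of 64 bytes from every byte. */
void normalize64_u8(uint8_t *array)
{
    /* Load the 64 bytes as four 16-lane vectors. */
    uint8x16_t v0 = vld1q_u8(array);
    uint8x16_t v1 = vld1q_u8(array + 16);
    uint8x16_t v2 = vld1q_u8(array + 32);
    uint8x16_t v3 = vld1q_u8(array + 48);

    /* Lane-wise minimum across the four vectors: 64 -> 16 candidates. */
    uint8x16_t m16 = vminq_u8(vminq_u8(v0, v1), vminq_u8(v2, v3));

    /* Fold the high half onto the low half: 16 -> 8 candidates. */
    uint8x8_t m8 = vmin_u8(vget_low_u8(m16), vget_high_u8(m16));

    /* Pairwise minimum with itself: 8 -> 4 -> 2 -> 1 candidates.
       After three passes every lane holds the overall minimum. */
    m8 = vpmin_u8(m8, m8);
    m8 = vpmin_u8(m8, m8);
    m8 = vpmin_u8(m8, m8);

    /* Broadcast lane 0 to all 16 lanes. */
    uint8x16_t vmin = vdupq_lane_u8(m8, 0);

    /* Every element >= min, so the unsigned subtraction cannot wrap. */
    vst1q_u8(array,      vsubq_u8(v0, vmin));
    vst1q_u8(array + 16, vsubq_u8(v1, vmin));
    vst1q_u8(array + 32, vsubq_u8(v2, vmin));
    vst1q_u8(array + 48, vsubq_u8(v3, vmin));
}

#else

/* Scalar fallback with the same behavior, for targets without NEON. */
void normalize64_u8(uint8_t *array)
{
    uint8_t min = array[0];
    for (int i = 1; i < 64; i++) {
        if (array[i] < min) {
            min = array[i];
        }
    }
    for (int i = 0; i < 64; i++) {
        array[i] = (uint8_t)(array[i] - min);
    }
}

#endif
```

Calling normalize64_u8(array) on a 64-byte buffer normalizes it in place. On AArch64 the three vpmin_u8 passes could likely be replaced by a single across-vector reduction such as vminvq_u8, but that intrinsic is not available on 32-bit ARMv7 NEON.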