How to add each byte of an 8-byte long integer?

1.3k views Asked by JB_User At 27 August 2013 at 18:42

I'm learning how to use the Intel MMX and SSE instructions in a video application. I have an 8-byte word and I would like to add all 8 bytes and produce a single integer as the result. The straightforward method is a series of 7 shifts and adds, but that is slow. What is the fastest way of doing this? Is there an MMX or SSE instruction for this?

This is the slow way of doing it:

/* uint64_t from <stdint.h>; plain unsigned long is only 32 bits on some platforms */
uint64_t PackedWord = whatever....
int byte1 = 0xff & (PackedWord);
int byte2 = 0xff & (PackedWord >> 8);
int byte3 = 0xff & (PackedWord >> 16);
int byte4 = 0xff & (PackedWord >> 24);
int byte5 = 0xff & (PackedWord >> 32);
int byte6 = 0xff & (PackedWord >> 40);
int byte7 = 0xff & (PackedWord >> 48);
int byte8 = 0xff & (PackedWord >> 56);
int sum = byte1 + byte2 + byte3 + byte4 + byte5 + byte6 + byte7 + byte8;

There are 3 answers

**nickie** · Answer 1 · 2013-08-27T19:40:32+00:00

Based on the suggestion of @harold, you'd want something like:

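#include <stdint.h>
#include <xmmintrin.h>  // _mm_sad_pu8 (psadbw on MMX registers) was added with SSE

static inline int bytesum(uint64_t pw)
{
  // The sum of absolute differences against zero is the sum of the eight bytes (psadbw).
  // Note: _mm_cvtsi64_m64 is only available when compiling for x86-64.
  __m64 result = _mm_sad_pu8(_mm_cvtsi64_m64((long long) pw), _mm_setzero_si64());
  int sum = _mm_cvtsi64_si32(result);
  _mm_empty();  // emms: clear the MMX state before any x87 floating-point code runs
  return sum;
}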
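
For code that would rather stay out of the MMX registers, the same psadbw trick also exists on XMM registers through SSE2. The following is a minimal sketch rather than part of the original answer: the name `bytesum_sse2` is made up for illustration, and the scalar branch is only there so the file still builds when the compiler does not define `__SSE2__`.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: sum the eight bytes of pw with psadbw on an XMM register. */
static inline int bytesum_sse2(uint64_t pw)
{
    /* Low 64 bits hold pw, the upper 64 bits are zero. */
    __m128i v = _mm_set_epi64x(0, (long long) pw);
    /* psadbw against zero: each 64-bit half receives the sum of its eight bytes. */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    /* The sum for the low half sits in the low 16 bits of the result. */
    return _mm_cvtsi128_si32(sad);
}
#else
/* Assumed fallback for targets without SSE2: plain shift-and-mask loop. */
static inline int bytesum_sse2(uint64_t pw)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int) ((pw >> (8 * i)) & 0xff);
    return sum;
}
#endif
```

Unlike the MMX version, this variant leaves the x87/MMX state untouched, so no `emms` is needed afterwards.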

**fuz** · Answer 2 · 2013-08-27T19:24:00+00:00

I'm not an assembly guru but this code should be a little bit faster on platforms that don't have fancy SIMD instructions:

#include <stdint.h>

int bytesum(uint64_t pw) {
    uint64_t a, b, mask;

    mask = 0x00ff00ff00ff00ffLLU;
    a = (pw >> 8) & mask;
    b = pw & mask;
    pw = a + b;

    mask = 0x0000ffff0000ffffLLU;
    a = (pw >> 16) & mask;
    b = pw & mask;
    pw = a + b;

    return (pw >> 32) + (pw & 0xffffffffLLU);
}

The idea is that you first add adjacent bytes in pairs, then adjacent 16-bit words, and finally the two 32-bit doublewords.
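
To make the three steps concrete, here is a sketch of the same reduction with the intermediate values traced in comments for the sample input `0x0102030405060708` (bytes 1 through 8, so the expected sum is 36). The names `bytesum_pairwise` and `bytesum_reference` are chosen for this illustration only; the reference loop is simply the shift-and-mask method from the question.

```c
#include <stdint.h>

/* Reference: the shift-and-mask method from the question, written as a loop. */
static int bytesum_reference(uint64_t pw)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int) ((pw >> (8 * i)) & 0xff);
    return sum;
}

/* The pairwise reduction; comments trace pw = 0x0102030405060708. */
static int bytesum_pairwise(uint64_t pw)
{
    const uint64_t mask8  = 0x00ff00ff00ff00ffULL;
    const uint64_t mask16 = 0x0000ffff0000ffffULL;

    /* Adjacent bytes -> four 16-bit sums: 0x00030007000b000f */
    pw = ((pw >> 8) & mask8) + (pw & mask8);
    /* Adjacent 16-bit sums -> two 32-bit sums: 0x0000000a0000001a */
    pw = ((pw >> 16) & mask16) + (pw & mask16);
    /* Two 32-bit sums -> total: 0x0a + 0x1a = 0x24 = 36 */
    return (int) ((pw >> 32) + (pw & 0xffffffffULL));
}
```

No lane can overflow along the way: a 16-bit lane holds at most 510 after the first step and a 32-bit lane at most 1020 after the second.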

**Veedrac** · Answer 3 · 2016-09-23T23:35:43+00:00

You can do this with a horizontal sum-by-multiply after one pairwise reduction:

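#include <stdint.h>

uint16_t bytesum(uint64_t x) {
    uint64_t pair_bits = 0x0001000100010001LLU;
    uint64_t mask = pair_bits * 0xFF;  // 0x00ff00ff00ff00ff

    // One pairwise reduction: four 16-bit lanes, each the sum of two adjacent bytes.
    uint64_t pair_sum = (x & mask) + ((x >> 8) & mask);
    // The multiply accumulates all four lanes into the top 16 bits.
    return (pair_sum * pair_bits) >> (64 - 16);
}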

This produces much leaner code than doing three pairwise reductions.
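
Why the multiply works: after the pairwise step, `pair_sum` holds four 16-bit lanes, each at most 0xff + 0xff = 510. Multiplying by `0x0001000100010001` is the same as adding `pair_sum` shifted left by 0, 16, 32 and 48 bits, so the top lane ends up holding the sum of all four lanes (at most 2040, well within 16 bits). The sketch below spells the multiply out as shifts and adds; the function name is illustrative and not from the answer.

```c
#include <stdint.h>

/* Illustrative expansion of the multiply by 0x0001000100010001 into shifts and adds. */
static uint16_t bytesum_shift_add(uint64_t x)
{
    const uint64_t mask = 0x00ff00ff00ff00ffULL;

    /* Four 16-bit lanes, each holding the sum of two adjacent bytes (<= 510). */
    uint64_t pair_sum = (x & mask) + ((x >> 8) & mask);

    /* Same value as pair_sum * 0x0001000100010001, taken modulo 2^64. */
    uint64_t spread = pair_sum + (pair_sum << 16) + (pair_sum << 32) + (pair_sum << 48);

    /* Top lane = lane0 + lane1 + lane2 + lane3. The lower lanes never carry
       into it because their partial sums stay below 65536. */
    return (uint16_t) (spread >> 48);
}
```

For `x = 0x0102030405060708` the lanes of `pair_sum` are 3, 7, 11 and 15, and the top lane of `spread` is their sum, 36.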
