ESP8266 32-bit aligned memcpy


The ESP8266 uses an Xtensa core, and data in flash storage can only be read with aligned 32-bit word accesses. To handle this I wrote the following function:

#include <stdint.h>

// Assumes dst and src are both 4-byte aligned.
void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  char       * _dst = (      char *)dst;
  const char * _src = (const char *)src;

  // Copy all whole 32-bit words.
  unsigned int aligned_len = len & ~0x3;
  while(aligned_len > 0)
  {
    *(uint32_t *)_dst = *(const uint32_t *)_src;
    _dst        += 4;
    _src        += 4;
    aligned_len -= 4;
  }

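  // Copy the 1-3 trailing bytes. Flash must still be read as a whole word.
  // The ESP8266 core is little-endian, so the first byte in memory is the
  // least significant byte of the word.
  const unsigned int remainder = len & 0x3;
  if (remainder > 0)
  {
    uint32_t tmp = *(const uint32_t *)_src;
    _dst[0] = tmp & 0xFF;
    if (remainder > 1)
    {
      _dst[1] = (tmp >> 8) & 0xFF;
      if (remainder > 2)
        _dst[2] = (tmp >> 16) & 0xFF;
    }
  }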
}
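
To make the word-access constraint concrete, here is a minimal sketch of reading a single byte through an aligned 32-bit load. The helper name read_byte_word_only is hypothetical (it is not part of any SDK), and the byte extraction assumes a little-endian core:

```c
#include <stdint.h>

/* Hypothetical helper, not an SDK function: read one byte from memory that
 * only tolerates aligned 32-bit loads. Assumes a little-endian core. */
static inline uint8_t read_byte_word_only(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    /* Round the address down to the start of the containing 32-bit word. */
    const uint32_t *word = (const uint32_t *)(a & ~(uintptr_t)3);
    /* Byte n of a little-endian word sits at bit offset 8*n. */
    return (uint8_t)(*word >> ((a & 3) * 8));
}
```

The remainder handling in the function above is the same idea applied to the last one to three bytes of the copy.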

Are there any changes that would improve performance?

Note: This code is platform-specific and will never be used on any other platform or architecture, so an assembly version that specifically targets the Xtensa core would be perfectly acceptable.

EDIT

Based on feedback, review, and Google, I have come up with the following:

void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  uint32_t       * _dst = (      uint32_t *)dst;
  const uint32_t * _src = (const uint32_t *)src;
  const uint32_t * _end = _src + (len >> 2);

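  // Copy all whole 32-bit words.
  while(_src != _end)
    *_dst++ = *_src++;

  const uint32_t rem = len & 0x3;
  if (!rem)
    return;

  // Merge the 1-3 trailing bytes into the destination word. The core is
  // little-endian, so those bytes are the low-order bytes of the word.
  const uint32_t mask = 0xFFFFFFFF >> ((4 - rem) << 3);
  *_dst = (*_dst & ~mask) | (*_src & mask);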
}
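
Both versions above assume that src and dst are 4-byte aligned, and the second one also reads and rewrites a full word at the tail of dst. As a rough sketch of how those assumptions could be dropped, the variant below only ever issues aligned 32-bit loads from src and writes dst one byte at a time. The name memcpy_word_reads is hypothetical, a little-endian core is assumed, and it trades speed for generality, so it is likely slower than the word-to-word loop when both pointers are aligned:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: src may be unaligned but is only read with aligned
 * 32-bit loads; dst is written byte by byte. Little-endian assumed. */
static void *memcpy_word_reads(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    uintptr_t s = (uintptr_t)src;
    const uint32_t *word = (const uint32_t *)(s & ~(uintptr_t)3);
    unsigned shift = (unsigned)(s & 3) * 8;   /* bit offset of first byte */

    while (len > 0)
    {
        uint32_t tmp = *word++ >> shift;      /* one aligned 32-bit load */
        /* Store the bytes of this word that belong to the request. */
        for (; shift < 32 && len > 0; shift += 8, --len)
        {
            *d++ = (uint8_t)tmp;
            tmp >>= 8;
        }
        shift = 0;                            /* later words start at byte 0 */
    }
    return dst;
}
```

A word is loaded only while bytes remain to be copied, so the sketch never reads past the last word that contains requested data.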