What performance can I expect from std::fill_n(ptr, n, 0) relative to memset?


For an iterator ptr that is a plain pointer, std::fill_n(ptr, n, 0) should do the same thing as memset(ptr, 0, n * sizeof(*ptr)) (but see @KeithThompson's comment on this answer).
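A minimal sketch of that equivalence for int. The helper names zero_with_fill_n, zero_with_memset and zero_fills_match are hypothetical. Note that the two calls count differently: std::fill_n takes an element count, while memset takes a byte count, hence the n * sizeof(*ptr). For integer types the value 0 is represented by all-zero bytes, so the two buffers should compare equal byte for byte; for other element types that assumption has to be checked separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Element-wise: std::fill_n counts elements and assigns the int value 0 to each.
void zero_with_fill_n(int* ptr, std::size_t n) {
    std::fill_n(ptr, n, 0);
}

// Byte-wise: memset counts bytes, so the length is n * sizeof(*ptr).
void zero_with_memset(int* ptr, std::size_t n) {
    std::memset(ptr, 0, n * sizeof(*ptr));
}

// Returns true if both approaches leave byte-identical buffers of n ints.
bool zero_fills_match(std::size_t n) {
    if (n == 0) return true;              // nothing to compare
    std::vector<int> a(n, 7), b(n, 9);    // different non-zero starting contents
    zero_with_fill_n(a.data(), n);
    zero_with_memset(b.data(), n);
    return std::memcmp(a.data(), b.data(), n * sizeof(int)) == 0;
}
```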

For a C++ compiler in C++11/C++14/C++17 mode, under which conditions can I expect these to be compiled to the same code? And when/if they don't compile to the same code, is there a significant performance difference with -O0? -O3?

Note: Of course some/most of the answer might be compiler-specific. I'm only interested in one or two specific compilers, but please write about the compiler(s) for which you know the answer.


There are 2 answers

Answer by lcs

The answer depends on your implementation of the standard library.

MSVC's standard library, for example, has several implementations of std::fill_n, selected by the type of what you're trying to fill.

If you call std::fill_n with a char*, signed char*, or unsigned char*, it calls memset directly to fill the array:

inline char *_Fill_n(char *_Dest, size_t _Count, char _Val)
{   // copy char _Val _Count times through [_Dest, ...)
    _CSTD memset(_Dest, _Val, _Count);
    return (_Dest + _Count);
}

If you call it with any other type, it fills the range in a loop:

template<class _OutIt,
         class _Diff,
         class _Ty> inline
_OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
{   // copy _Val _Count times through [_Dest, ...)
    for (; 0 < _Count; --_Count, (void)++_Dest)
        *_Dest = _Val;
    return (_Dest);
}
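The two MSVC snippets amount to dispatching on the element type. A minimal C++17 sketch of that idea, using if constexpr instead of MSVC's overloads, might look like this; sketch_fill_n and is_byte_char_v are hypothetical names, not MSVC internals, and real library implementations differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// True for the three character types whose arrays can be handed to memset directly.
template <class T>
constexpr bool is_byte_char_v =
    std::is_same_v<T, char> || std::is_same_v<T, signed char> ||
    std::is_same_v<T, unsigned char>;

// Hypothetical fill_n: memset for char-like pointers, a plain loop otherwise.
template <class OutIt, class Diff, class T>
OutIt sketch_fill_n(OutIt dest, Diff count, const T& val) {
    if constexpr (std::is_pointer_v<OutIt> &&
                  is_byte_char_v<std::remove_pointer_t<OutIt>>) {
        using Elem = std::remove_pointer_t<OutIt>;
        if (count <= 0) return dest;
        // Convert val the way '*dest = val' would, then pass that byte to memset.
        std::memset(dest, static_cast<unsigned char>(static_cast<Elem>(val)),
                    static_cast<std::size_t>(count));
        return dest + count;
    } else {
        // Same shape as the generic element-by-element loop.
        for (; 0 < count; --count, (void)++dest)
            *dest = val;
        return dest;
    }
}
```

Calling sketch_fill_n on a char buffer takes the memset branch; calling it on an int* takes the loop, which an optimising compiler may still turn into a memset call on its own.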

The best way to determine the overhead on your particular compiler and standard library implementation would be to profile the code with both calls.
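A minimal timing sketch for such a comparison, assuming a GCC- or Clang-compatible compiler for the empty inline-asm barrier (a common trick to stop the optimiser from deleting the repeated fills as dead stores). The helper names avg_fill_ns and clobber_memory are hypothetical. Build it once with -O0 and once with -O3 and compare the results for the same n; for serious measurements, a harness such as Google Benchmark (benchmark::DoNotOptimize, benchmark::ClobberMemory) is more trustworthy than a hand-rolled loop.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Compiler barrier: tells GCC/Clang the memory behind p may be read, so the
// fills cannot be discarded. On other compilers it does nothing (less reliable).
inline void clobber_memory(void* p) {
#if defined(__GNUC__)
    __asm__ __volatile__("" : : "g"(p) : "memory");
#else
    (void)p;
#endif
}

// Average nanoseconds per zero-fill of n ints over reps repetitions.
double avg_fill_ns(std::size_t n, int reps, bool use_memset) {
    std::vector<int> buf(n, 1);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        if (use_memset)
            std::memset(buf.data(), 0, n * sizeof(int));
        else
            std::fill_n(buf.data(), n, 0);
        clobber_memory(buf.data());  // make each fill observable
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return reps > 0 ? elapsed.count() / reps : 0.0;
}
```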

Answer by Richard Hodges

For all scenarios where memset is appropriate (i.e. all your objects are PODs), you will most likely find that the two statements are equivalent when any level of optimisation is enabled.

For scenarios where memset is not appropriate, comparison is moot because the use of memset would result in an incorrect program.
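The "is memset appropriate?" condition can be turned into a compile-time check. A minimal sketch, with memset_zero as a hypothetical helper name; it uses std::is_trivially_copyable, which is a somewhat looser requirement than POD, and passing it still does not prove that all-zero bytes represent the value zero for every type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: zero n objects with memset, but refuse to compile for
// non-trivially-copyable types, where overwriting the bytes would corrupt objects.
// Caveat: all-zero bytes mean "zero" for integers and IEEE 754 floating point,
// but not for everything, e.g. a null pointer-to-data-member is -1 on the
// Itanium C++ ABI.
template <class T>
void memset_zero(T* p, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is not appropriate for non-trivially-copyable types");
    if (n != 0) std::memset(p, 0, n * sizeof(T));
}
```

For a type such as std::string, memset_zero would fail to compile, and value semantics (e.g. std::fill_n(p, n, std::string{})) is the correct tool.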

You can easily check for yourself using tools such as godbolt (and many others):

For example, with gcc 6.2 at -O3 these two functions generate literally identical code:

#include <algorithm>
#include <cstring>

__attribute__((noinline))
void test1(int (&x)[100])
{
  std::fill_n(&x[0], 100, 0);
}

__attribute__((noinline))
void test2(int (&x)[100])
{
  std::memset(&x[0], 0, 100 * sizeof(int));
}

int main()
{
  int x[100];
  test1(x);
  test2(x);
}

https://godbolt.org/g/JIwI5l