Recording failed fuzz tests for re-execution later


I am looking for a better way to save interesting results from fuzz testing to repeat later. The current plan is to serialise the failing input and write it out as a test case. Say we want to test:

int function_under_test(struct arbitrary *);

Say this function is run a few thousand times with semi-random data in the arbitrary struct and fails twice. I would like to be able to reproduce the failing cases in order to determine whether bug fixes were successful.

I can see two strategies for achieving this:

  1. Force determinism - store the random seed, note down which test number failed
  2. Store the struct that caused failure somewhere persistent

Option 1 requires care over how the fuzz test cases are generated, so that they are reproducible, plus some messing around with reseeding the random number generator. Option 2 requires some tedious code generation, possibly reduced by a serialisation library. I am hoping to hear that there is a third option that I cannot see.
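As a rough illustration of option 1, here is a minimal sketch in C. It assumes a small fixed generator (a splitmix64-style mixer) is used instead of rand(), because the sequence rand() produces is not specified by the C standard and can differ between libraries. The helper name make_case, the value ranges, and the way the seed and test number are combined are all hypothetical choices made for the example.

```c
#include <stdint.h>

#define SIZEFOO 5
struct arbitrary
{
  int foo[SIZEFOO];
  double bar;
};

/* splitmix64-style generator: fixed arithmetic, so the same state
   yields the same sequence on every platform */
static uint64_t splitmix64(uint64_t *state)
{
  uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Hypothetical helper: build fuzz case number `index` of the run that
   was started with `seed`. Each case derives its own generator state,
   so a failing case can be rebuilt without replaying earlier ones. */
void make_case(uint64_t seed, uint64_t index, struct arbitrary *out)
{
  uint64_t mix = seed;
  uint64_t state = splitmix64(&mix) ^ index;
  int i;

  for (i = 0; i < SIZEFOO; i++)
  {
    /* illustrative range: -1000 .. 1000 */
    out->foo[i] = (int)(splitmix64(&state) % 2001) - 1000;
  }
  /* top 53 bits scaled into [0, 1) */
  out->bar = (double)(splitmix64(&state) >> 11) / 9007199254740992.0;
}
```

With something like this, recording a failure only needs the pair (seed, index): calling make_case with the same pair later rebuilds the same struct, as long as the generation code itself has not changed.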

My plan for option 2 is essentially to serialise the struct to an array of characters, write said array to a text file, then read it back in and convert it to the struct on demand. The following is a proof-of-concept for POD; more complex types require more sophisticated serialisation. The example is C, but answers based on C++ are welcome too.

#include <stdio.h> /* for printf in main() */
#include <stddef.h> /* size_t */

#define SIZEFOO 5
struct arbitrary 
{
  int foo[SIZEFOO];
  double bar;
};

int equal_arbitrary(struct arbitrary *lhs, struct arbitrary *rhs)
{
  /* return 1 if structs are equal, 0 if not */
  for (int i=0; i < SIZEFOO; i++)
  {
    if (lhs->foo[i] != rhs->foo[i]) {return 0;}
  }
  if (lhs->bar != rhs->bar) {return 0;}
  return 1;
}

void arbitrary_to_bytes(const struct arbitrary *input, unsigned char *output)
{
  union
  {
    struct arbitrary data;
    unsigned char view[sizeof(struct arbitrary)];
  } local;
  size_t i;

  local.data = *input;
  for (i = 0; i < sizeof(struct arbitrary); i++)
    {
      output[i] = local.view[i];
    }
  return;
}

void bytes_to_arbitrary(const unsigned char *input, struct arbitrary *output)
{
  union
  {
    struct arbitrary data;
    unsigned char view[sizeof(struct arbitrary)];
  } local;
  size_t i;

  for (i = 0; i < sizeof(struct arbitrary); i++)
    {
      local.view[i] = input[i];
    }

  *output = local.data;
  return;
}

int main(void)
{
  struct arbitrary original;
  struct arbitrary copied;
  unsigned char working[sizeof (struct arbitrary)];

  for (int i=0; i < SIZEFOO; i++) { original.foo[i] = 3 + i*i; }
  original.bar = 3.14;

  arbitrary_to_bytes(&original,working);
  bytes_to_arbitrary(working,&copied);

  if (equal_arbitrary(&original,&copied))
    {
      printf("PASS\n");
    }
  else
    {
      printf("FAIL\n");
    }

  return 0;
}
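The proof-of-concept above only round-trips in memory. For the text-file step, one possible encoding is two hex digits per byte, sketched below. The function names bytes_to_hex and hex_to_bytes are hypothetical, and hex is just one choice of text encoding among several.

```c
#include <stddef.h>

/* Encode n bytes as 2*n lowercase hex digits plus a terminating NUL.
   `out` must have room for 2*n + 1 chars. */
void bytes_to_hex(const unsigned char *in, size_t n, char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t i;

  for (i = 0; i < n; i++)
  {
    out[2*i]     = digits[in[i] >> 4];
    out[2*i + 1] = digits[in[i] & 0x0f];
  }
  out[2*n] = '\0';
}

/* Value of one hex digit, or -1 if c is not a hex digit */
static int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode exactly n bytes from a NUL-terminated hex string.
   Returns 0 on success, -1 if the text is malformed or too short. */
int hex_to_bytes(const char *in, unsigned char *out, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
  {
    int hi = hex_value(in[2*i]);
    int lo;
    if (hi < 0) return -1;
    lo = hex_value(in[2*i + 1]);
    if (lo < 0) return -1;
    out[i] = (unsigned char)((hi << 4) | lo);
  }
  return 0;
}
```

The encoded text is twice the size of the struct, which gives a feel for how large the stored cases would get.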

During execution, when a (fuzz) test case fails, one of the side effects would be to convert the input structure to a byte array (as above) and write out something like the following to a file that can subsequently become part of the deterministic test suite:

result function_under_test_fuzz_123(void) /* 123 a unique number */
{
  int rc;
  struct arbitrary input;
  unsigned char test_data[] = "byte array stored as ascii string";

  bytes_to_arbitrary(test_data, &input);

  rc = function_under_test(&input);

  /* Do whatever is needed to determine if the function failed */
  if (rc) {return PASS;} else {return FAIL;}
}
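The side effect that writes such a test case out could be sketched roughly as below. The emitter name write_regression_test is hypothetical, and the generated text assumes that result, PASS, FAIL, function_under_test and a bytes_to_arbitrary accepting a const unsigned char pointer are all declared by the test framework. The bytes are emitted as a numeric initialiser rather than a string literal, which avoids any escaping problems.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical emitter: writes one regression test as C source to `out`.
   `bytes` holds the serialised failing input (n > 0 bytes) and `id` is
   the unique number used in the function name.
   Returns 0 on success, -1 if writing failed. */
int write_regression_test(FILE *out, unsigned id,
                          const unsigned char *bytes, size_t n)
{
  size_t i;

  fprintf(out, "result function_under_test_fuzz_%u(void)\n{\n", id);
  fprintf(out, "  struct arbitrary input;\n");
  fprintf(out, "  static const unsigned char test_data[] = {");
  for (i = 0; i < n; i++)
  {
    /* twelve bytes per line; a trailing comma is valid in an initialiser */
    fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n    " : " ",
            (unsigned)bytes[i]);
  }
  fprintf(out, "\n  };\n\n");
  fprintf(out, "  bytes_to_arbitrary(test_data, &input);\n");
  fprintf(out, "  return function_under_test(&input) ? PASS : FAIL;\n}\n");

  return ferror(out) ? -1 : 0;
}
```

The fuzz driver would call this with the output of arbitrary_to_bytes whenever a case fails, appending to a source file that is compiled into the deterministic suite.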

I believe that type punning through a union is valid provided the other member is a char array. When the function inputs are not plain old data, the (de)serialisation step becomes rather more complicated but the general strategy remains usable. Storing the test case as ASCII could yield some rather large text files.
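Since C++ answers are welcome, it may be worth noting that reading an inactive union member is allowed in C but is undefined behaviour in C++. A memcpy-based variant sidesteps the question and is valid in both languages. The sketch below uses hypothetical names with a _memcpy suffix to keep it distinct from the union version.

```c
#include <string.h>

#define SIZEFOO 5
struct arbitrary
{
  int foo[SIZEFOO];
  double bar;
};

/* Copy the object representation of *input into output, which must
   have room for sizeof(struct arbitrary) bytes */
void arbitrary_to_bytes_memcpy(const struct arbitrary *input,
                               unsigned char *output)
{
  memcpy(output, input, sizeof *input);
}

/* Rebuild the struct from sizeof(struct arbitrary) stored bytes */
void bytes_to_arbitrary_memcpy(const unsigned char *input,
                               struct arbitrary *output)
{
  memcpy(output, input, sizeof *output);
}
```

Either way, the bytes include any padding the compiler inserts and depend on the platform's type sizes and byte order, so a stored case is only meaningful for a build with the same layout as the one that produced it.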

Before spending the time setting up the infrastructure to do the above (mostly code generators, some modification to the test framework) it seems worth asking the community if there's a known better way. I doubt I'm the first person to think reproducible fuzz testing is a good idea!

Changing to C++ means templates instead of external code generators and finding an STL-friendly serialisation library in Boost, but doesn't change the fundamental query of "how best to save failed fuzz testing results?".

Thank you


There is 1 answer

Bartek Banachewicz (best answer)

If you use the C++11 (or later) <random> header, or Boost.Random, storing the seed is enough to guarantee determinism, provided you use the same generators. The "some messing around with reseeding the random number generator" is really quite simple in practice, provided you read the manuals.

Otherwise the question is basically "how to serialize data", which has so many answers across the internet, in so many forms, that it's not even remotely sensible to outline them here. Whether the data are results of fuzz testing or the distribution of cereal brands across the USA doesn't really matter here.