json_encode faster than Avro encoder in PHP (benchmark)

Everywhere I read that Avro encoding is faster than JSON, but my own tests show the opposite. In my benchmarks, which are nothing out of the ordinary, JSON was consistently much faster than Avro. Am I missing something?

This is the serialize method for Avro:

public function serialize(array $documentArray)
    {
        $this->schemaID = 1;
        $parsedSchema = AvroSchema::parse(SCHEMA);

        // check if there's already an io datum writer in the local cache, if so use it
        if (isset($this->ioDatumWriterCache[$this->schemaID])) {
            $datumWriter = $this->ioDatumWriterCache[$this->schemaID];
        } else {
            $datumWriter = new \AvroIODatumWriter($parsedSchema);
            $this->ioDatumWriterCache[$this->schemaID] = $datumWriter;
        }

        // check if there's already a binary encoder in the local cache, if so use it
        if (isset($this->binaryEncoderCache[$this->schemaID])) {
            $encoder = $this->binaryEncoderCache[$this->schemaID];
        } else {
            $encoder = new \AvroIOBinaryEncoder($this->ioWriter);
            $this->binaryEncoderCache[$this->schemaID] = $encoder;
        }

        $datumWriter->write($documentArray, $encoder);

        return $this->ioWriter->string();
    }

and this is the one for JSON:

public function serialize(array $documentArray): string
    {
        try {
            return json_encode($documentArray, JSON_THROW_ON_ERROR);
        } catch (\Exception $ex) {
            throw new ObjectIOException('JSON encode exception', 0, $ex);
        }
    }

I'm using phpbench to run the benchmarks.

The results (serializing the same set of data for both) are:

JSON Encoding Average Times: 1.685μs 1.642μs 1.637μs 1.642μs 1.637μs

Avro Encoding Average Times: 40.574μs 40.716μs 40.664μs 40.480μs 40.583μs

There is 1 answer

cquezel

Both the language you are using and the data you are serializing make a difference.

Your code parses the schema on every call to serialize. The datum writer and the encoder are cached, but AvroSchema::parse(SCHEMA) still runs each time. Is that necessary?
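If the schema does not change between calls, the parse can be done once and the result reused. A minimal sketch of that idea, assuming PHP 7.4 or later. `SchemaCache` is a hypothetical helper, not part of the Avro library, and `json_decode` stands in for `AvroSchema::parse` only so the snippet runs without Avro installed:

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of the Avro library: parses each schema
// at most once and returns the cached result on later calls.
final class SchemaCache
{
    /** @var array<int, mixed> parsed schemas keyed by schema ID */
    private array $parsed = [];

    /** Number of times the parser actually ran (for illustration only). */
    public int $parseCount = 0;

    /** @var callable */
    private $parser;

    public function __construct(callable $parser)
    {
        $this->parser = $parser;
    }

    public function get(int $schemaId, string $schemaJson)
    {
        if (!isset($this->parsed[$schemaId])) {
            $this->parseCount++;
            $this->parsed[$schemaId] = ($this->parser)($schemaJson);
        }
        return $this->parsed[$schemaId];
    }
}

// Stand-in parser (assumption). With the Avro library this would be
// something like: fn (string $json) => AvroSchema::parse($json)
$cache = new SchemaCache(fn (string $json) => json_decode($json, true));

$schemaJson = '{"type":"record","name":"Doc","fields":[{"name":"id","type":"long"}]}';
for ($i = 0; $i < 1000; $i++) {
    $schema = $cache->get(1, $schemaJson);
}

echo $cache->parseCount, PHP_EOL; // the parser ran once for 1000 lookups
```

In the serialize method from the question, the same effect could probably be had by moving the parse into the branch that creates the datum writer, so that it only runs on a cache miss.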

In PHP, json_encode is a built-in function implemented in C, which makes it very fast. The Avro encoder is a library written in PHP itself and run by the interpreter, so it is not nearly as fast.
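One way to get a feel for the gap between built-in and userland code is to time json_encode against an encoder written in plain PHP. This is a rough sketch only: `userlandEncode` and `averageNs` are hypothetical names, the encoder handles just flat arrays of ints and simple ASCII strings, and it does far less work than a real Avro encoder. The figures depend on PHP version, OPcache/JIT settings and hardware, so treat them as indicative at best (requires PHP 7.4 or later for `hrtime` and arrow functions):

```php
<?php
declare(strict_types=1);

// Deliberately simple userland encoder: handles only flat arrays whose
// values are ints or ASCII strings that need no escaping.
function userlandEncode(array $doc): string
{
    $parts = [];
    foreach ($doc as $key => $value) {
        $encoded = is_int($value) ? (string) $value : '"' . $value . '"';
        $parts[] = '"' . $key . '":' . $encoded;
    }
    return '{' . implode(',', $parts) . '}';
}

// Runs $fn $iterations times and returns the average nanoseconds per call.
function averageNs(callable $fn, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations;
}

$doc = ['id' => 42, 'title' => 'hello', 'views' => 105, 'lang' => 'php'];

$native   = averageNs(fn () => json_encode($doc), 100000);
$userland = averageNs(fn () => userlandEncode($doc), 100000);

printf("json_encode: %.1f ns, userland: %.1f ns\n", $native, $userland);
```

A dedicated tool such as phpbench, with warm-up and multiple revolutions, gives more trustworthy numbers than a loop like this.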

What is a document array? The time spent converting integers, doubles, and so on to a JSON string is not negligible. On the other end, the time to convert the JSON string back to integers and doubles is also not negligible.

In most compiled languages Avro serialization would probably be faster than JSON. Its binary output is also usually more compact.
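To illustrate the size difference for integers: the Avro specification encodes int and long values with zig-zag mapping followed by a variable-length base-128 encoding, while JSON writes the decimal digits as text. The sketch below implements that integer encoding by hand so it runs without the Avro library. `avroEncodeLong` is a hypothetical name, and the code assumes a 64-bit PHP build:

```php
<?php
declare(strict_types=1);

// Encodes a 64-bit integer as the Avro specification describes for
// int/long: zig-zag mapping, then a base-128 varint, low-order group first.
function avroEncodeLong(int $n): string
{
    // Zig-zag: interleave negative and positive values so that small
    // magnitudes map to small unsigned numbers.
    $z = ($n << 1) ^ ($n >> 63);

    $out = '';
    while (($z & ~0x7F) !== 0) {
        $out .= chr(($z & 0x7F) | 0x80);     // low 7 bits, continuation bit set
        $z = ($z >> 7) & 0x01FFFFFFFFFFFFFF; // mask emulates an unsigned shift
    }
    return $out . chr($z);                   // final group, continuation bit clear
}

foreach ([0, -1, 64, 1234567] as $n) {
    printf(
        "%8d  avro: %d byte(s)  json: %d byte(s)\n",
        $n,
        strlen(avroEncodeLong($n)),
        strlen(json_encode($n))
    );
}
```

For whole records the saving is typically larger, because the Avro binary encoding does not repeat field names in each record. It is not guaranteed for every value, though: an Avro double always takes 8 bytes, whereas JSON can write a short number in fewer characters.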