Apache Avro C++ serialization framework (native type flexibility)


I am using Apache Avro to generate C++ stubs and want the generated code to use native C++ types (e.g. char[max], bool, etc.).

As an example, I would like to produce the following header stub file:

struct CommonStruct
{ 
    int32_t SeqNum;
    char Type[17];
    char InstType;
    // ... etc
};

Using an Apache Avro schema, what should I specify? I have tried various settings, but none of them produced 'char' or 'char[]' types. I'm not confident that the 'string' type avoids processing overhead (heap memory allocation), nor whether uint8 (1-byte unsigned integer) can really replace the 'char' type.
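The heap-allocation concern can be checked directly rather than guessed at. Below is a minimal sketch, assuming a mainstream standard library that implements the small-string optimisation; `storedInline` is a hypothetical helper (not part of Avro or the standard library) that reports whether a string's character buffer lives inside the `std::string` object itself. The inline capacity is implementation-specific, so a 16-character value such as the `Type` field may or may not fit.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character buffer of `s` is
// located inside the std::string object itself (small-string optimisation),
// i.e. no separate heap block is holding the characters.
bool storedInline(const std::string& s) {
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd = objBegin + sizeof(std::string);
    const char* buf = s.data();
    // std::less / std::greater_equal give a well-defined ordering even for
    // pointers into unrelated objects.
    return std::greater_equal<const char*>()(buf, objBegin) &&
           std::less<const char*>()(buf, objEnd);
}

// Capacity of an empty string: on implementations with the small-string
// optimisation this is the number of characters that fit without allocating.
std::size_t inlineCapacityGuess() {
    return std::string().capacity();
}
```

Calling `storedInline(std::string(16, 'x'))` on the target compiler shows whether a full-length `Type` value would allocate; the answer differs between standard library implementations.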

Example #1, input:

{
    "type": "record",
    "name": "CommonStruct",
    "fields": [
        {
            "name": "SeqNum",
            "type": "int"
        },
        {
            "name": "Type",
            "type": { "type": "fixed", "name": "char", "items": "string", "size": 16}
            
        },
        {
            "name": "InstType",
            "type": "bytes"
        }
    ]
}

output:

struct CommonStruct {
    int32_t SeqNum;
    std::array<uint8_t, 16> Type;
    std::vector<uint8_t> InstType;
    CommonStruct() :
        SeqNum(int32_t()),
        Type(std::array<uint8_t, 16>()),
        InstType(std::vector<uint8_t>())
        { }
};

Example #2, input:

{
    "type": "record",
    "name": "CommonStruct",
    "fields": [
        {
            "name": "SeqNum",
            "type": "int"
        },
        {
            "name": "Type",
            "type": { "type": "fixed", "name": "Type", "items": "string", "size": 17}
            
        },
        {
            "name": "InstType",
            "type": { "type": "fixed", "name": "InstType", "size": 1}
        }
    ]
}

output:

struct CommonStruct {
    int32_t SeqNum;
    std::array<uint8_t, 17> Type;
    std::array<uint8_t, 1> InstType;
    CommonStruct() :
        SeqNum(int32_t()),
        Type(std::array<uint8_t, 17>()),
        InstType(std::array<uint8_t, 1>())
        { }
};

My question therefore is: can Apache Avro replicate native C++ types? If that is not possible:

a. Is there overhead in using 'std::string' instead of a fixed-size char array? A string generally allocates heap memory, although some articles say that short strings are stored inline without a heap allocation.

b. Instead of 'std::string', I see that it is possible to use std::array<uint8_t, size> of arbitrary size (examples above). I have not yet experimented with encoding/decoding char arrays through it, but if anyone has experience with this, it would be great to hear whether the two are equivalent: 'char[17]' == 'std::array<uint8_t,17>'? Would the conversion cause any practical issues? Would 'std::array<uint8_t,1>' == 'char' in the above scenario?
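To make the equivalence question concrete, here is a minimal sketch of converting between the hand-written layout and the shape of the generated output from example #2. `NativeStruct`, `GeneratedStruct`, `toGenerated` and `toNative` are hypothetical names for illustration, and the generated constructor is omitted. The `static_assert`s encode an assumption: the standard does not strictly guarantee that `std::array<uint8_t, N>` has the same size as `char[N]`, although mainstream implementations store nothing but the elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// The hand-written layout from the question.
struct NativeStruct {
    int32_t SeqNum;
    char Type[17];
    char InstType;
};

// Same shape as the generated struct in example #2 (constructor omitted).
struct GeneratedStruct {
    int32_t SeqNum;
    std::array<uint8_t, 17> Type;
    std::array<uint8_t, 1> InstType;
};

// Assumption: std::array adds no padding or extra members. If either check
// fails on a given toolchain, the byte-wise copies below need revisiting.
static_assert(sizeof(std::array<uint8_t, 17>) == sizeof(char[17]),
              "std::array<uint8_t, 17> is not the size of char[17]");
static_assert(sizeof(std::array<uint8_t, 1>) == sizeof(char),
              "std::array<uint8_t, 1> is not the size of char");

// Copy the native fields into the generated representation.
GeneratedStruct toGenerated(const NativeStruct& n) {
    GeneratedStruct g{};
    g.SeqNum = n.SeqNum;
    std::memcpy(g.Type.data(), n.Type, sizeof n.Type);  // 17 raw bytes
    g.InstType[0] = static_cast<uint8_t>(n.InstType);
    return g;
}

// Copy the generated fields back into the native representation.
NativeStruct toNative(const GeneratedStruct& g) {
    NativeStruct n{};
    n.SeqNum = g.SeqNum;
    std::memcpy(n.Type, g.Type.data(), g.Type.size());
    n.InstType = static_cast<char>(g.InstType[0]);
    return n;
}
```

The copy is byte-for-byte, so a NUL terminator inside `Type` survives the round trip. The types themselves remain distinct, though: code expecting `char*` needs an explicit conversion like the ones above rather than a direct assignment.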

Just to add: I am new to Apache Avro serialization. I have previously worked only with direct serialization and am looking to understand whether Avro could be more useful than writing my own serialization and deserialization.

I installed Apache Avro from https://dlcdn.apache.org/avro/avro-1.11.2/cpp/ and attempted to generate the expected header files (stubs). I expected to have the flexibility to specify native C++ types (e.g. a fixed-size char array), but I can't find whether that is supported.
