Custom user defined type for postgres not working with variable length text

320 views Asked by At

Overview

I'm trying to create a simple custom user defined type for postgres using C; however, whenever I query the table with the custom type, the data appears to be getting truncated (or there is an alignment issue). I believe I'm doing something wrong with handling the variable nature of the input.

Here is my code:

PG_FUNCTION_INFO_V1(hierarchy_in);
PG_FUNCTION_INFO_V1(hierarchy_out);

typedef struct Hierarchy
{
    int32 length;
    char path[FLEXIBLE_ARRAY_MEMBER];
} Hierarchy;


Datum
hierarchy_in(PG_FUNCTION_ARGS)
{
    char *input_str = PG_GETARG_CSTRING(0);
    int32 input_len = strlen(input_str);
    Hierarchy *result;

    result = (Hierarchy *)palloc(VARHDRSZ + input_len);
    SET_VARSIZE(result, VARHDRSZ + input_len);
    strncpy(result->path, input_str, input_len);

    PG_RETURN_POINTER(result);
}

Datum
hierarchy_out(PG_FUNCTION_ARGS)
{
    Hierarchy *input = (Hierarchy *)PG_GETARG_POINTER(0);
    char *result;
    int32 input_len = VARSIZE(input) - VARHDRSZ;

    result = pnstrdup(input->path, input_len);

    PG_RETURN_CSTRING(result);
}

Here is my test case:

DROP TABLE TESTING;
DROP EXTENSION hierarchy CASCADE;

CREATE EXTENSION hierarchy;

CREATE TABLE TESTING (
    id SERIAL PRIMARY KEY,
    position hierarchy 
);

INSERT INTO TESTING (position) VALUES ('0123456789');
INSERT INTO TESTING (position) VALUES ('Hello World');

SELECT * FROM TESTING;

Here is my output:

enter image description here

1

There are 1 answers

6
VonC On

It is like the null-termination of your strings is missing.
While that would work, there would still be missing characters at the beginning.
Since PostgreSQL data type expects the first bytes to store the length of the variable-length data, the code would need to be:

typedef struct Hierarchy
{
    int32 length;
    char path[FLEXIBLE_ARRAY_MEMBER];
} Hierarchy;

Datum
hierarchy_in(PG_FUNCTION_ARGS)
{
    char *input_str = PG_GETARG_CSTRING(0);
    int32 input_len = strlen(input_str);
    Hierarchy *result;

    result = (Hierarchy *)palloc(VARHDRSZ + sizeof(int32) + input_len); // Allocate space for length field
    SET_VARSIZE(result, VARHDRSZ + sizeof(int32) + input_len);
    result->length = input_len;
    memcpy(result->path, input_str, input_len); // Use memcpy without copying the null-terminator

    PG_RETURN_POINTER(result);
}

Datum
hierarchy_out(PG_FUNCTION_ARGS)
{
    Hierarchy *input = (Hierarchy *)PG_GETARG_POINTER(0);
    char *result;
    int32 input_len = input->length;

    result = (char *)palloc(input_len + 1); // Allocate memory for the string plus the null-terminator
    memcpy(result, input->path, input_len); // Copy the data without the null-terminator
    result[input_len] = '\0'; // Manually add the null-terminator

    PG_RETURN_CSTRING(result);
}

From "User-Defined Types", the VARSIZE and SET_VARSIZE macros provided by PostgreSQL are used to handle the header portion.

The hierarchy_in() function already reads the input string without the null-terminator, stores the length of the string in the length field of the Hierarchy struct, and uses memcpy to copy the input string without the null-terminator to result->path.

However, the hierarchy_out() function creates a new null-terminated C-string from the extracted data and returns it as a C-string.
That should enable you to print said C-String, with all its expected characters.