Is this use of va_copy undefined behaviour?

80 views Asked by At

I made a function that prints pairs of keys and values, where the keys are trusted compile time literal strings which possibly can contain printf specifiers. Now my question is: is this function legal according to the c standard (like c99)? If not, can I do anything to conform?

void pairs(char *message, ...)
{
    va_list args;
    va_start(args, message);

    char *key = va_arg(args, char *);
    if (key == NULL) {
        va_end(args);
        return;
     }

    while (key != NULL) {
        if (key[0] == '%') {
            int len = strlen(key);
            printf("%s=", &key[len + 1]);

            // Copy the current state of the va_list and pass it to vprintf
            va_list args2;
            va_copy(args2, args);
            vprintf(key, args2);
            va_end(args2);

            // Consume one argument assuming vprintf consumed one
            va_arg(args, char *);
        } else {
            // if no format assume it is a string
            fprintf(_log_stream, "%s=\"%s\"", key, va_arg(args, char *));
        }

        key = va_arg(args, char *);
        if (key == NULL)
            break;

        printf(", ");
    }
    va_end(args);
}

It should work something like

    pairs("This is a beautiful value",
            "%d\0hello", 10,
            "pass", user_var,
            "hello", "bye", NULL);

And it actually outputs correctly: hello=10, pass="sus", hello="bye" So it works on my machine (using gcc). But again, is it compliant?

Edit: I found https://www.gnu.org/software/libc/manual/html_node/Variable-Arguments-Output.html

Portability Note: The value of the va_list pointer is undetermined after the call to vprintf, so you must not use va_arg after you call vprintf. Instead, you should call va_end to retire the pointer from service. You can call va_start again and begin fetching the arguments from the start of the variable argument list. (Alternatively, you can use va_copy to make a copy of the va_list pointer before calling vfprintf.) Calling vprintf does not destroy the argument list of your function, merely the particular pointer that you passed to it.

Which is the reason why I use va_copy in the first place, but I am not sure if there is a way to get around that restriction.

2

There are 2 answers

2
KamilCuk On BEST ANSWER

Is this use of va_copy undefined behaviour?

No.

is this function legal according to the c standard (like c99)?

No. The:

 key = va_arg(args, char *);

is undefined behavior. The next argument is 10, which is an int.

(Also, technically, NULL is not a char *. NULL is either an int or void *, no one knows. It is also not valid for NULL. But, nowadays NULL is universally (void *)0, and on sane systems all pointers are the same size, bottom line if you use char * no one will notice. But if you want to be correct, the argument should be (char *)NULL.)

If not, can I do anything to conform?

What you want to do in the way you want to do is not possible. You are not able to use va_list after vprintf has used it. After vprintf has used va_list you have to va_end it.

If you want to do that, you have to parse the format string yourself and get the types of arguments yourself and advance va_list yourself and either implement vprintf yourself or you could call it separately.


can I do anything

Och yea, sure, totally. So let's implement a pairs as a variadic macro that just forwards to printf.

#include <boost/preprocessor/facilities/overload.hpp>
#include <stdio.h>

#define PAIRS1_0()
#define PAIRS1_2(a, b)       " " a
#define PAIRS1_4(a, b, ...)  " " a  PAIRS1_2(__VA_ARGS__)
#define PAIRS1_6(a, b, ...)  " " a  PAIRS1_4(__VA_ARGS__)

#define PAIRS2_0()
#define PAIRS2_2(a, b)       b
#define PAIRS2_4(a, b, ...)  b, PAIRS2_2(__VA_ARGS__)
#define PAIRS2_6(a, b, ...)  b, PAIRS2_4(__VA_ARGS__)

#define pairs(pre, ...)  \
    printf(  \
        pre  \
        BOOST_PP_OVERLOAD(PAIRS1_, __VA_ARGS__)(__VA_ARGS__) \
        "\n",  \
        BOOST_PP_OVERLOAD(PAIRS2_, __VA_ARGS__)(__VA_ARGS__)  \
    )

int main() {
    pairs("This is a beautiful value",
        "hello=%d", 10,
        "hello=%s", "bye");
}

The call becomes a single printf with space joined arguments. The first overloaded macro joins the strings into one string, and then the second overloaded macro takes the second arguments. It is a simple shuffle to re-order arguments for printf.

But this is not satisfactory. You might delve into this more, by using _Generic to guess the format specifier. From the arguments, you can construct an array of "printers" that take va_list by a pointer. Not by value! You can only modify the "upper" function va_list by passing with a pointer. Then you can iterate over printers to print the values, one by one.

#include <stdio.h>
#include <stdarg.h>

// printers for pairs

void pairs_int(va_list *va) {
    const int v = va_arg(*va, int);
    printf("%d", v);
}
void pairs_charp(va_list *va) {
    char *p = va_arg(*va, char *);
    printf("%s", p);
}

#define PRINTER(x)  \
    _Generic((x) \
    , int: &pairs_int  \
    , char *: &pairs_charp \
    )

typedef void (*printer_t)(va_list *va);

// the actual logic
void pairs_in(const char *pre, const printer_t printers[], ...) {
    va_list va;
    va_start(va, printers);
    fputs(pre, stdout);
    for (const printer_t *printer = printers; printer; printer++) {
        putchar(' ');
        fputs(va_arg(va, char*), stdout);
        putchar('=');
        (*printer)(&va);
        fflush(stdout);
    }
    va_end(va);
    putchar('\n');
}

// Convert arguments into an array of printers
#define PRINTERS_0()
#define PRINTERS_2(a, b)       PRINTER(b)
#define PRINTERS_4(a, b, ...)  PRINTERS_2(a, b), PRINTERS_2(__VA_ARGS__)
#define PRINTERS_6(a, b, ...)  PRINTERS_2(a, b), PRINTERS_4(__VA_ARGS__)
#define PRINTERS_N(_6,_5,_4,_3,_2,_1,N,...)  PRINTERS##N
#define PRINTERS(...)  PRINTERS_N(__VA_ARGS__,_6,_5,_4,_3,_2,_1)(__VA_ARGS__)

// The entrypoint
#define pairs(pre, ...)  \
    pairs_in(pre, (const printer_t[]){ \
        PRINTERS(__VA_ARGS__), NULL, }\
        __VA_OPT__(,) \
        __VA_ARGS__ \
    )

int main() {
    pairs("This is a beautiful value",
        "value", 10,
        "hello", "bye");
}

Now you can expand this method, by for example including in the "hello" a format specifier like you wanted, and then extract it. For example "%d\0hello", if the string starts with %, split it on \0 and use the leading part for printing inside the printer. I think this is the interface you aimed for.

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// printers for pairs
const char *_extract_fmt(const char *text, const char *def) {
        return text[0] == '%' ? text : def;
}

const char *_extract_name(const char *text) {
        return text[0] == '%' ? text + strlen(text) + 1 : text;
}

void pairs_int(const char *text, va_list *va) {
        const int v = va_arg(*va, int);
        printf(_extract_fmt(text, "%d"), v);
}
void pairs_charp(const char *text, va_list *va) {
        char *v = va_arg(*va, char *);
        printf(_extract_fmt(text, "%s"), v);
}

#define PRINTER(x) _Generic((x), int: &pairs_int, char *: &pairs_charp)

typedef void (*printer_t)(const char *text, va_list *va);

// the actual logic
void pairs_in(const char *pre, const printer_t printers[], ...) {
        va_list va;
        va_start(va, printers);
        fputs(pre, stdout);
        for (const printer_t *printer = printers; *printer; printer++) {
                putchar(' ');
                char *text = va_arg(va, char *);
                fputs(_extract_name(text), stdout);
                putchar('=');
                (*printer)(text, &va);
                fflush(stdout);
        }
        va_end(va);
        putchar('\n');
}

// Convert arguments into an array of printers
#define PRINTERS_0()
#define PRINTERS_2(a, b) PRINTER(b)
#define PRINTERS_4(a, b, ...) PRINTERS_2(a, b), PRINTERS_2(__VA_ARGS__)
#define PRINTERS_6(a, b, ...) PRINTERS_2(a, b), PRINTERS_4(__VA_ARGS__)
#define PRINTERS_N(_6, _5, _4, _3, _2, _1, N, ...) PRINTERS##N
#define PRINTERS(...) PRINTERS_N(__VA_ARGS__, _6, _5, _4, _3, _2, _1)(__VA_ARGS__)

// The entrypoint
#define pairs(pre, ...) \
        pairs_in(pre, (const printer_t[]){ \
                          PRINTERS(__VA_ARGS__), \
                          NULL, \
                      } __VA_OPT__(, ) __VA_ARGS__)

int main() {
        pairs("This is a beautiful value", "%10d\0value", 10, "hello", "bye");
}

Or you can expand such a library and start supporting python format strings and so on, which is what I did with now dead yio library, as I do not have time for it.

0
John Bollinger On

Is this use of va_copy undefined behaviour?

The va_copy is not itself undefined, nor is passing the copy to vprintf. Nor is using va_end on it after the function returns -- that's required.

But this is a problem:

            // Consume one argument assuming vprintf consumed one
            va_arg(args, char *);

Generally speaking, you cannot consume a variable argument without knowing its (promoted) type, or close enough to that. Compatible types are interchangeable in this regard. Corresponding signed and unsigned integer types are interchangeable with each other here, provided that the actual value is representable in both. Also, types char * and void * are interchangeable with each other here. If the type specified to va_arg is not among those alternatives, or if there aren't any more variable arguments at all, then UB results from the va_arg call.

Some of your alternatives are:

  • perform your own analysis to determine what type of argument to skip, or
  • write pairs as a function-like macro instead of a bona fide function (with use of a large dose of macro magic), or
  • reimagine the function to receive more or differently organized information that better serves your purposes.