What is the purpose of the new C23 #embed directive?


A new preprocessor directive is available in the upcoming C23 Standard: #embed

Here is a simple example:

// Placing a small image resource.

#include <stddef.h>

void show_icon(const unsigned char *, size_t);

int main (int, char*[]) {
    static const unsigned char icon_data[] = {
#embed "black_sheep.ico"
    };
    show_icon(icon_data, sizeof(icon_data));
    return 0;
}

Here is a more elaborate one, initializing non-array objects from binary data (whatever that means):

int main() {
    /* Braces may be kept or elided as per normal initialization rules */
    int i = {
#embed "i.dat"
    }; /* i value is [0, 2^(embed element width)) first entry */
    int i2 =
#embed "i.dat"
    ; /* valid if i.dat produces 1 value, i2 value is [0, 2^(embed element width)) */
    struct s {
        double a, b, c;
        struct { double e, f, g; };
        double h, i, j;
    };
    struct s x = {
        /* initializes each element in order according to 
           initialization rules with comma-separated list
           of integer constant expressions inside of braces
         */
#embed "s.dat"
    };
    return 0;
}

What is the purpose of adding this to the C language?

There is 1 answer

Answer by chqrlie (best answer)

#embed allows easy inclusion of binary data in a program's executable image, as arrays of unsigned char or other types, without the need for an external script run from a Makefile to convert the data into a C initializer list. Most compilers are very inefficient at parsing such large generated arrays, with a notable exception: tcc.
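To make the "external script" concrete, here is a minimal sketch in C of what such a generator does: it turns raw bytes into a comma-separated list of integer constants, which is the same kind of token sequence #embed is specified to produce. The name format_initializer_list and its buffer-based interface are assumptions made for illustration; they do not come from any existing tool.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only): formats `len` bytes of `data`
   into `dst` as a comma-separated list of decimal integer constants,
   e.g. "0, 255, 16". Returns the number of characters written, not
   counting the terminating NUL, or -1 if `dst` is too small. */
static int format_initializer_list(char *dst, size_t cap,
                                   const unsigned char *data, size_t len)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    dst[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Every value except the first is preceded by ", ". */
        int n = snprintf(dst + used, cap - used, "%s%u",
                         i ? ", " : "", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;  /* output error or buffer too small */
        used += (size_t)n;
    }
    return (int)used;
}
```

A build step would write the resulting text into a generated header and #include that header between the braces of an array initializer. With #embed, the preprocessor takes over this step.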

Embedding binary or even textual data offers benefits over reading from files:

  • there might not be a file system
  • the path to the files might not be obvious
  • the files could be missing or inaccessible
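For contrast, here is a rough sketch of loading the same data at run time. Every early return below is a failure that cannot happen once the bytes are part of the executable. The function name load_file and the growth strategy are assumptions for illustration, not something prescribed by the paper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime loader (illustration only): reads a whole file into
   a malloc'd buffer and stores its size in *len_out. Returns NULL on any
   failure; the caller frees the buffer on success. */
static unsigned char *load_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;            /* no file system, wrong path, missing file */

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    while (buf != NULL) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;              /* short read: end of file or read error */
        cap *= 2;               /* buffer full: grow it and keep reading */
        unsigned char *grown = realloc(buf, cap);
        if (grown == NULL)
            free(buf);
        buf = grown;
    }
    if (buf != NULL && ferror(fp)) {
        free(buf);              /* the file became unreadable mid-way */
        buf = NULL;
    }
    fclose(fp);
    if (buf != NULL)
        *len_out = len;
    return buf;
}
```

Embedded data needs none of this error handling, at the cost of a larger executable and a rebuild whenever the resource changes.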

The main reason for adding this to the C language seems to be the new urge to dump upon C every trendy C++ feature in a vain attempt to converge C toward a common subset of both languages. The C++ committee was strongly in favor of this extension whereas the C committee was less thrilled.

Read the details in: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20embed.html

It took 30 years for strdup() to make it into the Standard library and all of a sudden C23 gladly extends the language by 50% in all directions with no remorse.

The rationale for making this a preprocessor kludge is highly questionable and the last reason speaks for itself:

"Finally, Microsoft has an ABI problem with its maximum string literal size that cannot be solved using string literals or anything treated like string literals"

The specification for #embed is full of quirks and shortcomings. The reluctance to write proper scripts leads to abominations such as:

const unsigned char null_terminated_file_data[] = {
    #embed "might_be_empty.txt" \
        prefix(0xEF, 0xBB, 0xBF, ) /* UTF-8 BOM */ \
        suffix(,)
    0 // always null-terminated
};
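To see why the dangling commas are there, it helps to write out by hand what the initializer becomes in each case. As I read the specification, prefix and suffix are emitted only when the resource is non-empty. The sketch below assumes might_be_empty.txt holds the two bytes 'h' and 'i'; that content and the array names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the expansion when the file is NOT empty
   (assumed content: the two bytes 'h' 'i'). */
static const unsigned char expanded_nonempty[] = {
    0xEF, 0xBB, 0xBF,   /* from prefix(0xEF, 0xBB, 0xBF, ), trailing comma included */
    104, 105            /* the bytes of the file itself */
    ,                   /* from suffix(,) */
    0 // always null-terminated
};

/* Hand-written stand-in for the expansion when the file IS empty:
   prefix and suffix are dropped, so only the trailing 0 remains. */
static const unsigned char expanded_empty[] = {
    0 // always null-terminated
};

static_assert(sizeof expanded_nonempty == 6, "BOM + 2 bytes + terminator");
static_assert(sizeof expanded_empty == 1, "terminator only");
```

The trailing comma in prefix and the lone comma in suffix exist only to glue the token lists together, which is the kind of quirk complained about above.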

Or worse:

int main () {
#define SOME_CONSTANT 0
    return
#embed </dev/urandom> if_empty(0) limit(SOME_CONSTANT)
    ;
}

A simple data description and manipulation language to assemble binary files into linkable objects and resources would have been less intrusive and easier to integrate into existing build systems, for all languages and, more importantly, for all existing compilers.

The paper enumerates interesting examples where #embed may come in handy, but a more general solution seems possible.