String literals as template arguments force code duplication and verbosity

A string created at compile time is stored as a character array in a struct, and the array's size is determined by that string. That struct is in turn used as a member of another struct, which holds several such string buffers, so the outer struct's storage size is determined by the strings as well.

Using strings as template arguments does work, but it forces the same information to be repeated in many places, and the repetition spreads like wildfire through the source code. Matters get even worse when the first struct takes multiple strings as template arguments (two or even three).

Here's a working example:

#include <cstddef>

static constinit decltype(auto) c = "abc";
//static constinit decltype(auto) c2 = "12";

// size of A is driven by compile-time-created strings
template<typename T, size_t N /*, typename T2, size_t N2*/>
struct A{
    char str[N] {};
//    char str2[N2] {};

    consteval A(const T(&s)[N] /*, const T2(&s2)[N2]*/) {
        for(int i = 0; i < N; ++i){
            str[i] = s[i];
            if(s[i] == 0) // for strings shorter than storage
                break;
        }
//        for(int i = 0; i < N2; ++i){
//            str2[i] = s2[i];
//            if(s2[i] == 0)
//                break;
//        }
    }
};

// dummy function whose sole purpose is to help with the type deduction
template<typename T, size_t N /*, typename T2, size_t N2*/>
consteval auto f(const T(&s)[N] /*, const T2(&s2)[N2]*/){
    return A<T, N /*, T2, N2*/>(s /*, s2*/);
}

// size of B's members and struct's total size are driven by compile-time
// created strings
struct B{
    // explicit (manual) error-prone template params
    A<char, 4 /*, char, 3*/>      xxx{c /*, c2*/};
    // use of a dummy function feels very wrong
    decltype(f(c /*, c2*/))       yyy{c /*, c2*/};
    // also uses dummy function
    decltype(f("abc" /*, "12"*/)) zzz{"abc" /*, "12"*/};
    // would like to be able to use shorter strings
  //A<char, 4 /*, char, 3*/>      fail{"a" /*, "1"*/};
};

The three working usages are either error-prone or introduce too much repetition. The last one, which does not compile, is a nice-to-have along the lines of "I wonder if this is doable?"

Link to Compiler Explorer

Is there anything I'm missing in the C++ standard that would let me avoid repetition while retaining this automation to avoid potential manual mistakes?

Ideally I'd've loved to just code like this snippet below:

struct B{
    // automatic type deduction without some dummy function's help
    A xxx{c /*, c2*/};
    // strings in multiple TUs produce independent templates in current standard :(
    A<"abc" /*, c2*/> yyy;
};

// usage

// constructs with { {c /*, c2*/}, {"abc" /*, c2*/} }
constinit B b;

// constructs with { {c /*, c2*/}, {"bc" /*, "0"*/} }
constinit B b2{{}, {"bc" /*, "0"*/}};

Answer by Patrick Roberts

Without using any standard library headers (other than <cstddef>), here's a definition of the class template A that provides most of the desired features with as little boilerplate as possible, given that C++ does not allow class template argument deduction (CTAD) in non-static data member declarations. It relies on a user-defined deduction guide:

#include <cstddef>

template <typename T, std::size_t N>
struct A {
  // + 1 to account for null termination in storage
  T str[N + 1];

  template <std::size_t... Ns>
  // to prevent aggregate construction from compiling
  // if the sum of the string literal lengths exceeds the storage capacity
    requires(N + 1 > (0 + ... + (Ns - 1)))
  consteval A(const T (&...s)[Ns]) {
    auto it = str;
    (..., [&](const auto &r) {
      for (const auto c : r) {
        if (!c) break;
        *it++ = c;
      }
    }(s));
    *it = T{};
  }
};

// user-defined deduction guide
template <typename T, std::size_t... Ns>
// - 1 to exclude null termination from each string literal in the pack
A(const T (&...s)[Ns]) -> A<T, (0 + ... + (Ns - 1))>;

Usage and tests below, with obligatory Compiler Explorer link:

static constinit decltype(auto) c1 = "abc";
static constinit decltype(auto) c2 = "12";

struct B {
  // no dummy function
  // but requires repetition of the compile-time constants
  // in the type expression
  decltype(A{c1, c2}) value{c1, c2};
};

#include <concepts>

// deduces correct width from compile-time constants
static_assert(std::same_as<decltype(B{}.value), A<char, 5>>);

#include <string_view>

using namespace std::string_view_literals;

// default construction
static_assert(B{}.value.str == "abc12"sv);
// aggregate construction
static_assert(B{{}}.value.str == ""sv);
static_assert(B{{c1}}.value.str == "abc"sv);
static_assert(B{{c2}}.value.str == "12"sv);
static_assert(B{{c2, c1}}.value.str == "12abc"sv);
static_assert(B{{"a", "1"}}.value.str == "a1"sv);
// candidate template ignored: constraints not satisfied [with Ns = <3, 5>]
// because '5UL + 1 > 0 + (3UL - 1) + (5UL - 1)' (6 > 6) evaluated to false
// static_assert(B{{"ab", "1234"}}.value.str == "ab1234"sv);

To completely eliminate duplication in the declaration, you can create a derived class that behaves like A{c1, c2}. To make the linkage well-behaved, you'll need to canonicalize the template arguments of the derived class. Here's one way to do that:

template <typename T, T... Cs>
struct C : decltype(A{{Cs..., T{}}}) {
  using base = decltype(A{{Cs..., T{}}});
  using base::base;

  constexpr C() : base{{Cs..., T{}}} {}
};

#include <utility>

template <A, typename...>
struct to_c;

template <typename T, std::size_t N, A<T, N> a>
struct to_c<a> : to_c<a, std::make_index_sequence<N>> {};

template <typename T, std::size_t N, A<T, N> a, std::size_t... Is>
struct to_c<a, std::index_sequence<Is...>> {
  using type = C<T, a.str[Is]...>;
};

template <A... a>
using to_c_t = typename to_c<A{a.str...}>::type;

static constinit decltype(auto) c1 = "abc";
static constinit decltype(auto) c2 = "12";

#include <concepts>

static_assert(std::same_as<to_c_t<c1, c2>, C<char, 'a', 'b', 'c', '1', '2'>>);

For your purposes, to_c_t<c1, c2> value; should behave almost like decltype(A{c1, c2}) value{c1, c2};, except there's no duplication, and it is safe to use across translation units since it is an alias for the canonical type C<char, 'a', 'b', 'c', '1', '2'>. Just for completeness, here's the full example using this approach.
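
What "canonical" buys you can be shown with a stripped-down stand-in for C. This is only a sketch: `Chars` is a hypothetical name, and unlike C it does not derive from A; it only demonstrates that a type parameterized on the individual characters is identified by those characters alone.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for C: the characters themselves are the template
// arguments, so the type does not depend on where the string came from.
template <typename T, T... Cs>
struct Chars {
  T str[sizeof...(Cs) + 1]{Cs..., T{}};  // pack contents plus terminator
};

// The same characters name the same type, however they were spelled.
static_assert(std::is_same_v<Chars<char, 'a', 'b'>, Chars<char, 'a', "ab"[1]>>);

// The storage size is driven by the pack: five characters plus terminator.
static_assert(sizeof(Chars<char, 'a', 'b', 'c', '1', '2'>) == 6);

// A default-constructed object already holds the string.
constexpr Chars<char, 'a', '1'> value{};
static_assert(value.str[0] == 'a' && value.str[1] == '1' && value.str[2] == '\0');
```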