I am working on a lexer. I have a Token struct, which looks like this:
    struct Token {
        enum class Type { ... };
        Type type;
        std::string_view lexeme;
    };
The Token's lexeme is just a view into a small piece of the full source code (which, by the way, is also a std::string_view).
The problem is that I need to re-map special characters (for instance, '\n'). Storing them as-is isn't a nice solution.
I've tried replacing lexeme's type with std::variant<std::string, std::string_view>, but it quickly turned into spaghetti code: every time I want to read the lexeme (for example, to check whether the type is Bool and the lexeme is "true"), it's a big pain.
Storing lexeme as an owning string won't solve the problem.
By the way, I use C++20; maybe there is a nice solution for it?
You could just use std::string

Firstly, a std::string could be used in a Token just as well as a std::string_view. This might not be as costly as you think, because every major C++ standard library implements SSO (small string optimization) for std::string.

This means that short tokens like "const" wouldn't be allocated on the heap; the characters would be stored directly inside the container. Before bothering with std::string_view and std::variant, you might want to measure whether allocations are even a performance issue. Otherwise, this is a case of premature optimization.

If you insist on std::variant ...

User @Homer512 has provided a solid solution already. Rather than using the std::variant directly, you could create a wrapper around it which provides a string-like interface for both std::string and std::string_view.

This is easy to do, because the names and meanings of most member functions are identical for both classes. That also makes them easy to use through std::visit.
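
To make the wrapper idea more concrete, here is a minimal sketch of what it could look like. Everything named here (Lexeme, unescape, make_lexeme, and the Bool/String enumerators) is made up for illustration and is not taken from the question or from @Homer512's answer. The sketch assumes only \n and \\ need re-mapping; a real lexer would handle more escapes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical wrapper: either borrows a slice of the source or owns a re-mapped copy.
class Lexeme {
public:
    Lexeme(std::string_view borrowed) : data_(borrowed) {}
    Lexeme(std::string owned) : data_(std::move(owned)) {}

    // Both alternatives convert to std::string_view, so one generic lambda covers them.
    std::string_view view() const {
        return std::visit([](const auto& s) { return std::string_view(s); }, data_);
    }

    bool owns() const { return std::holds_alternative<std::string>(data_); }
    std::size_t size() const { return view().size(); }

    // Lets callers write `lexeme == "true"` without caring which alternative is held.
    friend bool operator==(const Lexeme& lhs, std::string_view rhs) {
        return lhs.view() == rhs;
    }

private:
    std::variant<std::string, std::string_view> data_;
};

// Hypothetical helper: turns the two-character sequence \n into a real newline.
// Any other escaped character (including \\) is copied without the backslash.
std::string unescape(std::string_view raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\\' && i + 1 < raw.size()) {
            const char next = raw[++i];
            out += (next == 'n') ? '\n' : next;
        } else {
            out += raw[i];
        }
    }
    return out;
}

// Borrow when nothing needs re-mapping; allocate an owned copy only otherwise.
Lexeme make_lexeme(std::string_view raw) {
    if (raw.find('\\') == std::string_view::npos) {
        return Lexeme(raw);
    }
    return Lexeme(unescape(raw));
}

struct Token {
    enum class Type { Bool, String };  // placeholder enumerators, assumed for this sketch
    Type type;
    Lexeme lexeme;
};
```

With this in place, a check like `token.type == Token::Type::Bool && token.lexeme == "true"` reads the same as it did with a plain std::string_view, and std::visit stays hidden inside the wrapper. One caveat: a view obtained from an owning Lexeme points into that Lexeme, so it should not outlive it or be kept across a move.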