Why is std::codecvt only used by file I/O streams?


I've been implementing a std::codecvt facet that handles indentation of output streams. It can be used like this and works fine:

std::cout << indenter::push << "I'm indented" << indenter::pop << "\nI'm not...";

However, although I can imbue a std::codecvt into any std::ostream, I was very confused to find that my code worked with std::cout and std::ofstream but not, for example, with std::ostringstream, even though all of them derive from std::ostream.

The facet is constructed normally, the code compiles, it doesn't throw any exceptions... It's just that none of the member functions of the std::codecvt are called.

This is very confusing to me, and it took me a lot of time to figure out that std::codecvt does nothing on streams that are not file streams.

Is there a reason std::codecvt is not used by all classes derived from std::ostream?

Furthermore, does anyone have an idea which classes I could fall back on to implement the indenter?
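One possible fallback, sketched here under assumptions rather than taken from the question, is to skip codecvt entirely and put the indentation logic into a filtering stream buffer that wraps another buffer. Because std::ostream only talks to its std::streambuf, the same wrapper works in front of std::cout, std::ofstream or std::ostringstream. The names indenting_streambuf, indent_push and indent_pop are hypothetical stand-ins for the question's indenter::push and indenter::pop.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering stream buffer (not part of the standard library):
// forwards every character to `dest_` and, before the first character of
// each non-empty line, writes `level_ * width_` spaces.
class indenting_streambuf : public std::streambuf {
public:
    explicit indenting_streambuf(std::streambuf* dest, int width = 2)
        : dest_(dest), width_(width) {}

    void push() { ++level_; }
    void pop() { if (level_ > 0) --level_; }

protected:
    // No put area is set up, so every character arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing to write
        const char c = traits_type::to_char_type(ch);
        if (at_line_start_ && c != '\n') {
            for (int i = 0; i < level_ * width_; ++i)
                if (traits_type::eq_int_type(dest_->sputc(' '), traits_type::eof()))
                    return traits_type::eof();
        }
        at_line_start_ = (c == '\n');
        return dest_->sputc(c);
    }

    // Flushing the wrapper flushes the wrapped buffer.
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
    int width_;
    int level_ = 0;
    bool at_line_start_ = true;
};

// Hypothetical manipulators; they only act when the stream's buffer is an
// indenting_streambuf and are no-ops otherwise.
std::ostream& indent_push(std::ostream& os) {
    if (auto* buf = dynamic_cast<indenting_streambuf*>(os.rdbuf())) buf->push();
    return os;
}

std::ostream& indent_pop(std::ostream& os) {
    if (auto* buf = dynamic_cast<indenting_streambuf*>(os.rdbuf())) buf->pop();
    return os;
}

// Usage: an std::ostringstream as the final target.
std::string indent_demo() {
    std::ostringstream target;
    indenting_streambuf buf(target.rdbuf());
    std::ostream out(&buf);
    out << "not indented\n" << indent_push << "indented\n"
        << indent_pop << "not indented\n" << std::flush;
    return target.str();  // "not indented\n  indented\nnot indented\n"
}
```

Sending every character through overflow() keeps the logic simple at some cost in speed. Indentation is decided when the first character of a line is written, so a push in the middle of a line takes effect on the next line.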

Edit: this is the part of the documentation I'm referring to:

All file I/O operations performed through std::basic_fstream use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

Source: https://en.cppreference.com/w/cpp/locale/codecvt


Update 1:

I've made a small example illustrating my problem:

#include <iostream>
#include <locale>
#include <fstream>
#include <sstream>

static auto invocation_counter = 0u;

struct custom_facet : std::codecvt<char, char, std::mbstate_t>
{
  using parent_t = std::codecvt<char, char, std::mbstate_t>;

  custom_facet() : parent_t(std::size_t { 0u }) {}

  using parent_t::intern_type;
  using parent_t::extern_type;
  using parent_t::state_type;

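  // Copies characters one-to-one. Since characters really were written to
  // [to, to_end), report ok (or partial if the output buffer ran out)
  // instead of noconv, which would mean "nothing needed converting".
  virtual std::codecvt_base::result do_out (state_type& state, const intern_type* from, const intern_type* from_end, const intern_type*& from_next,
                                                               extern_type* to, extern_type* to_end, extern_type*& to_next) const override
  {
    while (from < from_end && to < to_end)
    {
      *to = *from;

      to++;
      from++;
    }

    invocation_counter++;

    from_next = from;
    to_next = to;

    return from == from_end ? std::codecvt_base::ok : std::codecvt_base::partial;
  }

  // noexcept replaces the exception specification throw(),
  // which was removed from the language in C++20.
  virtual bool do_always_noconv() const noexcept override
  {
    return false;
  }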
};

std::ostream& imbueFacet (std::ostream& ostream)
{
  ostream.imbue(std::locale { ostream.getloc(), new custom_facet{} });

  return ostream;
}

int main()
{
  std::ios::sync_with_stdio(false);

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
    auto ofstream = std::ofstream { "testFile.txt" };

    ofstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
     auto osstream = std::ostringstream {};

     osstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";
}

I would expect invocation_counter to increase after writing to the std::ostringstream, but it doesn't.


Update 2:

After more research I found out that I could use std::wbuffer_convert. To quote https://en.cppreference.com/w/cpp/locale/wbuffer_convert:

std::wbuffer_convert is a wrapper over stream buffer of type std::basic_streambuf<char> which gives it the appearance of std::basic_streambuf<Elem>. All I/O performed through std::wbuffer_convert undergoes character conversion as defined by the facet Codecvt. [...]

This class template makes the implicit character conversion functionality of std::basic_filebuf available for any std::basic_streambuf.

This way I can apply a facet to the contents of a std::ostringstream:

auto osstream = std::ostringstream {};

osstream << "test\n";

// wstring_convert creates its own custom_facet and runs the whole string through do_out
std::wstring_convert<custom_facet, char> conv;

auto str = conv.to_bytes(osstream.str());

However, this way I lose the ability to chain everything with the streaming operator <<.
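If the text only needs converting once it is complete, std::wstring_convert (deprecated since C++17) is not strictly required: the facet's public out() member can be called directly on the finished string. The sketch below assumes a char-to-char facet whose output is never longer than its input, which holds for the question's custom_facet since it copies one-to-one; convert_with_facet is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs a finished std::string through the
// std::codecvt<char, char, std::mbstate_t> facet of `loc` by calling the
// facet's public out() member, roughly what std::wstring_convert does
// internally. Assumes the facet never produces more output than input.
std::string convert_with_facet(const std::string& in, const std::locale& loc)
{
    using cvt_t = std::codecvt<char, char, std::mbstate_t>;
    const cvt_t& cvt = std::use_facet<cvt_t>(loc);

    std::mbstate_t state{};
    std::string out(in.size(), '\0');  // same-size output buffer
    const char* from_next = nullptr;
    char* to_next = nullptr;

    const std::codecvt_base::result res =
        cvt.out(state, in.data(), in.data() + in.size(), from_next,
                &out[0], &out[0] + out.size(), to_next);

    if (res == std::codecvt_base::noconv)
        return in;  // the facet reports that no conversion is needed
    if (res != std::codecvt_base::ok)
        throw std::runtime_error("codecvt::out reported partial or error");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

For example, convert_with_facet(osstream.str(), std::locale(osstream.getloc(), new custom_facet{})) would run the question's facet once over the whole buffer. Like std::wstring_convert, this converts after the fact, so it does not bring back chaining with <<.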

This makes it even more confusing to me why std::codecvt is not implicitly used by ALL output streams. Every output stream writes through a std::basic_streambuf, whose interface is suitable for std::codecvt: it just works on an input and an output character sequence, fully implemented in std::basic_streambuf.

So why is the handling of std::codecvt implemented in std::basic_filebuf instead of std::basic_streambuf? std::basic_filebuf derives from std::basic_streambuf, after all...

Either I have some fundamental misunderstanding of how streams work in C++, or std::codecvt is poorly integrated into the standard. Maybe this is why parts of it are marked as deprecated?

Answer by Jan Gabriel (best answer)

The std::codecvt facet was originally intended to handle I/O conversions between the on-disk and in-memory character representations. Quoted from section 39.4.6 of Bjarne Stroustrup's The C++ Programming Language, fourth edition:

Sometimes, the representation of characters stored in a file differs from the desired representation of those same characters in main memory. ... the codecvt facet provides a mechanism for converting characters from one representation to another as they are read or written.

The intended purpose was thus to use std::codecvt only for adapting characters between file (disk) and memory, which partly answers your question:

Why is std::codecvt only used by file I/O streams?

From the docs we see that:

All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

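This explains why std::ofstream, which writes through a file-based stream buffer (std::filebuf), invokes std::codecvt. For std::cout the standard does not specify the buffer type, so whether the facet is consulted is implementation-specific; in the tests here std::cout did invoke it, so it was evidently backed by a file-based buffer (std::cout is tied to the standard output FILE stream).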

Now, to use the high-level std::ostream interface you need to provide an underlying streambuf. std::ofstream provides a filebuf, and std::ostringstream provides a stringbuf, which does not use std::codecvt. See this post about streams, which also highlights the following:
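To make the distinction concrete, here is a small sketch (backed_by_filebuf is a hypothetical helper) that asks a stream which kind of buffer it writes to. std::ostream itself only formats; whether a codecvt facet is ever consulted depends on the buffer behind rdbuf().

```cpp
#include <fstream>
#include <ostream>
#include <sstream>

// Returns true if the stream's buffer is (or derives from) std::filebuf,
// the buffer type that consults the imbued std::codecvt facet.
bool backed_by_filebuf(std::ostream& os)
{
    return dynamic_cast<std::filebuf*>(os.rdbuf()) != nullptr;
}
```

An std::ofstream yields true even before a file is opened, and an std::ostringstream yields false. For std::cout the answer is implementation-specific, so it is better not to rely on it.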

...in the case of ofstream, there are also a few extra functions which forward to additional functions in the filebuf interface

So, to apply the character conversion of a std::codecvt to a std::ostringstream (a std::ostream whose underlying buffer is a std::stringbuf), you can use std::wbuffer_convert, as indicated in your post.

You have only used the std::wstring_convert in your second update and not the std::wbuffer_convert.

When using the std::wbuffer_convert you can wrap the original std::ostringstream with a std::ostream as follows:

// Create a std::ostringstream
auto osstream = std::ostringstream{};

// Create the wrapper for the ostringstream
std::wbuffer_convert<custom_facet, char> wrapper(osstream.rdbuf());

// Now create a std::ostream which uses the wrapper to send data to
// the original std::ostringstream
std::ostream normal_ostream(&wrapper);
normal_ostream << "test\n";

// Flush the stream to invoke the conversion
normal_ostream << std::flush;

// Check the invocation_counter
std::cout << "invocation_counter after wrapping std::ostringstream with "
             "std::wbuffer_convert = "
          << invocation_counter << "\n";

Together with the complete example here, the output would be:

invocation_counter start of test1 = 0
invocation_counter after std::ofstream = 1
> test printed to std::cout
invocation_counter after std::cout = 2
invocation_counter after std::ostringstream (should not have changed)= 2
ic after test1 = 2
invocation_counter after std::ostringstream with std::wstring_convert = 3
ic after test2 = 3
invocation_counter after wrapping std::ostringstream with std::wbuffer_convert = 4
ic after test3 = 4

Conclusion

std::codecvt was intended for converting between the disk and memory representations. That is why the std::codecvt implementation is only called by streams whose underlying buffer is a std::filebuf, such as std::ofstream (and std::cout, where the implementation backs it with a file-based buffer). A stream whose buffer is a std::stringbuf can instead be wrapped with std::wbuffer_convert and a new std::ostream, which then invokes the custom std::codecvt facet.