rapidjson with gzstream lib last char '-1'

388 views Asked by At

I've written simple wrapper of gzstream 1.5 for using with rapidjson 0.1 (ios, xcode 6.1).

Problem: I have to check for eof in Peek() and Take(). Otherwise, I get '\377' (-1) as last character. I know that it's returned by std::basic_stream::get() at eof.

What's more elegant, proper and clean solution?

class GzOutStream {
public:
    GzOutStream(std::string filename) : gs_(new ogzstream(filename.c_str())) {}
    bool Good() { return gs_->good(); }
    void Close() { delete gs_; gs_ = nullptr; }
    size_t Tell() { return gs_->tellp(); }
    void Put(char c) { gs_->put(c); }

    // Not implemented
    char* PutBegin() { return 0; }
    size_t PutEnd(char*) { return 0; }

private:
    ogzstream* gs_;
};

class GzInStream {
public:
    GzInStream(std::string filename) : gs_(new igzstream(filename.c_str())) {}
    bool Good() { return gs_->good(); }
    void Close() { delete gs_; gs_ = nullptr; }
    char Peek() { return gs_->eof()? '\0' : gs_->peek(); }
    char Take() { return gs_->eof()? '\0' : gs_->get(); }
    size_t Tell() { return gs_->tellg(); }
    void Put(char c) { } // Stab

    // Not implemented
    char* PutBegin() { return 0; }
    size_t PutEnd(char*) { return 0; }

private:
    igzstream* gs_;
};
1

There are 1 answers

2
Michael Karcher On BEST ANSWER

The answer below is provided for general discussion of the problem at hand. I did not look up the behaviour of rapidjson at that point.

Your class is intended to be glue logic between a gzip input stream and rapidjson, so you have to implement the interface expected by rapidjson. It does not even have a Good function. The interface expected by rapidjson is returning '\0' on EOF, so that's the only choice you have to do. In case the gzip stream classes you use are implementing the C++ stream model, you could use the pattern described in https://github.com/miloyip/rapidjson/blob/master/doc/stream.md in the section "Example istream wrapper", which does EOF detection in a way that generally works with C++ iostreams. If your current way works fine with the gz streams, you could also keep it as it is.


You are essentially hitting the problem that the input stream stays good as long as you did not try to go past eof. The interface of GzInStream does not provide any possibility to the user to detect whether EOF has been hit before a Peek or Take returned an invalid value. This is due to the design of C++ iostreams: Most times, lowlevel APIs do not indicate "end of stream" unless you try to go past it, and so the high-level API does not provide this facility, as it is non-trivial to implement in many (non-file) cases.

The peek() and get() functions of standard C++ iostreams return int instead of char for a reason: They are specified to return the byte read from the stream as positive quantity (0..255 on systems with 8 bit bytes), while returning eof (-1) on error. Your Peek and Get functions are unable to return 256 different bytes and EOF as distinct return values, as 257 possibilities are unrepresentable by char. So as it stands, clients of your interface have to ask "Good()" after getting a character from Peek or Take, to find out whether there really was a character to get. If clients of your interface do that, it does not matter, whether you return '\377' or '\0' or any other value, as that value is going to be ignored anyway. A client using that "extra" bytes is (in my oppinion) buggy, unless it is designed to ignore the spurious NUL byte you return.

You can fix this in different ways

  • Fix your clients as indicated above, and document behaviour of that class.
  • Have Good() return gs_->good() && !gs_->eof(), relying on gs_->eof() being true before overreading eof
  • Return an int from Peek and Take, like standard iostream does.
  • Return a boost::optional from Peek and Take, and return boost::none if eof has been encountered
  • Throw an exception form Peek and Take in case of EOF.

Most people will immediately reject the last proposed fix, because it violates the "exceptions should not be used for flow control" rule. I agree that it is really bad style to force the client to use exception handling for decting EOF, but this is in fact the only possibility that does not need changes to the signature of Peek and Take nor change semantics of other functions. I expect the second suggestion (changing Good) is the way to go in you use case.