Cross platform programming question (file I/O)


I have a C++ class that looks a bit like this:

class BinaryStream : private std::iostream
{
    public:
        explicit BinaryStream(const std::string& file_name);
        bool read();
        bool write();

    private:
        Header m_hdr;
        std::vector<Row> m_rows;
};

This class reads and writes data in a binary format, to disk. I am not using any platform-specific code, relying instead on the STL. I have successfully compiled it on XP. I am wondering whether I can FTP the files written on the XP platform and read them on my Linux machine (once I recompile the binary stream library on Linux).

Summary:

  1. Files created on an XP machine using a cross-platform library compiled for XP.
  2. Compile the same library (used in 1 above) on a Linux machine

Question: Can files created in 1 above, be read on a Linux machine (2) ?

If no, please explain why not, and how I may get around this issue.

7

There are 7 answers

2
dutt On

As long as they're plain binary files, it should work.

0
Adam Maras On

Because you're using the STL for everything, there's no reason your program shouldn't be able to read the files on a different platform.

6
Omnifarious On

This depends entirely on the specifics of the binary encoding. One thing that's different about Linux vs. XP is that you're much more likely to find yourself on a big-endian platform, and if your binary encoding is endian specific you'll end up with issues.

You may also end up with issues relating to the end-of-line character. There isn't enough information here about how you're using ::std::iostream to give you a good answer to this question.
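On the end-of-line point, here is a minimal sketch, assuming the file is opened through std::ofstream / std::ifstream (the question does not show how the stream is actually opened). Without std::ios::binary, a Windows build translates '\n' to "\r\n" on output, so the bytes on disk differ from what a Linux build expects. The helper names write_bytes and read_bytes are hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write raw bytes with newline translation disabled.
bool write_bytes(const std::string& path, const std::vector<char>& data)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();                       // surface write errors before returning
    return static_cast<bool>(out);
}

// Hypothetical helper: read the whole file back as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

With both sides opened in binary mode, bytes such as '\n', '\r' and 0x1A should come back exactly as they were written on either platform.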

I would strongly suggest looking at the protobuf library. It is an excellent library for creating fast cross-platform binary encodings.

0
dirkgently On

Derive from std::basic_streambuf. That's what they are there for. Note, most STL classes are not designed to be derived from. The one I mention is an exception.

4
reko_t On

If you want your code to be portable across machines with different endianness, you need to stick to one endianness in your files. Whenever you read or write a file, you convert between the host byte order and the file byte order. It's common to use so-called network byte order when you want to write files that are portable across all machines. Network byte order is defined to be big-endian, and there are ready-made functions for those conversions (although they are very easy to write yourself).

For example, before writing a 32-bit integer to a file, convert it to network byte order with htonl(); after reading it back, convert it to host byte order with ntohl(). (These functions operate on 32-bit values, so don't rely on long, which is 64 bits wide on most 64-bit Linux systems.) On a big-endian system htonl() and ntohl() return their argument unchanged, but on a little-endian system they reverse the order of its bytes.
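As a rough illustration of writing the conversion yourself, the sketch below stores a 32-bit value in big-endian (network) order using shifts instead of htonl()/ntohl(), so it needs no platform header. The function names write_u32_be, read_u32_be and to_be_bytes are made up for this example and are not part of the asker's class.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit value most-significant byte first.
// Shifts work on the value, not on memory, so host endianness is irrelevant.
void write_u32_be(std::ostream& out, std::uint32_t value)
{
    const char bytes[4] = {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF)
    };
    out.write(bytes, 4);
}

// Read a 32-bit big-endian value; returns false if fewer than 4 bytes remain.
bool read_u32_be(std::istream& in, std::uint32_t& value)
{
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), 4)) return false;
    value = (static_cast<std::uint32_t>(bytes[0]) << 24) |
            (static_cast<std::uint32_t>(bytes[1]) << 16) |
            (static_cast<std::uint32_t>(bytes[2]) << 8) |
             static_cast<std::uint32_t>(bytes[3]);
    return true;
}

// Convenience wrapper for inspecting the encoded bytes.
std::string to_be_bytes(std::uint32_t value)
{
    std::ostringstream out;
    write_u32_be(out, value);
    return out.str();
}
```

The same pair of functions can be compiled unchanged on XP and on Linux, and the file contents should be identical on both.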

If you don't care about supporting big-endian systems, none of this is an issue, although it's still good practice.

Another important thing to pay attention to is the padding of the structs/classes that you write, if you write them directly to the file (e.g. Header and Row). Different compilers on different platforms can use different padding, which means that members are aligned differently in memory. If the compilers you use on different platforms disagree about padding, the file layout changes and reading breaks badly. So for structs that you intend to write directly to files or other streams, you should always specify the packing explicitly. You can tell the compiler to pack your structs like this:

#pragma pack(push, 1)
struct Header {
  // This struct uses 1-byte packing, so no padding is inserted between members
  ...
};
#pragma pack(pop)

Remember that packing makes the struct less efficient to use in your application, because access to unaligned memory addresses means more work for the system. This is why it's generally a good idea to have separate types: a packed struct that you write to streams, and a normally aligned type that you actually use in the application (you just copy the members from one to the other).
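A minimal sketch of that two-type arrangement follows. The members version and row_count are invented for illustration (the question does not show what Header contains), and the names PackedHeader, AppHeader, to_packed and from_packed are hypothetical. Note that packing only fixes the layout; byte order still has to be handled separately.

```cpp
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical on-disk layout: members are adjacent, with no padding.
struct PackedHeader {
    std::uint8_t  version;
    std::uint32_t row_count;
};
#pragma pack(pop)

// Naturally aligned counterpart used by the rest of the application.
struct AppHeader {
    std::uint8_t  version;
    std::uint32_t row_count;
};

// Fail the build if a compiler ignores the pragma and changes the file layout.
static_assert(sizeof(PackedHeader) == 5, "packed layout must be exactly 5 bytes");

// Copy member by member into the on-disk representation.
PackedHeader to_packed(const AppHeader& h)
{
    PackedHeader p;
    p.version = h.version;
    p.row_count = h.row_count;
    return p;
}

// Copy member by member back into the aligned representation.
AppHeader from_packed(const PackedHeader& p)
{
    AppHeader h;
    h.version = p.version;
    h.row_count = p.row_count;
    return h;
}
```

The static_assert is a cheap guard: if the size ever differs on one of the target compilers, the mismatch shows up at compile time rather than as a corrupt file.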

EDIT: Another way to deal with the issue, of course, is to serialize those structs yourself, field by field, which doesn't require #pragma (pragmas are a compiler-dependent feature, although to my knowledge all major compilers support #pragma pack).
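Serializing by hand might look roughly like the sketch below. The Header members shown here are assumptions made for the example, and put_u16, put_u32, serialize and deserialize are hypothetical helpers. Each field is emitted byte by byte in big-endian order, so neither compiler padding nor host endianness affects the result.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical fields; the real Header is not shown in the question.
struct Header {
    std::uint16_t version;
    std::uint32_t row_count;
};

// Append a 16-bit value, most significant byte first.
void put_u16(std::vector<unsigned char>& buf, std::uint16_t v)
{
    buf.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
    buf.push_back(static_cast<unsigned char>(v & 0xFF));
}

// Append a 32-bit value, most significant byte first.
void put_u32(std::vector<unsigned char>& buf, std::uint32_t v)
{
    buf.push_back(static_cast<unsigned char>((v >> 24) & 0xFF));
    buf.push_back(static_cast<unsigned char>((v >> 16) & 0xFF));
    buf.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
    buf.push_back(static_cast<unsigned char>(v & 0xFF));
}

// The encoded form is always 6 bytes: 2 for version, 4 for row_count.
std::vector<unsigned char> serialize(const Header& h)
{
    std::vector<unsigned char> buf;
    put_u16(buf, h.version);
    put_u32(buf, h.row_count);
    return buf;
}

// Rebuild a Header from its 6-byte encoding; rejects any other length.
bool deserialize(const std::vector<unsigned char>& buf, Header& h)
{
    if (buf.size() != 6) return false;
    h.version = static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
    h.row_count = (static_cast<std::uint32_t>(buf[2]) << 24) |
                  (static_cast<std::uint32_t>(buf[3]) << 16) |
                  (static_cast<std::uint32_t>(buf[4]) << 8) |
                   static_cast<std::uint32_t>(buf[5]);
    return true;
}
```

The resulting buffer can be handed to a stream's write() in one call; the struct itself never touches the file.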

0
AudioBubble On

The article "Endianness" is related to your question; look for the section "Endianness in files and byte swap". Briefly: if your Linux machine has the same endianness, it's OK; if not, there might be problems.

For example, when the 16-bit integer 1 is written to a file on XP (little-endian), the bytes look like this: 01 00

But when the same integer is written to a file on a machine with the other endianness, they look like this: 00 01

But if you only write single-byte values such as one-byte characters, there should be no problem.
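To see which layout a given machine uses, a small check along these lines can help. It is only a sketch; bytes_of_u16 and host_is_little_endian are hypothetical names. It copies the in-memory bytes of a 16-bit integer and formats them as hex, which is exactly what ends up in a file if the integer is written raw.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Return the in-memory bytes of a 16-bit integer as hex text,
// in the order they would appear in a file if written raw.
std::string bytes_of_u16(std::uint16_t value)
{
    unsigned char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);
    char text[6];   // "xx xx" plus terminating null
    std::snprintf(text, sizeof text, "%02x %02x",
                  static_cast<unsigned>(raw[0]),
                  static_cast<unsigned>(raw[1]));
    return text;
}

// True when the least significant byte is stored first.
bool host_is_little_endian()
{
    return bytes_of_u16(1) == "01 00";
}
```

Running bytes_of_u16(1) on both the XP and the Linux machine shows directly whether the two agree.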

0
MarkR On

If you are writing a struct/class directly out to disk, then don't.

This might not be compatible between different builds on the same compiler, and almost certainly will break when you move to a different platform or compiler. It will definitely break if you change to a different architecture.

It isn't clear from the above code what you're actually writing to the file.