Portable support for large files


I looked at

and I still don't know how to get the size of a file larger than 4 GB in a portable way.

In particular, some of the answers failed to compile under Cygwin, while others failed under Linux.



Accepted answer by rr-

It turns out there are quite a few functions, defined by various standards:

  1. fseek/ftell

    It is defined by the ANSI C standard library and is available virtually everywhere. Offsets are passed as long, which is only guaranteed to hold 32 bits, but it isn't limited to that: where long is 64 bits wide (for example, on 64-bit Linux), you get large-file support out of the box.

  2. fseeko/ftello

    These are defined by the POSIX standard and take off_t offsets instead of long. On many platforms (glibc among them), defining _FILE_OFFSET_BITS=64 makes off_t a 64-bit type and transparently maps fseeko/ftello to fseeko64/ftello64.

  3. fseeko64/ftello64

    These are the explicitly 64-bit variants of fseeko and ftello. I couldn't find them in any standard.

    Cygwin inconsistency

    Although Cygwin conforms to POSIX, I can't get fseeko to compile there no matter what I define, unless I use --std=gnu++11, which makes no sense, since fseeko is part of POSIX rather than a GNU extension. So what gives? According to this discussion:

    64 bit file access is the natural file access type for Cygwin. off_t is 8 bytes. There are no foo64 functions for that reason. Just use fopen and friends and you get 64 bit file access for free.

    This means you need an #ifdef for Cygwin even among POSIX platforms.

  4. _fseeki64 / _ftelli64

    These are provided by the Microsoft Visual C++ runtime and are specific to it. MSVC doesn't support anything else from the list above (other than fseek/ftell), so you're going to need #ifdefs.

    EDIT: I actually advise against using them, and I'm not the only one who thinks so. I literally experienced the following:

    1. _wfopen a file in binary mode
    2. fwrite 10 bytes to it
    3. _ftelli64 the position
    4. It returns 12 rather than 10

    This looks horribly broken.

  5. lseek and lseek64

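    lseek is defined by POSIX (lseek64 is its explicitly 64-bit counterpart). They operate on integer file descriptors obtained from open() (declared in fcntl.h; lseek itself is declared in unistd.h) rather than on FILE* streams. Windows uses its own equivalents instead. Again, they use the off_t data type.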

  6. _lseek, _lseeki64

    These are the Windows equivalents of lseek/lseek64. Curiously, _lseeki64 doesn't use off_t; it takes and returns __int64 instead, so you know it will work with big files. Neat.

  7. fsetpos/fgetpos

    While these are actually pretty portable, they're almost unusable, since they operate on the opaque fpos_t type rather than on integer offsets. This means you can't add or subtract positions, or even seek to a position in the file obtained by any means other than fgetpos.
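The limitation in item 7 is easy to see in code. The sketch below (the helper name reread_char is hypothetical) saves a position with fgetpos and returns to it with fsetpos. That round trip is the only thing the opaque fpos_t value supports, so it can't be turned into a byte count or used to compute a file size.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character, jumps back with fsetpos and reads
   it again. Returns that character, or -1 if anything fails or the reads differ. */
int reread_char(FILE *f)
{
    fpos_t pos;                      /* opaque: no arithmetic, no printing */
    if (fgetpos(f, &pos) != 0)
        return -1;

    int first = fgetc(f);
    if (first == EOF || fsetpos(f, &pos) != 0)  /* returning is all pos allows */
        return -1;

    int second = fgetc(f);
    return (first == second) ? first : -1;
}
```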

Conclusion

So to make your program portable, depending on the platform, you should use:

  • fseeko/ftello with _FILE_OFFSET_BITS=64 defined, on POSIX systems
  • fseek/ftell on Cygwin and as the default fallback
  • _lseeki64 on Windows, or, if you manage to work around its problems, _fseeki64.
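For the POSIX route, here is a minimal sketch, assuming a glibc-style C library where _FILE_OFFSET_BITS=64 widens off_t; the helper name file_size_posix is hypothetical. It measures the size by seeking to the end and reading the position back. The C standard doesn't require binary streams to support SEEK_END meaningfully, but in practice POSIX systems handle it as expected.

```c
/* Both macros must appear before any system header is included. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

/* Assumption: a libc (such as glibc) where _FILE_OFFSET_BITS=64 widens off_t,
   even in 32-bit builds. Fail the build early if that did not happen. */
_Static_assert(sizeof(off_t) >= 8, "off_t is narrower than 64 bits");

/* Hypothetical helper: size in bytes of the file at path, or -1 on error. */
off_t file_size_posix(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(f, 0, SEEK_END) == 0)
        size = ftello(f);          /* ftello itself returns -1 on failure */

    fclose(f);
    return size;
}
```

The _Static_assert turns a silently 32-bit off_t into a compile error, which is easier to diagnose than a wrong size for a 5 GB file.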

An example that uses _ftelli64:

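/* Must come before any #include so that off_t and ftello are 64-bit on POSIX. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <stdint.h>
#include <stdio.h>

/* Returns the current position of stream a, or -1 on error. */
int64_t portable_ftell(FILE *a)
{
#ifdef __CYGWIN__
    return ftell(a);        /* Cygwin: plain ftell, see above */
#elif defined (_WIN32)
    return _ftelli64(a);    /* Microsoft CRT */
#else
    return ftello(a);       /* POSIX */
#endif
}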
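The same #ifdef dispatch can cover seeking too, which is enough to measure an already open stream. This is a sketch, not a tested cross-platform library: the macro names P_SEEK/P_TELL and the function portable_fsize are hypothetical. Note that on Windows it falls back to _fseeki64/_ftelli64, which the answer warns about, so you may prefer the descriptor-based route there.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical macros: one place that picks the 64-bit seek/tell pair,
   following the same platform split as portable_ftell. */
#if defined(__CYGWIN__)
#  define P_SEEK(f, o, w) fseek((f), (long)(o), (w))  /* long is 64-bit on 64-bit Cygwin */
#  define P_TELL(f)       ((int64_t)ftell(f))
#elif defined(_WIN32)
#  define P_SEEK(f, o, w) _fseeki64((f), (o), (w))
#  define P_TELL(f)       ((int64_t)_ftelli64(f))
#else
#  define P_SEEK(f, o, w) fseeko((f), (off_t)(o), (w))
#  define P_TELL(f)       ((int64_t)ftello(f))
#endif

/* Size in bytes of an open stream, or -1 on error.
   The stream's original position is restored before returning. */
int64_t portable_fsize(FILE *f)
{
    int64_t here = P_TELL(f);
    if (here < 0 || P_SEEK(f, 0, SEEK_END) != 0)
        return -1;

    int64_t size = P_TELL(f);
    if (P_SEEK(f, here, SEEK_SET) != 0)
        return -1;
    return size;
}
```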

In practice, instead of checking platform #ifdefs, which have always looked fragile to me, you could have your build system check whether the functions compile and define your own constants, such as HAVE_FTELLO64, accordingly.
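With that approach the source only tests feature macros. A sketch, assuming your build system defines HAVE__FTELLI64, HAVE_FTELLO64 or HAVE_FTELLO after a successful compile check (Autoconf's AC_CHECK_FUNCS, for instance, produces HAVE_<NAME> macros in this style); the function name feature_ftell is hypothetical. With none of the macros defined it falls back to plain ftell.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE64_SOURCE 1   /* glibc needs this to declare ftello64 */
#include <stdint.h>
#include <stdio.h>

/* HAVE__FTELLI64, HAVE_FTELLO64 and HAVE_FTELLO are assumed to be defined
   by the build system (config.h or -D flags) after a compile check;
   nothing in this file defines them. */
int64_t feature_ftell(FILE *f)
{
#if defined(HAVE__FTELLI64)
    return _ftelli64(f);
#elif defined(HAVE_FTELLO64)
    return (int64_t)ftello64(f);
#elif defined(HAVE_FTELLO)
    return (int64_t)ftello(f);
#else
    return (int64_t)ftell(f);   /* fallback: range limited to long */
#endif
}
```

The #if chain describes capabilities rather than platforms, so a new platform that has ftello picks it up without any change to the source.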


Note that if you do decide to use the lseek/_lseeki64 family with numeric file descriptors rather than FILE* streams, you should be aware of the following differences between open and fopen:

  • open doesn't buffer, while fopen does. Less buffering means worse performance for many small reads and writes.
  • open can't perform newline conversions for text files, while fopen can.

    More details in this question.
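For a one-off size query the lack of buffering doesn't matter, so the descriptor route works well there. A sketch, assuming the Microsoft C runtime when _WIN32 is defined and a POSIX system otherwise; the function name fd_file_size is hypothetical. Both lseek and _lseeki64 return the resulting offset, so seeking to SEEK_END with offset 0 yields the size directly.

```c
#ifndef _WIN32
#  define _POSIX_C_SOURCE 200809L
#  define _FILE_OFFSET_BITS 64    /* 64-bit off_t for lseek */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>                /* SEEK_END */
#ifdef _WIN32
#  include <io.h>                 /* _open, _lseeki64, _close */
#else
#  include <unistd.h>             /* lseek, close */
#endif

/* Hypothetical helper: size in bytes of the file at path, or -1 on error. */
int64_t fd_file_size(const char *path)
{
#ifdef _WIN32
    int fd = _open(path, _O_RDONLY | _O_BINARY);
    if (fd < 0)
        return -1;
    int64_t size = _lseeki64(fd, 0, SEEK_END);   /* __int64 offset */
    _close(fd);
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int64_t size = (int64_t)lseek(fd, 0, SEEK_END);  /* off_t offset */
    close(fd);
#endif
    return size;
}
```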

