When fseek() is called in C, or seek() is called on a file object in a modern language like Python or Go, what happens at a very low level?
What does the operating system or hard drive actually do? What gets read? What overhead is incurred? How does block size affect this overhead?
Edit to add:
Given NTFS with a block size of 4 KB, does seeking 4096 bytes incur less I/O overhead than reading 4096 bytes?
Second Edit:
When in doubt, go empirical.
Using some naive Python code with a 1.5GB file:
Reading 4096 sequentially: 21.2
Seek 4096 (relative): 1.35
Seek 4096 (absolute): 0.75 (interesting)
Seek and read every third 4096 (relative): 21.3
Seek and read every third 4096 (absolute): 21.5
The times are averages, in seconds. The hardware is a nondescript PC with a SATA drive running Windows XP.
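For reference, the harness was along these lines (a reconstruction rather than the exact script; the file name is a placeholder, and only the relative variant of the skip-read loop is shown):

    import os
    import time

    BLOCK = 4096
    PATH = "big.bin"  # placeholder for the 1.5 GB test file
    SIZE = os.path.getsize(PATH)

    def timed(fn):
        t0 = time.time()
        fn()
        return time.time() - t0

    def read_sequential():
        # Read the whole file 4096 bytes at a time.
        with open(PATH, "rb") as f:
            while f.read(BLOCK):
                pass

    def seek_relative():
        # Walk the file without reading, one relative seek per block.
        with open(PATH, "rb") as f:
            for _ in range(SIZE // BLOCK):
                f.seek(BLOCK, os.SEEK_CUR)

    def seek_absolute():
        # Same walk, using absolute positions.
        with open(PATH, "rb") as f:
            for off in range(0, SIZE, BLOCK):
                f.seek(off)

    def seek_and_read_every_third():
        # Read every third block, seeking past the two in between.
        with open(PATH, "rb") as f:
            while f.read(BLOCK):
                f.seek(2 * BLOCK, os.SEEK_CUR)

    for fn in (read_sequential, seek_relative, seek_absolute,
               seek_and_read_every_third):
        print(fn.__name__, timed(fn))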
This was hugely disappointing. I have several GB of files that I have to read on a near-continual basis. About 66% of the 4 KB blocks in the files are uninteresting, and I know their offsets in advance.
Initially, I thought it might be a Big Win to rewrite the legacy code involved, since it currently does a sequential read through the files, 4096 bytes at a time. But assuming Win32 Python is not broken in some fundamental way, incorporating seek() has no advantage for non-random reads.
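For concreteness, the seek-based rewrite I had in mind would have looked roughly like this (the function name and the offset list are illustrative):

    BLOCK = 4096

    def read_interesting_blocks(path, interesting_offsets):
        # Yield only the 4 KB blocks at the given byte offsets,
        # seeking past everything else.
        with open(path, "rb") as f:
            for off in sorted(interesting_offsets):
                f.seek(off)            # jump straight to the next wanted block
                block = f.read(BLOCK)
                if not block:          # offset past end of file
                    break
                yield block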
This depends heavily on current conditions. Generally, fseek() only changes the state of the stream: it either sets the current position or returns an error if the parameters are wrong. But fseek() also flushes the buffer, which may incur a pending write operation. If the file is a UTF-8 file and translation is enabled, the ftell() called from within fseek() needs to read that part of the file to calculate the offset correctly. If CRLF translation is enabled, that also incurs read operations. In the case of a plain binary file with no pending write operation, however, fseek() just sets the position within the stream and doesn't need to go down to a lower level. For more details, see the source code of the CRT.
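Here is a Python analogue of the flush-on-seek behaviour, as a sketch (the scratch file name is made up): a seek on a buffered stream with pending writes forces those writes out first.

    import os

    PATH = "seek_demo.bin"  # hypothetical scratch file

    with open(PATH, "w+b") as f:
        f.write(b"pending")          # sits in the user-space buffer only
        with open(PATH, "rb") as peek:
            print(peek.read())       # b'' - nothing has reached the OS yet
        f.seek(0)                    # the seek flushes the pending write
        with open(PATH, "rb") as peek:
            print(peek.read())       # b'pending'

    os.remove(PATH)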