I have a file parser that opens the file as a file_mapping object and then maps regions on the caller's demand. The files can be accessed locally or over the network.
The file is read sequentially.
I have tried two approaches:
Approach 1
Create a single file_mapping object, say f.
Create mapped regions from f, with only one mapped_region active at a time.
Approach 2
Create a new file_mapping object (for the same file) every time a mapped_region is to be created.
Approach 2 was based on the assumption that file_mapping is designed for inter-process communication, so creating multiple file_mapping objects for the same file should not be an issue.
When the file was accessed locally, the benchmarks showed similar timings for both approaches (for a 3 GB file).
However, when the file was accessed over the network, Approach 2 was about 5 times slower than Approach 1.
In both approaches, each mapped_region is destroyed before the next one is created.
The only difference in Approach 2 is that a file_mapping object is created and destroyed for each mapped_region.
Benchmarking procedure for the sequential file read:
- File size: 3 GB
- Number of mapped regions created: 8600
- Mapped region size: variable, between 9 KB and 900 KB
- Mapped region access: every region is accessed immediately after it is created.
- Mapped region creation frequency: one after another in a loop, with some processing of the data extracted from each region. The processing populates a recursive structure (maximum depth 10) and is not very intensive.
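A small harness in the spirit of this procedure can help compare the two approaches on identical input. This is a sketch under assumptions: make_chunk_sizes and time_ms are hypothetical helpers, the uniform size distribution is a guess at "variable, between 9 KB and 900 KB", and it does not reproduce the exact region count of the original benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: split total_bytes into chunk sizes drawn uniformly
// from [min_chunk, max_chunk] (min_chunk > 0). Only the last chunk may be
// shorter, so the chunks cover exactly total_bytes.
std::vector<std::size_t> make_chunk_sizes(std::uint64_t total_bytes,
                                          std::size_t min_chunk,
                                          std::size_t max_chunk,
                                          unsigned seed)
{
    std::mt19937 gen(seed);  // fixed seed: both approaches see the same sizes
    std::uniform_int_distribution<std::size_t> dist(min_chunk, max_chunk);
    std::vector<std::size_t> sizes;
    std::uint64_t covered = 0;
    while (covered < total_bytes) {
        const std::uint64_t remaining = total_bytes - covered;
        std::size_t s = dist(gen);
        if (s > remaining) s = static_cast<std::size_t>(remaining);
        sizes.push_back(s);
        covered += s;
    }
    return sizes;
}

// Wall-clock duration of one call to f, in milliseconds.
template <class F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the approaches, wrap each one's read loop in a lambda passed to time_ms and feed both the same sizes vector; running each variant more than once reduces the influence of the OS file cache on the first run.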
I want to understand:
Why is there such a huge difference in timings?
What exactly happens when a file_mapping object is created?
What exactly happens when a mapped_region object is created?
Does the OS search for a large chunk of memory when the file_mapping is created, or when the mapped_region is created?
When is the file data actually loaded into main memory?
Many thanks.
It's impossible to answer this question without knowing your benchmark procedure, i.e. how often you created the mapped_region objects, how big the mapped portions were, and what fraction of them was actually accessed.
It's also unclear whether in your implementation the file mapping object (boost::interprocess::file_mapping) opens the file handle itself, or whether you open the file yourself.
I can only guess that creating the file handle and mapping object on a network file system requires more inter-PC communication (for synchronization).
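To make the "what happens when" questions more concrete, here is a rough POSIX analogue of one Approach 2 iteration. It assumes a POSIX system, where Boost.Interprocess's file_mapping essentially holds an open file descriptor and mapped_region wraps mmap(); on Windows the calls differ (a file handle plus CreateFileMapping/MapViewOfFile), so treat this as an illustration rather than what Boost literally does. The function sum_region_posix is hypothetical. Note that neither step reads file data: mmap() only reserves virtual address space, and pages are read when first touched (possibly with kernel read-ahead).

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // close, sysconf
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Sketch of one Approach 2 iteration with raw POSIX calls (size > 0):
//   file_mapping  ~ open()/close()   (a descriptor; no file data is read)
//   mapped_region ~ mmap()/munmap()  (address space only; no file data is read)
std::uint64_t sum_region_posix(const char* path, off_t offset, std::size_t size)
{
    int fd = ::open(path, O_RDONLY);  // like constructing a file_mapping
    if (fd == -1) throw std::runtime_error("open failed");

    // mmap() needs a page-aligned offset, so map from the page boundary
    // below `offset` and skip the difference.
    const long page = ::sysconf(_SC_PAGESIZE);
    const off_t aligned = offset - (offset % page);
    const std::size_t delta = static_cast<std::size_t>(offset - aligned);

    void* base = ::mmap(nullptr, size + delta, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed");
    }

    // Page faults happen here: the data is fetched on first access.
    const unsigned char* p = static_cast<const unsigned char*>(base) + delta;
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum += p[i];

    ::munmap(base, size + delta);  // like destroying the mapped_region
    ::close(fd);                   // like destroying the file_mapping
    return sum;
}
```

On a network file system, the open()/close() pair per region is the step that may involve a round trip to the server, which fits the guess above about extra inter-PC communication.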