Frequent cache misses when loading data and accumulating the elements of a std::vector


Perf reports that a simple app, which loads a binary blob from the disk and accumulates its elements, has ~3E+07 cache misses. The program is shown below:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Reads sz floats from a raw binary file into a vector.
std::vector<float> read_blob(const std::string& file_path, std::size_t sz) {
    std::vector<float> data(sz);
    std::fstream file(file_path, std::ios::binary | std::ios::in);
    if (!file.is_open()) {
        throw std::runtime_error{"Cannot open buffer from disk"};
    }
    file.read(reinterpret_cast<char*>(data.data()),
              static_cast<std::streamsize>(sizeof(float) * sz));
    file.close();
    return data;
}

int main() {
    constexpr std::size_t sz{311040000};
    const std::string path{"/tmp/blob.bin"};
    std::vector<float> data = read_blob(path, sz);
    // Despite the label, this is the mean: the sum divided by the element count.
    float sum = std::accumulate(data.begin(), data.end(), 0.0F) /
                static_cast<float>(data.size());
    std::cout << "the sum is " << sum << std::endl;
}
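
For anyone who wants to reproduce the measurement, the input file has to exist first. The following is only a sketch of one way to create it: `write_blob` is a hypothetical helper that is not part of the original program, and the fill value is arbitrary, since the real contents of /tmp/blob.bin are not given.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original program): writes `count` floats,
// all equal to `value`, to `file_path` as raw bytes.
void write_blob(const std::string& file_path, std::size_t count, float value) {
    std::ofstream file(file_path, std::ios::binary | std::ios::trunc);
    if (!file.is_open()) {
        throw std::runtime_error{"Cannot create blob on disk"};
    }
    // Write in fixed-size chunks so the generator never holds the whole array.
    constexpr std::size_t chunk = 65536;
    const std::vector<float> buffer(chunk, value);
    while (count > 0) {
        const std::size_t n = count < chunk ? count : chunk;
        file.write(reinterpret_cast<const char*>(buffer.data()),
                   static_cast<std::streamsize>(n * sizeof(float)));
        count -= n;
    }
    if (!file) {
        throw std::runtime_error{"Writing the blob failed"};
    }
}
```

Calling `write_blob("/tmp/blob.bin", 311040000, 1.0F)` from a small separate program would produce a file of the size that `read_blob` expects (about 1.24 GB with 4-byte floats).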

The program is compiled with -O3 and -march=native, so it is optimized for my machine. Digging further, I see that the majority of the misses occur before main and are attributed to brnf_frag_data_storage ([br_netfilter]). In more detail:

Symbol                                     Cache misses   Share
brnf_frag_data_storage ([br_netfilter])    2.058E+07      60%
main                                       1.325E+07      40%
Total                                      3.383E+07

My questions are:

  • What is happening before main is called?
  • Can I somehow ask perf to consider only events starting with main?
  • Given the length of the vector, I seem to have a cache miss for every ~31 elements. How can they be so frequent?
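
As a rough baseline for the last question, the number of cache lines that one sequential pass over the vector touches can be worked out directly. The sketch below does this arithmetic; the 64-byte line size is an assumption about the machine (it is not stated above), and the function and constant names are made up for illustration.

```cpp
#include <cstddef>

// Number of floats that fit in one cache line of `line_bytes` bytes.
constexpr std::size_t floats_per_line(std::size_t line_bytes) {
    return line_bytes / sizeof(float);
}

// Distinct cache lines touched by one sequential pass over `n` floats,
// rounded up to whole lines.
constexpr std::size_t lines_touched(std::size_t n, std::size_t line_bytes) {
    return (n + floats_per_line(line_bytes) - 1) / floats_per_line(line_bytes);
}

// Average number of elements processed per reported miss.
constexpr double elements_per_miss(std::size_t n, double misses) {
    return static_cast<double>(n) / misses;
}

// Figures taken from the question; the line size is an ASSUMPTION.
constexpr std::size_t kElements = 311040000;
constexpr std::size_t kLineBytes = 64;      // assumed, typical on x86-64
constexpr double kMissesInMain = 1.325e7;   // misses attributed to main
```

With 4-byte floats and 64-byte lines, `lines_touched(kElements, kLineBytes)` is 19,440,000, so a single pass over the data touches about 1.9E+07 distinct lines, and the array (about 1.24 GB) is far larger than typical caches. Using the count attributed to main, `elements_per_miss(kElements, kMissesInMain)` comes to roughly 23 elements per miss, which is the same order of magnitude as one miss per 16-element line. The exact number depends on which hardware event perf maps cache-misses to and on how much the hardware prefetcher hides, so this is only an order-of-magnitude check.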

There are 0 answers