I was learning about cache lines and the effect of loop stride on the cache. I came across this page, which shows the execution time of a loop versus the loop stride. According to the benchmark, increasing the loop stride decreases the execution time, which is very confusing to me. As I understand it, if the cache line is 64 bytes and the loop stride is 1, the loop goes over the array elements sequentially, and that should give the lowest execution time, because 16 integers (4 bytes x 16 = 64 bytes) are loaded into the cache at once. The execution time should stay lowest up to a stride of 16, because all 16 elements are loaded into the same cache line. Increasing the stride above 16 should increase the execution time, because the next array element won't be in the cache line, but the graph on the page shows the complete opposite.

In that example the array length is constant, so the larger the stride, the fewer elements you go through.
The interesting phenomenon is that the time does not drop for strides smaller than a cache line, and that's because you can't fetch part of a line. So for strides up to 16, you pay the same penalty of fetching every cache line. Above 16, you start skipping some lines. At a stride of 32 (128 B), for example, you fetch every other line, hence roughly half the time (assuming your execution time is dominated by memory latency).
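
To make the counting concrete, here is a minimal sketch that models the loop rather than timing it. It assumes 4-byte integers, 64-byte cache lines, and an array that starts on a line boundary. The function names and the array length of 65536 are made up for illustration and are not taken from the benchmark page.

```python
def elements_visited(length, stride):
    """Number of array elements a loop `for i in range(0, length, stride)` touches."""
    return len(range(0, length, stride))


def cache_lines_touched(length, stride, elem_size=4, line_size=64):
    """Number of distinct cache lines those elements fall in.

    Assumes the array starts at a cache-line boundary (an assumption of
    this sketch, not something the benchmark guarantees).
    """
    return len({(i * elem_size) // line_size for i in range(0, length, stride)})


LENGTH = 1 << 16  # hypothetical array length, in elements

for stride in (1, 2, 4, 8, 16, 32, 64):
    print(
        f"stride={stride:3d}  "
        f"elements={elements_visited(LENGTH, stride):6d}  "
        f"lines={cache_lines_touched(LENGTH, stride):5d}"
    )
```

Under these assumptions the number of elements visited halves each time the stride doubles, but the number of distinct lines stays the same up to a stride of 16 and only starts halving after that. If the run time is dominated by fetching lines from memory, it should follow the lines count rather than the elements count.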