HW/OS: Linux 4.9, 64 GB RAM.
16 daemons are running. Each reads random short (100-byte) pieces of a 5 GiB file, which it maps into memory with mmap() at daemon startup. Each daemon reads its own file, so there are 16 such 5 GiB files in total.
Each daemon makes maybe 10 reads per second. Not too much; the disk load is rather small. A sketch of the access pattern is below.
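For reference, a minimal sketch of the access pattern described (the file name data.bin, the hard-coded size, and the absence of any madvise() hints are my assumptions; the real daemon code may differ):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define FILE_SIZE (5ULL << 30)   /* 5 GiB; a real program would fstat() the file */
    #define READ_SIZE 100            /* bytes per read */

    int main(void) {
        int fd = open("data.bin", O_RDONLY);   /* hypothetical file name */
        if (fd < 0) { perror("open"); return 1; }

        /* Map the whole file once at startup, as described in the question. */
        const uint8_t *map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        char buf[READ_SIZE];
        for (;;) {
            /* Crude random offset; not uniform, but fine for a sketch. */
            size_t off = (((size_t)rand() << 31) | (size_t)rand())
                         % (FILE_SIZE - READ_SIZE);
            memcpy(buf, map + off, READ_SIZE);  /* may trigger a page fault */
            usleep(100 * 1000);                 /* ~10 reads per second */
        }
    }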
Sometimes (about 1 event per 5 minutes; there is no period, it looks totally random) some random daemon gets stuck in kernel code with the following stack (see picture) for about 300 milliseconds. This does not correlate with major faults: major faults occur at a constant rate of about 100-200 per second. Disk reads are also constant.
What can cause this?
Text of the image:

    __list_del_entry
    isolate_lru_pages.isra.48
    shrink_inactive_list
    shrink_node_memcg
    shrink_node
    node_reclaim
    get_page_from_freelist
    enqueue_task_fair
    sched_clock
    __alloc_pages_nodemask
    alloc_pages_vma
    handle_mm_fault
    __do_page_fault
    page_fault
You have the shrink_node and node_reclaim functions in your stack. They are called to free memory (which is shown as buff/cache by the free command-line tool): https://www.kernel.org/doc/html/latest/admin-guide/mm/concepts.html#reclaim

So your 64 GB RAM system has reached a situation where there is no free memory left: 64 GB is enough to hold copies of only about 12 files of 5 GiB each, while your daemons use 16 of them, so the page cache alone can consume all RAM. Also, Linux may read more data from a file than the application requested, using the readahead technique ("Linux readahead: less tricks for more", OLS 2007, pp. 273-284; man 2 readahead). MADV_SEQUENTIAL may also turn this mechanism on: https://man7.org/linux/man-pages/man2/madvise.2.html
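If the access pattern really is random, hinting the kernel with madvise(MADV_RANDOM) should reduce readahead around each fault (this is a suggestion based on man 2 madvise, not something confirmed for your daemons). Extending the sketch from the question, right after the mmap() call:

    /* Hint that accesses will be random, so the kernel
       reads ahead less aggressively around each page fault. */
    if (madvise((void *)map, FILE_SIZE, MADV_RANDOM) != 0)
        perror("madvise");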
I am not sure how your daemons open and read the files, whether MADV_SEQUENTIAL was active for them or not (or whether this flag was added by glibc or some other library). There can also be some effect from THP, Transparent Huge Pages: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html (the current policy is visible in /sys/kernel/mm/transparent_hugepage/enabled). The vanilla 4.9 kernel is from 2016, and THP support for filesystems was still only planned as of 2019 (https://lwn.net/Articles/789159/), but if you use RHEL/CentOS, some features may have been backported into their fork of the 4.9 kernel.
You should check the output of free and cat /proc/meminfo periodically to see how your daemons and the kernel readahead use memory.
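A minimal sketch of such periodic sampling (MemFree, MemAvailable and Cached are standard /proc/meminfo fields; the 5-second interval is arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Print the /proc/meminfo fields that show free memory and
       page-cache size (which grows with readahead) every 5 seconds. */
    int main(void) {
        for (;;) {
            FILE *f = fopen("/proc/meminfo", "r");
            if (!f) { perror("fopen"); return 1; }
            char line[256];
            while (fgets(line, sizeof line, f)) {
                if (!strncmp(line, "MemFree:", 8) ||
                    !strncmp(line, "MemAvailable:", 13) ||
                    !strncmp(line, "Cached:", 7))
                    fputs(line, stdout);
            }
            fclose(f);
            puts("---");
            sleep(5);
        }
    }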