I have read the full documentation for GNU sort and searched online, but I cannot find what the default for the --buffer-size option is (the option that determines how much system memory the program uses when it runs). I am guessing it is somehow determined based on total system memory (or perhaps on the memory available at the time the program begins execution)? How can I determine this?
Update: I've experimented a bit, and it seems that when I don't specify a particular --buffer-size value, sort ends up using very little RAM and therefore runs very slowly. It would be nice, though, to better understand what exactly determines this behavior.
I went digging through the coreutils sort source code and found these functions: `default_sort_size` and `sort_buffer_size`.

It turns out that `--buffer-size` (`sort_size` in the source code) isn't the target buffer size but rather the maximum buffer size. If no `--buffer-size` value is specified, the `default_sort_size` function is used to determine a safe maximum buffer size. It does this based on resource limits, available memory, and total memory. A summary of the function is as follows:

The other function, `sort_buffer_size`, is used to determine exactly how much memory to allocate for the given input files. A summary of the function is as follows:

Possibly the most important point of the `sort_buffer_size` function is that if you're sorting data from STDIN or a pipe, it will automatically default to `sort_size` (i.e. `--buffer-size`) if it was provided. Otherwise, for regular files it will make some rough calculations based on the file sizes and only use `sort_size` as an upper limit.
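In practice, this suggests passing `--buffer-size` (or its short form `-S`) explicitly whenever the input arrives on a pipe, since sort cannot look at a file size in that case. Here is a minimal sketch of that, assuming GNU sort; the `generate` function and the 64M figure are just placeholders I made up for illustration, not recommendations.

```shell
#!/bin/sh
# Sketch: give GNU sort an explicit buffer size when reading from a pipe.
# `generate` is a hypothetical stand-in for whatever produces your data.
generate() {
    printf '%s\n' 5 3 9 1 7
}

# Input comes from a pipe, so sort has no file size to estimate from.
# 64M is an arbitrary example value; size suffixes such as K, M, G are accepted.
sorted=$(generate | sort -n --buffer-size=64M)

printf '%s\n' "$sorted"
```

GNU sort also accepts a percentage of physical memory, e.g. `--buffer-size=50%`, which may be handier than a fixed number if the same command runs on machines with different amounts of RAM.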