I have the following problem and I only have a slight idea about it:
Consider a tape storage problem. Given n
files of length l1,...,ln
and frequencies with which they are accessed f1,...,fn
, where sum of all frequencies is 1 and 0<fi<1
. "Optimal" means to minimize the average retrieval time of these files. For example, if you have two files, each of length 10, 4
and frequency 0.8, 0.2
, if store file 1 first, the average retrieval time is 10 x 0.8 + 14 x 0.2 = 10.8
.
Design an algorithm to find the optimal order and prove that it works.
My thoughts: Larger frequencies & larger length in front of the order, but which factor should be given a higher priority?
There is no need for Dynamic Programming for this problem. It is a simple sorting problem, with a slight twist.
Find
frequency / length
for each file. This gets you, how frequent is each unit length access for the file. Sort these in descending order since each file is "penalized" by the previous file's length.Proof:
Assume that the order is fixed somehow. Then, swapping any two adjacent files will not affect any of the other
li * fi
other than the location of the swap. Call the left one in the swapx
and other oney == x+1
. Then, the change in average retrieval time isly*fx - lx*fy
. Iffx/lx < fy/ly
, thenly*fx < lx*fy
which meansly*fx - lx*fy < 0
, which means that the average retrieval time decreased. This means that we are at an advantage to do this swap. We keep doing such swaps wheneverfx/lx < fy/ly
and we will get to a point when this is no longer possible. This must be the optimal answer.Swapping adjacent things whenever
left < right
, is similar to bubble sorting in descending order, so we can just generally use any sort instead.