I've been reading the book "Computer Networking: A Top Down Approach" and encountered a question I don't seem to understand.
As I read, TCP Congestion Control has three states: Slow Start, Congestion Avoidance and Fast Recovery. I understand Slow Start and Congestion Avoidance well, but Fast Recovery is pretty vague. The book claims that TCP behaves this way:(cwnd= Congestion Window)
Let's look at the following graph:
As we can see, at round 16 the sender sends 42 segments, and because the congestion window size had been halved (+3), We can infer that there have been 3 Duplicate-ACKs. The answer to this question claims that rounds between 16 and 22 are in Congestion Avoidance state. But why not Fast recovery? I mean, after three duplicate ACKs TCP enters Fast recovery and every other duplicate ACK since should increase the congestion window. Why the graph has no representation of that? The only reasonable explanation I could think of is that in this graph, there was only three-duplicate ACKs, and the ACKs that had been received since were not duplications.
Even if that's the case, how the graph would have looked like if there had been more than 3 duplicated ACKs? **
Is there any representation of Fast Recovery in the graph above? Why not/yes?
** I've been struggling to answer that question for a long time. I'll be very glad for any reply, Thank you!
UPDATE here's the image. I think a round is defined as when all segments in the window are being ACKed. In the photo, a round is shown with a circle.
Why does cwnd grow exponentially when in Fast Recovery state? (in the image I accidentally wrote expediently instead of exponentially)
UPDATE: My original answer agreed with the solution, but after careful thought, I think the solution is wrong. This answer was rewritten from scratch; please read it carefully. I show why Fast recovery is entered at time T=16 and why the protocol remains there until T=22. The data in the graph backs my theory, so I'm pretty much positive that the solution is plain wrong.
Let's start by setting something straight: Slow Start grows exponentially; congestion avoidance grows linearly, and fast recovery grows linearly even though it uses the same formula as slow start to update the value of
cwnd
.Allow me to clarify.
Why do we say that Slow Start grows
cwnd
exponentially?Note that
cwnd
is increased byMSS
bytes for each ACK received.Let's see an example. Suppose
cwnd
is initialized to 1 MSS (the value of MSS is typically 1460 bytes, so in practice this meanscwnd
is initialized to 1460). At this point, because the congestion window size can only hold 1 packet, TCP will not send new data until this packet is acknowledged. Assuming that ACKs aren't being lost, this implies that approximately one new packet is transferred every RTT seconds (recall that RTT is the round-trip time), since we need (1/2)*RTT to send the packet, and (1/2)*RTT for the ACK to arrive.So, this results in a send rate of roughly MSS/RTT bps. Now, remember that for each
ACK
,cwnd
is incremented byMSS
. Thus, once the firstACK
arrives,cwnd
becomes2*MSS
, so now we can send 2 packets. When these two packets are acknowledged, we incrementcwnd
twice, so nowcwnd
is4*MSS
. Great! We can send 4 packets. These 4 packets are acknowledged, so we get to incrementcwnd
4 times! So we havecwnd = 8*MSS
. And then we getcwnd = 16*MSS
. We are essentially doublingcwnd
every RTT seconds (this also explains whycwnd = cwnd+MSS*(MSS/cwnd)
in Congestion Avoidance leads to linear growing)Yes, it's tricky, the formula
cwnd = cwnd+MSS
easily leads us to believe it's linear - a common misconception, because people often forget this is applied for each acknowledged packet.Note that in the real world, transmitting 4 packets doesn't necessarily generate 4 ACKs. It may generate 1
ACK
only, but since TCP uses cumulative ACKs, that singleACK
is still acknowledging 4 packets.Why is Fast Recovery linear?
The
cwnd = cwnd+MSS
formula is applied in both Slow start and Congestion avoidance. One would think that this causes both states to induce exponential growth. However, Fast recovery applies that formula in a different context: when a duplicate ACK is received. Herein lies the difference: in slow start, one RTT acknowledged a whole bunch of segments, and each acknowledged segment contributed with +1MSS to the new value ofcwnd
, whereas in fast recovery, a duplicate ACK is wasting an RTT to acknowledge the loss of a single segment, so instead of updatingcwnd
N times each RTT seconds (where N is the number of segments transmitted), we are updatingcwnd
once for a segment that was LOST. So we "wasted" one round trip with just one segment, so we only incrementcwnd
by 1.About congestion avoidance - this one I'll explain below when analysing the graph.
Analysing the graph
Ok, so let's see exactly what happens in that graph, round by round. Your picture is correct up to some degree. Let me clear some things first:
cwnd
grow exponentially from one circle to the next - 1, 2, 4, 8, 16, ...cwnd
being halved. That's not what the graph shows: the value ofcwnd
does not decrease to half from T=6 to T=7.Ok, so now let's see exactly what happens on each round. Note that the time unit in the graph is a round. So, if at time T=X we transmit N segments, then it is assumed that at time T=X+1 these N segments have been ACKed (assuming they weren't lost, of course).
Also note how we can tell the value of
ssthresh
just by looking at the graph. At T=6,cwnd
stops growing exponentially and starts growing linearly, and its value does not decrease. The only possible transition from slow start to another state that doesn't involve decreasingcwnd
is the transition to congestion avoidance, which happens when the congestion window size is equal tossthresh
. We can see in the graph that this happens whencwnd
is 32. So, we immediately know thatssthresh
is initialized to 32 MSS. The book shows a very similar graph on page 276 (Figure 3.53), where the authors draw a similar conclusion:Under normal conditions, this is what happens - when TCP switches for the first time from an exponential growth to a linear growth without decreasing the size of the window, it's always because it hit the threshold and switched to congestion avoidance.
Finally, assume that
MSS
is at least 1460 bytes (it is commonly 1460 bytes because Ethernet has MTU = 1500 bytes and we need to account for the size of the TCP + IP headers, which together need 40 bytes). This is important to see whencwnd
exceedsssthresh
, sincecwnd
's unit isMSS
andssthresh
is expressed in bytes.So here we go:
T = 1:
cwnd = 1 MSS; ssthresh = 32 kB
Transmit 1 segment
T = 2
1 segment acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 2
Transmit 2 segments
T = 3
2 segments acknowledged
cwnd += 2; ssthresh = 32 kB
New value of cwnd: 4
Transmit 4 segments
T = 4
4 segments acknowledged
cwnd += 4; ssthresh = 32 kB
New value of cwnd: 8
Transmit 8 segments
T = 5
8 segments acknowledged
cwnd += 8; ssthresh = 32 kB
New value of cwnd: 16
Transmit 16 segments
T = 6
16 segments acknowledged
cwnd += 16; ssthresh = 32 kB
New value of cwnd: 32
Transmit 32 segments
Ok, let's see what happens now.
cwnd
reachedssthresh
(32*1460 = 46720 bytes, which is greater than 32000). It's time to switch to congestion avoidance. Note how the values ofcwnd
grow exponentially across rounds, because each acknowledged packet contributes with 1 MSS to the new value ofcwnd
, and every packet sent is acknowledged in the next round.The switch to congestion avoidance
Now,
cwnd
will not increase exponentially, because eachACK
won't contribute with 1 MSS anymore. Instead, eachACK
contributes withMSS*(MSS/cwnd)
. So, for example, ifMSS
is 1460 bytes andcwnd
is 14600 bytes (so at the beginning of each round we are sending 10 segments), then eachACK
(assuming oneACK
per segment) will increasecwnd
by1/10
MSS (146 bytes). Since we send 10 segments, and at the end of the round we assume that every segment was acknowledged, then at the end of the round we have increasedcwnd
by10 * 1/10 = 1
. In other words, each segment contributes a small fraction tocwnd
such that we just incrementcwnd
by 1 MSS each round. So now each round incrementscwnd
by 1 rather than by the number of segments that were transferred / acknowledged.We will remain in congestion avoidance until some loss is detected (either 3 duplicate ACKs or a timeout).
Now, let the clocks resume...
T = 7
32 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 33
Transmit 33 segments
Note how
cwnd
went from 32 to 33 even though 32 segments were acknowledged (eachACK
therefore contributes 1/32). If we were in slow start, as in T=6, we would havecwnd += 32
. This new value ofcwnd
is also consistent with what we see in the graph at time T = 7.T = 8
33 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 34
Transmit 34 segments
T = 9
34 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 35
Transmit 35 segments
Notice that this is consistent with the graph: at T=9, we have
cwnd = 35
. This keeps happening up to T = 16...T = 10
35 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 36
Transmit 36 segments
T = 11
36 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 37
Transmit 37 segments
T = 12
37 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 38
Transmit 38 segments
T = 13
38 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 39
Transmit 39 segments
T = 14
39 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 40
Transmit 40 segments
T = 15
40 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 41
Transmit 41 segments
T = 16
41 segments acknowledged
cwnd += 1; ssthresh = 32 kB
New value of cwnd: 42
Transmit 42 segments
PAUSE
What happens now? The graph shows that the congestion window size decreases to approximately half of its size, and then it grows linearly across rounds again. The only possibility is that there were 3 duplicate ACKs and the protocol switches to Fast recovery. The graph shows that it does NOT switch to slow start because that would bring
cwnd
down to 1. So the only possible transition is to fast recovery.By entering fast recovery, we get
ssthresh = cwnd/2
. Remember thatcwnd
's units isMSS
andssthresh
is in bytes, we have to be careful with that. Thus, the new value isssthresh = cwnd*MSS/2 = 42*1460/2 = 30660
.Again, this lines up with the graph; notice that
ssthresh
will be hit in the near future whencwnd
is slightly less than 30 (recall that with MSS = 1460, the ratio is not exactly 1:1, that's why we hit the threshold even though the congestion window size is slightly below 30).The switch to congestion avoidance also causes the new value of
cwnd
to bessthresh+3MSS = 21+3 = 24
(remember to be careful with units, here I convertedssthresh
into MSS again because our values ofcwnd
are counted in MSS).As of now, we are in congestion avoidance, with T=17,
ssthresh = 30660 bytes
andcwnd = 24
.Upon entering T=18, two things can happen: either we receive a duplicate ACK, or we don't. If we don't (so it's a new ACK), we would transition to congestion avoidance. But this would bring
cwnd
down to the value ofssthresh
, which is 21. That wouldn't match the graph - the graph shows thatcwnd
keeps increasing linearly. Also, it doesn't switch to slow start because that would bringcwnd
down to 1. This implies that fast recovery isn't left and we are getting duplicate ACKs. This happens up to time T=22:T = 18
Duplicate ACK arrived
cwnd += 1; ssthresh = 30660 bytes
New value of cwnd: 25
T = 19
Duplicate ACK arrived
cwnd += 1; ssthresh = 30660 bytes
New value of cwnd: 26
T = 20
Duplicate ACK arrived
cwnd += 1; ssthresh = 30660 bytes
New value of cwnd: 27
T = 21
Duplicate ACK arrived
cwnd += 1; ssthresh = 30660 bytes
New value of cwnd: 28
T = 22
Duplicate ACK arrived
cwnd += 1; ssthresh = 30660 bytes
New value of cwnd: 29
** PAUSE **
We are still in Fast recovery, and now, suddenly
cwnd
goes down to 1. This shows that it enters slow start again. The new value ofssthresh
will be29*1460/2 = 21170
, andcwnd = 1
. It also means that despite our efforts to retransmit the segment, there was a timeout.T = 23
cwnd = 1; ssthresh = 21170 bytes
Transmit 1 segment
T = 24
1 segment acknowledged
cwnd += 1; ssthresh = 21170 bytes
New value of cwnd: 2
Transmit 2 segments
T = 25
2 segments acknowledged
cwnd += 2; ssthresh = 21170 bytes
New value of cwnd: 4
Transmit 4 segments
T = 26
4 segments acknowledged
cwnd += 4; ssthresh = 21170 bytes
New value of cwnd: 8
Transmit 8 segments
...
I hope that makes it clear.