Why do the pipeline constraints of coarse-grained and fine-grained multithreading differ?


In "Computer Organization and Design: The Hardware/Software Interface, Sixth Edition" (RISC-V Edition) by David A. Patterson and John L. Hennessy, Section 6.4 says this about "coarse-grained multithreading":

This change relieves the need to have thread switching be extremely fast and is much less likely to slow down the execution of an individual thread, since instructions from other threads will only be issued when a thread encounters a costly stall.

Because a processor with coarse-grained multithreading issues instructions from a single thread, when a stall occurs, the pipeline must be emptied or frozen. The new thread that begins executing after the stall must fill the pipeline before instructions are able to complete.
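To make the refill cost concrete, here is a toy cycle-level model (an illustrative sketch, not the book's model). It assumes a simple in-order pipeline of `depth` stages that fetches one instruction per cycle, a hypothetical function name `simulate_coarse`, and a switch that flushes every in-flight instruction of the old thread. A real core would let older instructions finish and re-fetch the rest when the thread resumes.

```python
def simulate_coarse(depth, cycles, switch_cycle):
    """Toy in-order pipeline that runs one thread at a time.

    Thread "A" fetches one instruction per cycle until `switch_cycle`, when it
    is assumed to hit a long-latency stall. The core then flushes the pipeline
    and fetches from thread "B" instead. Returns (cycle, thread) for every
    instruction that leaves the last stage.
    """
    pipeline = [None] * depth          # pipeline[0] = fetch, pipeline[-1] = last stage
    completed = []
    for cycle in range(cycles):
        if cycle == switch_cycle:
            pipeline = [None] * depth  # flush: drop A's in-flight instructions
        thread = "A" if cycle < switch_cycle else "B"
        leaving = pipeline[-1]         # instruction finishing this cycle, if any
        pipeline = [thread] + pipeline[:-1]  # advance every stage, fetch a new one
        if leaving is not None:
            completed.append((cycle, leaving))
    return completed


if __name__ == "__main__":
    for cycle, thread in simulate_coarse(depth=5, cycles=20, switch_cycle=10):
        print(cycle, thread)
```

With `simulate_coarse(5, 20, 10)`, thread A's instructions complete in cycles 5 to 9, nothing completes in cycles 10 to 14 while B's first instructions travel down the empty pipeline, and B starts completing in cycle 15: a bubble as long as the pipeline is deep, paid on every switch.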

But for "fine-grained multithreading", it says nothing about changes to the pipeline when switching threads:

This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that clock cycle.
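The round-robin policy in that quote can be sketched as a per-cycle thread selector. This is a hypothetical sketch: the function name and the `stalled(thread, cycle)` predicate are assumptions standing in for whatever stall signals real hardware uses.

```python
def fine_grained_issue(num_threads, stalled, cycles):
    """Choose one thread per cycle, round-robin, skipping stalled threads.

    `stalled(thread, cycle)` is a hypothetical caller-supplied predicate.
    Returns the thread chosen in each cycle, or None (a bubble) when every
    thread is stalled.
    """
    issued = []
    last = num_threads - 1             # so the search starts at thread 0 in cycle 0
    for cycle in range(cycles):
        choice = None
        for offset in range(1, num_threads + 1):
            t = (last + offset) % num_threads  # next thread after the last one used
            if not stalled(t, cycle):
                choice = t
                break
        issued.append(choice)
        if choice is not None:
            last = choice
    return issued


if __name__ == "__main__":
    # Thread 1 is stalled (e.g. waiting on a cache miss) in cycles 2..5.
    print(fine_grained_issue(3, lambda t, c: t == 1 and 2 <= c <= 5, 8))
```

With three threads and thread 1 stalled in cycles 2 to 5, the issue order is 0, 1, 2, 0, 2, 0, 1, 2: thread 1 is simply skipped while it is stalled, and the other threads keep the pipeline busy.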

Q: Since the book says:

A thread includes the program counter, the register state, and the stack.

and both kinds of multithreading switch threads when they encounter stalls, why must coarse-grained multithreading empty the pipeline (because the pipeline only holds instructions from a single thread) and then refill it, while fine-grained multithreading does not?


There are 2 answers

Peter Cordes (best answer)

I think the point is that if you're going to have two sets of register state, page tables, FP exception state, etc. that can be active at once, you might as well do fine-grained multithreading.

So it wouldn't be a good tradeoff to make a coarse-grained multithreading CPU that paid most of the cost to support fine-grained multithreading. In this paragraph at least, that looks like an unstated assumption, but perhaps they discuss it elsewhere.

The benefit of doing only coarse-grained multithreading this way is that you don't need to support having instructions from different contexts in the pipeline at once, so things such as FP exception state and the rounding mode don't need to be tracked per instruction.

Architectural state for the thread being swapped out can be saved to special storage that's only accessed by the hardware context-switching logic, instead of needing extra tag bits in a bunch of structures and a register alias table (RAT) with twice as many entries.
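A deliberately simplified sketch of that bookkeeping difference. It models only a PC and a RISC-V-style integer register file, and the class names are hypothetical; it ignores page tables, FP state, and register renaming (the RAT), so treat it as an illustration of where the thread tag has to live, not as a hardware design.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # RISC-V integer registers x0-x31


@dataclass
class ArchState:
    """Simplified per-thread architectural state: PC plus integer registers.
    (The stack itself lives in memory; its pointer is register x2, sp.)"""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)


class CoarseGrainedCore:
    """Only one context is live. The other waits in shadow storage touched only
    by the (hypothetical) switch logic, so in-flight instructions need no
    thread tag: they all belong to `active`."""

    def __init__(self, first, second):
        self.active, self.shadow = first, second

    def write_reg(self, reg, value):
        if reg != 0:                        # x0 is hardwired to zero
            self.active.regs[reg] = value

    def switch(self):
        # Assumed to run only after the pipeline has been drained or flushed.
        self.active, self.shadow = self.shadow, self.active


class FineGrainedCore:
    """Every context is live at once, so each in-flight instruction must carry
    a thread ID that selects which register file it reads and writes."""

    def __init__(self, *contexts):
        self.contexts = list(contexts)

    def write_reg(self, tid, reg, value):
        if reg != 0:                        # x0 is hardwired to zero
            self.contexts[tid].regs[reg] = value
```

In the coarse-grained version, `write_reg` needs no thread ID because the pipeline only ever holds instructions from the active context; in the fine-grained version, the thread ID is part of every write, which is the per-instruction tagging cost described above.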

(As Dr. Bandwidth comments, fine-grained multithreading is usually only used in CPUs with out-of-order exec and register renaming.)

sarang

In fine-grained multithreading, instructions are inserted into the pipeline in round-robin fashion from each thread assigned to that core, and if a thread is experiencing a memory stall, its instructions are not inserted into the pipeline at that moment. In coarse-grained multithreading, by contrast, the pipeline contains instructions from the currently executing thread only, so to switch threads, the pipeline must be emptied and then filled with instructions from the other thread.

This is not the case with fine-grained multithreading: because it switches between threads continuously, the pipeline already contains instructions from all the threads that are not stalled.
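A toy pipeline model makes that concrete. This is an illustrative sketch under stated assumptions: an in-order pipeline of `depth` stages, one fetch per cycle, the thread chosen round-robin among non-stalled threads, and a hypothetical function `simulate_fine` with a caller-supplied `stalled(thread, cycle)` predicate. Each pipeline slot records which thread owns the instruction in it.

```python
def simulate_fine(depth, num_threads, cycles, stalled):
    """Toy in-order pipeline that may fetch from a different thread each cycle.

    A stalled thread simply gets no new slot; nothing is flushed.
    Returns a snapshot of the pipeline (owning thread per stage) after every cycle.
    """
    pipeline = [None] * depth          # pipeline[0] = fetch, pipeline[-1] = last stage
    snapshots = []
    nxt = 0                            # next thread to try, round-robin
    for cycle in range(cycles):
        fetched = None                 # stays None (a bubble) if all threads stall
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if not stalled(t, cycle):
                fetched = t
                nxt = (t + 1) % num_threads
                break
        pipeline = [fetched] + pipeline[:-1]  # advance every stage by one
        snapshots.append(list(pipeline))
    return snapshots


if __name__ == "__main__":
    # Thread 0 stalls from cycle 4 onward; thread 1 keeps running.
    for cycle, snap in enumerate(simulate_fine(4, 2, 6, lambda t, c: t == 0 and c >= 4)):
        print(cycle, snap)
```

With two threads, a 4-stage pipeline, and thread 0 stalled from cycle 4 on, the pipeline after cycle 3 holds [1, 0, 1, 0], i.e. instructions from both threads at once. After cycle 5 it holds [1, 1, 1, 0]: thread 0's older instruction is still draining, nothing was flushed, and thread 1 fills every new slot.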

I hope this answers your question.