very long instruction that consists of operations with different latencies

Question

363 views Asked by enzom83 At 11 July 2012 at 15:31

Consider a VLIW processor with an issue width equal to N: this means that it is able to start N operations simultaneously, so each very long instruction can consist of a maximum of N operations.

Suppose that the VLIW processor loads a very long instruction which consists of operations with different latencies: operations belonging to the same very long instruction could end at different times. What happens if an operation finishes its execution before other operations belonging to the same very long instruction? Could a subsequent operation (that is, an operation belonging to the next very long instruction) start execution before the remaining operations of the current very long instruction have finished? Or does the next very long instruction wait for the completion of all operations belonging to the current very long instruction?

There are 2 answers

mjacobs On 16 July 2012 at 19:45

Most VLIW processors I've seen do support operations with different latencies.

It's up to the compiler to schedule these operations, and to ensure that the operands are available before each operation executes. A VLIW processor is dumb, and doesn't check any dependencies between operations. When a long instruction word executes, each operation in the word simply reads its input data from a register file, and writes its result back at the end of the same cycle, or later if an operation takes two or three cycles.
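To make the "dumb processor" point concrete, here is a minimal sketch of that execution model in Python. The opcodes, latencies, and register names are hypothetical and chosen only for illustration; real VLIW machines differ in many details. Operands are read at issue with no dependency check, and results land at the end of the cycle in which the operation completes.

```python
from collections import defaultdict

# Hypothetical opcodes and fixed latencies (in cycles), for illustration only.
LATENCY = {"add": 1, "mul": 2}
FUNC = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def run(program, regs):
    """Execute a list of instruction words, one word per cycle.
    Each word is a list of (opcode, dest, src1, src2) operations."""
    regs = dict(regs)
    pending = defaultdict(list)  # cycle -> [(dest, value)] written at the end of that cycle
    cycle = 0
    while cycle < len(program) or pending:
        if cycle < len(program):
            for op, dest, a, b in program[cycle]:
                # Operands are read at issue; the hardware checks no dependencies.
                value = FUNC[op](regs[a], regs[b])
                pending[cycle + LATENCY[op] - 1].append((dest, value))
        # Results become visible at the end of the cycle in which they complete.
        for dest, value in pending.pop(cycle, []):
            regs[dest] = value
        cycle += 1
    return regs

start = {"r1": 2, "r2": 3, "r3": 0, "r4": 0, "r5": 0}
word0 = [("mul", "r3", "r1", "r2"), ("add", "r4", "r1", "r2")]
use = [("add", "r5", "r3", "r4")]

bad = [word0, use]        # reads r3 one cycle too early: r5 ends up 5 (stale r3)
good = [word0, [], use]   # compiler inserts an empty word: r5 ends up 11
```

With the `bad` schedule the processor raises no error; it simply computes with the old value of `r3`. Getting the `good` schedule is entirely the compiler's job.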

This only works when instructions are deterministic, and always take the same number of cycles. All VLIW architectures I've seen have operations that take a fixed number of cycles, no less, no more. In case they do take longer, like for instance an external memory fetch, the whole machine is simply stalled.
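As a rough illustration of the stall behaviour, the sketch below computes the cycle at which each instruction word issues when some words take extra cycles beyond their fixed schedule (for example, an external memory fetch). The function name and the input format are assumptions made for this example, not a model of any specific machine.

```python
def issue_cycles(extra_stall_cycles):
    """extra_stall_cycles[i] is the number of extra cycles word i takes
    beyond its fixed schedule (0 for a normal word).
    Returns the cycle at which each word issues."""
    cycles = []
    cycle = 0
    for stall in extra_stall_cycles:
        cycles.append(cycle)
        # The whole machine is frozen during a stall, so every later word slips.
        cycle += 1 + stall
    return cycles

# Word 2 stalls the machine for 3 extra cycles, so word 3 issues at cycle 6.
timeline = issue_cycles([0, 0, 3, 0])
```

Because every operation in flight is frozen together, the relative timing the compiler planned is preserved across the stall.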

Now there is one key thing that limits the scheduling of instructions that have different latencies: the number of ports to the register file. The ports are the connections between the register file and execution units of the operations. In a VLIW processor, each operation executes in an issue slot, and each issue slot has its own ports to the register file. Ports are expensive in terms of hardware. The more ports, the more silicon is required to implement the register file.

Now consider the following situation where a two-cycle operation wants to write its result to the register file at the same time as a single-cycle operation that was scheduled right after it. There's now a conflict, as both operations want to write to the same register file over the same port. Again, it's the compiler's task to ensure this doesn't happen. In many VLIW architectures, the operations that execute in the same issue slot all have the same latency. This avoids this conflict.
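The check the compiler has to perform can be sketched as follows, assuming (for illustration only) that each issue slot has exactly one write port and that a result is written at the end of the operation's last cycle. The function name, opcodes, and latencies are hypothetical.

```python
def write_port_conflicts(slot_schedule, latency):
    """slot_schedule lists the opcode issued in ONE issue slot on each cycle
    (None means no operation). Assumes a single write port per slot.
    Returns (writeback_cycle, first_issue_cycle, second_issue_cycle) tuples."""
    writes = {}      # writeback cycle -> issue cycle of the op using the port
    conflicts = []
    for issue_cycle, op in enumerate(slot_schedule):
        if op is None:
            continue
        writeback = issue_cycle + latency[op] - 1
        if writeback in writes:
            conflicts.append((writeback, writes[writeback], issue_cycle))
        else:
            writes[writeback] = issue_cycle
    return conflicts

latency = {"add": 1, "mul": 2}  # hypothetical latencies

# A 2-cycle mul followed directly by a 1-cycle add: both write at the end of cycle 1.
clash = write_port_conflicts(["mul", "add"], latency)
# Leaving one empty slot between them removes the conflict.
ok = write_port_conflicts(["mul", None, "add"], latency)
```

If every operation in a slot has the same latency, the writeback cycles are always distinct, which is why many architectures choose that restriction.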

Now to answer your questions:

You said: "What happens if an operation finishes its execution before other operations belonging to the same very long instruction?"

Nothing special happens. The processor just continues to execute the next very long instruction word.

You said: "Could a subsequent operation (that is an operation belonging to the next very long instruction) start execution before the remaining operations of the current very long instruction being executed?"

Yes, but this could present a register port conflict later on. It's up to the compiler to prevent this situation.

You said: "Or does a very long instruction wait for the completion of all operations belonging to the current very long instruction?"

No. At every cycle the processor simply goes to the next very long instruction word. The exception is when an operation takes longer than normal, for instance because of a cache miss: then the pipeline is stalled, and the machine does not progress to the next long instruction word.

**Ira Baxter** · Accepted Answer · 2012-07-11T16:02:41+00:00

The idea behind VLIW is that the compiler figures out lots of things for the processor to do in parallel and packages them up in bundles called "very long instruction words".

Amdahl's law tells us that the speedup of a parallel program (e.g., the parallel parts of the VLIW instruction) is constrained by the slowest part (e.g., the longest-duration sub-instruction).

The simple answer with VLIW and "long latencies" is "don't mix sub-instructions with different latencies". In practice, VLIW machines try not to have sub-instructions with different latencies; ideally you want "one clock" sub-instructions. Typically even memory fetches take only one clock, by virtue of being divided into a "memory fetch start (here's an address to fetch)" sub-instruction and a "wait for previous fetch to arrive" sub-instruction, which is the only one with variable latency. The idea is that the compiler generates as much other computation as it can between the two, so that the memory fetch latency is covered by the other instructions.
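A back-of-the-envelope sketch of how that latency hiding works, under assumptions made up for this example: the fetched data arrives `memory_latency` cycles after the "fetch start" sub-instruction issues, the compiler places `filler_words` independent instruction words after it, and the "wait" sub-instruction issues in the word right after those.

```python
def wait_stall_cycles(memory_latency, filler_words):
    """Cycles the 'wait for previous fetch' sub-instruction has to stall.
    Assumes 'fetch start' issues at cycle t, data arrives at t + memory_latency,
    and 'wait' issues at cycle t + filler_words + 1."""
    return max(0, memory_latency - (filler_words + 1))

no_filler = wait_stall_cycles(4, 0)    # wait issues immediately: 3 stall cycles
some_filler = wait_stall_cycles(4, 2)  # two useful words in between: 1 stall cycle
covered = wait_stall_cycles(4, 3)      # latency fully covered: 0 stall cycles
```

The more independent work the compiler can find to put between the two halves of the fetch, the less often the single variable-latency sub-instruction actually has to wait.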
