Verilog: steps to pipelining a simple processor


I asked another question a few minutes ago, but I'm finishing up a project. Part of the bonus is pipelining our processor design. I have a simple accumulator-based processor with a data bus and an address bus. It has the three basic stages (fetch, decode, execute) and most of the basic functional units found in simple processors, such as data memory, an instruction register, ALU, MAR, MDR, and a controller (which handles the states and control signals).

I know what pipelining is but haven't figured out how to implement it at the functional level. I have searched around, but nothing I found simplifies it enough for what I need, and I haven't found any examples.

Morgan On BEST ANSWER

From "Instruction pipeline", the classic 5 stages of a RISC processor are:

  1. Instruction fetch
  2. Instruction decode and register fetch
  3. Execute
  4. Memory access
  5. Register write back

If everything worked in zero time there would be no need for pipeline stages, but as you may have seen with combinatorial logic, a change on the input takes time to ripple through. Add in the requirement to load and save data to memory, and it becomes clear that doing everything in 1 clock cycle would be very hard.
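To put rough numbers on that, here is a small Python sketch. The stage delays and register overhead are made-up illustrative values, not measurements of any real design; the point is only that an unpipelined clock period must cover the sum of the delays, while a pipelined one only has to cover the slowest stage.

```python
# Hypothetical combinational delays in nanoseconds (illustrative values only).
stage_delay_ns = {"load": 4.0, "execute": 3.0, "store": 4.0}
register_overhead_ns = 0.5  # assumed flip-flop setup + clock-to-q overhead

# Unpipelined: a change must ripple through all the logic within one clock period.
single_cycle_period = sum(stage_delay_ns.values()) + register_overhead_ns

# Pipelined: the clock period only has to cover the slowest single stage.
pipelined_period = max(stage_delay_ns.values()) + register_overhead_ns


def total_time(n_instructions, period, n_stages):
    """Time to finish n instructions: the pipeline fills once (n_stages cycles
    for the first result), then completes one instruction per cycle."""
    return (n_stages + n_instructions - 1) * period


unpipelined_ns = total_time(100, single_cycle_period, 1)
pipelined_ns = total_time(100, pipelined_period, 3)
```

With these assumed numbers the pipelined design runs a faster clock and finishes 100 instructions sooner, even though each individual instruction now takes 3 cycles to pass through.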

To simplify it, think of 3 stages: load from memory, execute, and store to memory.

Consider 3 instructions, each adding two memory locations, on a processor with registers r1, r2 and an accumulator acc:

addr3 = addr1 + addr2
addr6 = addr4 + addr5
addr9 = addr7 + addr8

Unit    : Load        Execute        Store
Cycle 1 : r1 = addr1  -              -
          r2 = addr2  

Cycle 2 : r1 = addr4  acc = r1 + r2  -
          r2 = addr5

Cycle 3 : r1 = addr7  acc = r1 + r2  addr3 = acc
          r2 = addr8

Cycle 4 : r1 = 0      acc = r1 + r2  addr6 = acc
          r2 = 0 

Cycle 5 : r1 = 0      acc = 0        addr9 = acc
          r2 = 0

Therefore, when reading an instruction from the program, we can see that different parts of it are used at different times: the read memory addresses are used in cycle 1, the type of operation (add, subtract, multiply) is used in cycle 2, and the store memory address is used in cycle 3.
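The cycle table can be reproduced with a small behavioural model. This is a Python sketch rather than HDL, and the names (`run_pipeline`, `load_ex`, `ex_st`) and memory contents are assumptions made for illustration. It has no hazard handling, so it is only correct when, as in the example, no instruction reads an address that an earlier in-flight instruction has yet to store.

```python
def run_pipeline(program, mem):
    """Run a list of (dst, src_a, src_b) instructions, each meaning
    mem[dst] = mem[src_a] + mem[src_b], through Load/Execute/Store stages.
    Returns the number of clock cycles used. mem is updated in place."""
    load_ex = None  # pipeline register Load -> Execute: (r1, r2, dst)
    ex_st = None    # pipeline register Execute -> Store: (acc, dst)
    pc = 0
    cycles = 0
    while pc < len(program) or load_ex is not None or ex_st is not None:
        # Store stage: write back the result computed in the previous cycle.
        if ex_st is not None:
            acc, dst = ex_st
            mem[dst] = acc
        # Execute stage: add the operands loaded in the previous cycle.
        next_ex_st = None
        if load_ex is not None:
            r1, r2, dst = load_ex
            next_ex_st = (r1 + r2, dst)
        # Load stage: fetch operands for the next instruction, if any remain.
        next_load_ex = None
        if pc < len(program):
            dst, src_a, src_b = program[pc]
            next_load_ex = (mem[src_a], mem[src_b], dst)
            pc += 1
        # Clock edge: all pipeline registers update together.
        load_ex, ex_st = next_load_ex, next_ex_st
        cycles += 1
    return cycles


# Hypothetical memory contents: addrN initially holds the value N.
mem = {f"addr{i}": i for i in range(1, 10)}
program = [
    ("addr3", "addr1", "addr2"),
    ("addr6", "addr4", "addr5"),
    ("addr9", "addr7", "addr8"),
]
cycles = run_pipeline(program, mem)
```

The three instructions complete in 5 cycles, matching the table: the last two cycles only drain the Execute and Store stages.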

The data path has flip-flops inserted to break it up into (pipeline) stages. You then need to delay the relevant parts of the decoded instruction word so they reach each function block at the same time as the data they are intended to operate on.
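A minimal sketch of that delaying, again as a Python behavioural model and not the asker's actual design: the class name, the field names (`op_d1`, `dst_d1`, `dst_d2`) and the instruction format are all hypothetical. The opcode passes through one register before it reaches the ALU, and the destination address passes through two before it reaches the store unit. In Verilog, each of these attributes would typically be a `reg` updated in a clocked `always` block.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class PipelinedDatapath:
    def __init__(self, mem):
        self.mem = mem
        # Registers between Load and Execute: operands plus delayed control.
        self.r1 = self.r2 = 0
        self.op_d1 = None   # opcode, delayed one cycle to line up with r1/r2
        self.dst_d1 = None  # store address, delayed one cycle
        # Registers between Execute and Store.
        self.acc = 0
        self.dst_d2 = None  # store address, delayed two cycles to line up with acc

    def clock(self, instr):
        """One clock edge. instr is (op, dst, src_a, src_b), or None for a bubble."""
        # Sample the current register outputs first, as flip-flops would.
        r1, r2, op_d1, dst_d1 = self.r1, self.r2, self.op_d1, self.dst_d1
        acc, dst_d2 = self.acc, self.dst_d2
        # Store stage: uses the address that was decoded two cycles ago.
        if dst_d2 is not None:
            self.mem[dst_d2] = acc
        # Execute stage: uses the opcode that was decoded one cycle ago.
        self.acc = OPS[op_d1](r1, r2) if op_d1 is not None else 0
        self.dst_d2 = dst_d1
        # Load stage: uses the source addresses of the incoming instruction.
        if instr is not None:
            op, dst, src_a, src_b = instr
            self.r1, self.r2 = self.mem[src_a], self.mem[src_b]
            self.op_d1, self.dst_d1 = op, dst
        else:
            self.r1 = self.r2 = 0
            self.op_d1 = self.dst_d1 = None


# Two instructions followed by two bubbles to drain the pipeline.
mem = {"a": 6, "b": 3, "c": 0, "d": 0}
cpu = PipelinedDatapath(mem)
for instr in [("sub", "c", "a", "b"), ("mul", "d", "a", "b"), None, None]:
    cpu.clock(instr)
```

If `op_d1` were not delayed, the `mul` opcode would arrive at the ALU while the operands of the `sub` were still there, which is exactly the misalignment the extra registers prevent.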