I'm testing counter and adder performance on iCE40 and GateMate FPGAs.
I wrote the counter in two different ways:
NaturalCount, using Chisel's '+' operator:
```scala
import chisel3._
import chisel3.util._ // for log2Ceil

// Natural counter with classic addition
class NaturalCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })
  val MAXCOUNT = BigInt(1) << COUNT_WIDTH
  val counterSize = log2Ceil(MAXCOUNT) // equals COUNT_WIDTH
  val counterValue = RegInit(0.U(counterSize.W))
  // '+' keeps the operand width, so the counter wraps at 2^COUNT_WIDTH
  counterValue := counterValue + 1.U
  io.count := counterValue
}
```
FullAdderCount, which instantiates a chain of FullAdder modules (FullAdderAddition, written in Chisel):
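```scala
import chisel3._

/* FullAdder counter: the increment goes through a chain of FullAdder modules.
 * FullAdderAddition (not shown here) exposes io.a, io.b and io.s. */
class FullAdderCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })
  val counterValue = RegInit(0.U(COUNT_WIDTH.W))
  val addition = Module(new FullAdderAddition(COUNT_WIDTH))
  addition.io.a := counterValue
  addition.io.b := 1.U // zero-extended to the adder input width
  counterValue := addition.io.s
  io.count := counterValue
}
```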
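The FullAdderAddition source is not reproduced in this post. As a minimal sketch, assuming it is a plain ripple-carry chain of one-bit full adders, the stdlib-only Scala below models that arithmetic in software and checks it against ordinary addition modulo 2^n. The names `RippleCarryModel`, `fullAdder` and `rippleAdd` are hypothetical. In hardware, the carry of such a chain has to cross every bit within one clock cycle, so the critical path grows with the counter width.

```scala
object RippleCarryModel {
  // One-bit full adder: returns (sum, carryOut)
  def fullAdder(a: Int, b: Int, cin: Int): (Int, Int) = {
    val s = a ^ b ^ cin
    val cout = (a & b) | (a & cin) | (b & cin)
    (s, cout)
  }

  // Chain `width` full adders, LSB first. The final carry-out is dropped,
  // so the result wraps modulo 2^width like a fixed-width counter register.
  def rippleAdd(a: BigInt, b: BigInt, width: Int): BigInt = {
    var carry = 0
    var result = BigInt(0)
    for (i <- 0 until width) {
      val ai = if (a.testBit(i)) 1 else 0
      val bi = if (b.testBit(i)) 1 else 0
      val (s, c) = fullAdder(ai, bi, carry)
      if (s == 1) result = result.setBit(i)
      carry = c
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val width = 44
    val modulus = BigInt(1) << width
    val allOnes = modulus - 1
    // Incrementing the all-ones value makes the carry ripple through all 44 stages
    println(rippleAdd(allOnes, 1, width)) // prints 0
    // Cross-check against native addition modulo 2^width
    val ok = Seq(BigInt(0), BigInt(41), allOnes - 1, allOnes)
      .forall(v => rippleAdd(v, 1, width) == (v + 1).mod(modulus))
    println(s"matches + modulo 2^$width: $ok") // prints "matches + modulo 2^44: true"
  }
}
```

The all-ones case is the worst case for timing: every stage's carry depends on the previous one, which is exactly the long path a synthesizer has to fit into a single clock period.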
When I synthesize (and place and route) these counters in a blinker design for iCE40 with the IceStorm tools (Yosys and nextpnr), using a 44-bit counter, I get these results:
NaturalCount:
```
2.48. Printing statistics.

=== IcestickBlink ===

   Number of wires:                 14
   Number of wire bits:            240
   Number of public wires:          14
   Number of public wire bits:     240
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                130
     SB_CARRY                       42
     SB_DFF                         44
     SB_LUT4                        44

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 117.32 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 121.15 MHz (PASS at 12.00 MHz)
```
FullAdderCount:
```
   Number of wires:                581
   Number of wire bits:            810
   Number of public wires:         581
   Number of public wire bits:     810
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                173
     SB_DFF                         44
     SB_DFFE                        43
     SB_LUT4                        86

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 436.68 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 380.37 MHz (PASS at 12.00 MHz)
```
The NaturalCount version uses fewer LUTs than FullAdderCount (44 vs. 86 SB_LUT4), but the maximum clock frequency is far higher with FullAdderCount (about 380 to 437 MHz vs. 117 to 121 MHz).
Is this normal? What is the purpose of SB_CARRY if it performs worse than "normal" LUTs?
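Expressed as a critical-path delay rather than a frequency (taking the last nextpnr figure of each iCE40 report), the minimum clock period is simply the reciprocal of the maximum frequency, so the gap is a little more than a factor of three:

```latex
T_{\min} = \frac{1}{f_{\max}}, \qquad
\frac{1}{121.15\ \text{MHz}} \approx 8.25\ \text{ns}\ (\text{NaturalCount}), \qquad
\frac{1}{380.37\ \text{MHz}} \approx 2.63\ \text{ns}\ (\text{FullAdderCount})
```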
I tried the same thing with the GateMate FPGA, which uses the same synthesis tool (Yosys) but its own in-house place-and-route tool (p_r).
NaturalCount with a 44-bit counter on GateMate:
```
2.49. Printing statistics.

=== GatemateBlink ===

   Number of wires:                 72
   Number of wire bits:            352
   Number of public wires:          24
   Number of public wire bits:     217
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                144
     CC_ADDF                        44
     CC_BUFG                         1
     CC_DFF                         44
     CC_IBUF                         2
     CC_LUT2                        44
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3): 189.79 MHz
```
FullAdderCount on GateMate:
```
2.49. Printing statistics.

=== GatemateBlink ===

   Number of wires:                595
   Number of wire bits:            835
   Number of public wires:         506
   Number of public wire bits:     745
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                186
     CC_BUFG                         1
     CC_DFF                         87
     CC_IBUF                         2
     CC_LUT2                         2
     CC_LUT3                        85
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3): 189.79 MHz
```
With this FPGA, the maximum clock frequency is exactly the same (189.79 MHz) for both versions.
I wonder why the clock frequency isn't better when the '+' operator is mapped onto the dedicated SB_CARRY and CC_ADDF carry cells.
Is this a bug in my design, or is it expected?
OK, it turns out it was a bug in my code: the FullAdder option was actually a pdchain fast counter as described on OpenCores.
After fixing the code, I get worse timing with FullAdder, as I initially expected.
Sorry for the inconvenience, but asking this question helped me find the mistake.