On an FPGA, why does a counter built from raw full adders have better clock performance than an inferred '+' addition?

Asked by FabienM on 07 December 2023 at 21:11

I'm testing counter and addition performance on iCE40 and GateMate FPGAs.

I wrote the counter in two different ways:

NaturalCount, using Chisel's '+' operator:

import chisel3._
import chisel3.util.log2Ceil

// Natural counter with classic addition
class NaturalCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })

  // The counter wraps at 2^COUNT_WIDTH, so counterSize equals COUNT_WIDTH
  val MAXCOUNT = BigInt(1) << COUNT_WIDTH
  val counterSize = log2Ceil(MAXCOUNT)
  val counterValue = RegInit(0.U(counterSize.W))
  counterValue := counterValue + 1.U
  io.count := counterValue
}

FullAdderCount, which instantiates a chain of FullAdder modules described in Chisel:

/* FullAdder counter */
class FullAdderCount(val COUNT_WIDTH: Int = 32) extends Module {
  val io = IO(new Bundle {
    val count = Output(UInt(COUNT_WIDTH.W))
  })

  val counterValue = RegInit(0.U(COUNT_WIDTH.W))
  val addition = Module(new FullAdderAddition(COUNT_WIDTH))
  addition.io.a := counterValue
  addition.io.b := 1.U
  counterValue := addition.io.s

  io.count := counterValue
}
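
The FullAdderAddition module instantiated above is not shown here. For reference, a chained full adder of this kind might look roughly like the sketch below. It assumes a one-bit FullAdder with ports a, b, cin, s and cout, and an adder exposing the a, b and s ports used by FullAdderCount; both class bodies are hypothetical and are not the actual source.

```scala
import chisel3._

// Hypothetical one-bit full adder (the real FullAdder source is not shown).
class FullAdder extends Module {
  val io = IO(new Bundle {
    val a    = Input(Bool())
    val b    = Input(Bool())
    val cin  = Input(Bool())
    val s    = Output(Bool())
    val cout = Output(Bool())
  })
  io.s    := io.a ^ io.b ^ io.cin
  io.cout := (io.a & io.b) | (io.cin & (io.a ^ io.b))
}

// Hypothetical N-bit ripple-carry adder with the a/b/s ports used by FullAdderCount.
class FullAdderAddition(val width: Int) extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(width.W))
    val b = Input(UInt(width.W))
    val s = Output(UInt(width.W))
  })

  // Carry into bit 0 is zero; each stage's carry-out feeds the next stage.
  var carry: Bool = false.B
  val sumBits = (0 until width).map { i =>
    val fa = Module(new FullAdder)
    fa.io.a   := io.a(i)
    fa.io.b   := io.b(i)
    fa.io.cin := carry
    carry = fa.io.cout
    fa.io.s
  }

  // Element 0 of the Vec becomes the least significant bit; the final carry is dropped.
  io.s := VecInit(sumBits).asUInt
}
```

In a chain like this the carry has to ripple through every stage within one clock cycle, so the combinational path grows with the counter width.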

When I synthesize (and place and route) these counters in a blinker design for the iCE40 using the IceStorm tools (yosys and nextpnr), with a 44-bit counter, I get these results:

NaturalCount:

2.48. Printing statistics.

=== IcestickBlink ===

   Number of wires:                 14
   Number of wire bits:            240
   Number of public wires:          14
   Number of public wire bits:     240
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                130
     SB_CARRY                       42
     SB_DFF                         44
     SB_LUT4                        44

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 117.32 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 121.15 MHz (PASS at 12.00 MHz)

FullAdderCount:

   Number of wires:                581
   Number of wire bits:            810
   Number of public wires:         581
   Number of public wire bits:     810
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                173
     SB_DFF                         44
     SB_DFFE                        43
     SB_LUT4                        86

Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 436.68 MHz (PASS at 12.00 MHz)
Info: Max frequency for clock 'clk$SB_IO_IN_$glb_clk': 380.37 MHz (PASS at 12.00 MHz)

The NaturalCount version uses fewer LUTs than FullAdderCount, but clock performance is far better with FullAdderCount.

Is this normal? What is the purpose of SB_CARRY if it is slower than "normal" LUTs?

I tried the same on a GateMate FPGA, which uses the same synthesis software (yosys) but a different, vendor-specific tool for place and route (p_r).

NaturalCount with a 44-bit counter on GateMate:

2.49. Printing statistics.
 
=== GatemateBlink ===
 
   Number of wires:                 72
   Number of wire bits:            352
   Number of public wires:          24
   Number of public wire bits:     217
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                144
     CC_ADDF                        44
     CC_BUFG                         1
     CC_DFF                         44
     CC_IBUF                         2
     CC_LUT2                        44
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3):  189.79 MHz

FullAdderCount on GateMate:

2.49. Printing statistics.
 
=== GatemateBlink ===
 
   Number of wires:                595
   Number of wire bits:            835
   Number of public wires:         506
   Number of public wire bits:     745
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                186
     CC_BUFG                         1
     CC_DFF                         87
     CC_IBUF                         2
     CC_LUT2                         2
     CC_LUT3                        85
     CC_OBUF                         8
     CC_PLL                          1

Maximum Clock Frequency on CLK 160 (160/3):  189.79 MHz

With this FPGA, the clock performance is exactly the same.

I wonder why clock performance isn't better when the SB_CARRY and CC_ADDF cells are inferred from the '+' operator.

Is it a "bug" in my design, or is it normal?

**FabienM** · Accepted Answer · 2024-01-05T21:36:13+00:00

OK, it was in fact a bug in my code. The FullAdder option was actually instantiating a PdChain fast counter, as described on OpenCores.

    case CounterTypes.FullAdderCount => {
      println("Generate FullAdderCount of " + COUNT_WIDTH + " bits")
      val counter = Module(new PdChain(COUNT_WIDTH)) // <---- wrong class instantiated
      io.leds := counter.io.count(counter.counterSize-1, counter.counterSize-LED_WIDTH)
    }

After fixing the code, I get worse timing with FullAdder, as initially expected.

Sorry for the inconvenience, but asking this question allowed me to find the mistake.
