I'm computing the incremental mean of my input data (which is an array of 6 elements, so I'll end up with 6 means).
This is the code I run every time a new input array is available (obviously I update the number of samples, etc.):
    computing_mean: for (int i = 0; i < 6; i++) {
        temp_mean[i] = temp_mean[i] + (input[i] - temp_mean[i]) / number_of_samples;
        // Possible optimization?
        // temp_mean[i] = temp_mean[i] + divide(input[i] - temp_mean[i], number_of_samples);
    }
Where all the data in the code are arrays or single values of the following type:

    typedef ap_fixed<36,24,AP_RND_CONV,AP_SAT> decimalNumber;
According to my synthesis report, this loop has a latency of 324 cycles and an iteration latency of 54 cycles, caused mainly by the division operation.
Is there any way I can improve the speed of the division? I tried using hls_math and its divide function, but it doesn't seem to work with my data type.
EDIT 1: I'm including the performance profile from inside Vivado HLS. I'll add self-contained, reproducible code in a later edit. As you can see, the majority of the time is spent in SDIV.
Other than trigonometric functions like sin() (FSIN = ~50-170 cycles) and cos() (FCOS = ~50-120 cycles), or things like sqrt() (FSQRT = ~22 cycles), division will always be the most painful. FDIV is 15 cycles; FADD and FMUL are both 5.

There are occasions where you can skip division and do bit-shifting instead, if you're working with integer data and the number you're dividing by is a power of 2, but that's about it.
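For instance, a minimal sketch of the power-of-2 case (the function name here is made up for illustration):

    #include <cstdint>

    // Dividing an unsigned integer by a power of two is exactly
    // equivalent to shifting it right by log2 of the divisor.
    uint32_t divide_by_8(uint32_t x) {
        return x >> 3;  // same result as x / 8
    }

For signed values, note that a right shift rounds toward negative infinity while integer division rounds toward zero, so the two aren't interchangeable there.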
You can look up the approximate CPU cycle cost of any given instruction in tables like this.
FDIV is an example of an expensive one.

That being said, one thing you could try is to compute the division factor in advance, then apply it using multiplication instead:
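A minimal sketch of what that could look like for the loop in the question, assuming number_of_samples only changes between invocations so the reciprocal can be hoisted out of the loop:

    // One division up front, then six multiplications instead of six divisions.
    decimalNumber inv_samples = decimalNumber(1) / number_of_samples;

    computing_mean: for (int i = 0; i < 6; i++) {
        temp_mean[i] = temp_mean[i] + (input[i] - temp_mean[i]) * inv_samples;
    }

One caveat: ap_fixed<36,24> has only 12 fractional bits, so the precomputed reciprocal loses precision as number_of_samples grows; a wider intermediate type for inv_samples may be needed to keep the mean accurate.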
I'm not sure that's saving a whole lot, but if you really do need to shave off cycles, it's worth a shot.