Can float16 data type save compute cycles while computing transcendental functions?
It's clear that float16 can save bandwidth, but can float16 also save compute cycles when computing transcendental functions such as exp()?
348 views Asked by Leonardo Physh At
If your hardware fully supports it, rather than only supporting conversion to and from float32, then yes, definitely: for example on a GPU, on Intel Alder Lake with AVX-512 enabled, or on Sapphire Rapids (see "Half-precision floating-point arithmetic on Intel chips"), and apparently on Apple M2 CPUs.
If a core can do two 64-byte SIMD vectors of FMAs per clock, you go twice as fast when each vector holds 32 half-precision FMAs instead of 16 single-precision FMAs.
Speed vs. precision tradeoff: only enough for FP16 is needed
Without hardware ALU support for FP16, you can save cycles only by not requiring as much precision, because you know you're eventually going to round to fp16. So you'd use polynomial approximations of lower degree, and thus fewer FMA operations, even though you're computing with float32.
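To make the "lower degree is enough" point concrete, here is a rough sketch that searches for the smallest polynomial degree meeting an fp16-level versus an fp32-level accuracy target for exp on a reduced range. Several things here are assumptions for illustration only: the function names are hypothetical, a plain Taylor polynomial stands in for the tuned minimax polynomials a real library would use, the targets 2^-11 and 2^-24 are taken as roughly half an ulp of each format, and the arithmetic runs in Python doubles rather than float32.

```python
import math

# Assumed range reduction: the reduced argument satisfies |r| <= ln(2)/2.
HALF_LN2 = math.log(2.0) / 2.0

def taylor_exp(r, degree):
    """Evaluate the degree-`degree` Taylor polynomial of exp(r) with Horner's rule."""
    acc = 1.0 / math.factorial(degree)
    for k in range(degree - 1, -1, -1):
        acc = acc * r + 1.0 / math.factorial(k)
    return acc

def max_rel_error(degree, samples=2001):
    """Largest relative error of the polynomial on an evenly spaced grid over the range."""
    worst = 0.0
    for i in range(samples):
        r = -HALF_LN2 + 2.0 * HALF_LN2 * i / (samples - 1)
        exact = math.exp(r)
        worst = max(worst, abs(taylor_exp(r, degree) - exact) / exact)
    return worst

def min_degree(tolerance, max_degree=20):
    """Smallest degree whose sampled relative error is within `tolerance`."""
    for degree in range(1, max_degree + 1):
        if max_rel_error(degree) <= tolerance:
            return degree
    raise ValueError("tolerance not reached within max_degree")

FP16_TOL = 2.0 ** -11  # assumed target: about half an ulp for an 11-bit significand
FP32_TOL = 2.0 ** -24  # assumed target: about half an ulp for a 24-bit significand

print(min_degree(FP16_TOL), min_degree(FP32_TOL))  # prints: 4 7
```

With Horner evaluation each degree costs one FMA, so under these assumptions the fp16 target needs roughly half the polynomial work of the fp32 target. Tuned minimax coefficients would likely lower both degrees, but the gap between them is the point.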
BTW, exp and log are interesting for floating point because the format itself is built around an exponential representation. So you can do an exponential by converting fp->int and stuffing that integer into the exponent field of an FP bit pattern (this gives 2^x directly; for e^x, scale the input by log2(e) first). Then, with the fractional part of your FP number, you use a polynomial approximation to get the mantissa of the result. A log implementation is the reverse: extract the exponent field and use a polynomial approximation of the log of the mantissa, over a range like 1.0 to 2.0. See:
- Efficient implementation of log2(__m256d) in AVX2
- Fastest Implementation of Exponential Function Using AVX
- Very fast approximate Logarithm (natural log) function in C++?
- vgetmantps vs andpd instructions for getting the mantissa of float
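The exponent-field trick described above can be sketched in scalar form using float32 bit patterns. This is a minimal illustration, not the code from any of the linked answers: all function names are hypothetical, inputs are assumed finite with results in the normal float32 range, a degree-4 Taylor polynomial stands in for a tuned polynomial in the exp2 half, and the log2 half swaps in the atanh series in s = (m - 1) / (m + 1) (which costs a division) instead of a direct polynomial in the mantissa.

```python
import math
import struct

LN2 = math.log(2.0)

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float32_to_bits(x):
    """Round x to float32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fast_exp2(x):
    """Approximate 2**x: the integer part goes into the exponent field,
    and a short polynomial handles the fractional part."""
    n = round(x)                      # integer part, so x - n lies in [-0.5, 0.5]
    if not -126 <= n <= 127:
        raise ValueError("result outside the normal float32 range")
    r = (x - n) * LN2                 # 2**f == exp(f * ln 2)
    # Degree-4 Taylor polynomial of exp(r), Horner form (illustrative stand-in).
    frac = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0))))
    scale = bits_to_float32((n + 127) << 23)   # bit pattern of exactly 2**n
    return frac * scale

def fast_log2(x):
    """Approximate log2(x): extract the exponent field, then approximate
    the log of the mantissa over [1.0, 2.0)."""
    bits = float32_to_bits(x)
    exp_field = (bits >> 23) & 0xFF
    if bits >> 31 or exp_field == 0 or exp_field == 0xFF:
        raise ValueError("expects a positive, normal float32 value")
    # Keep the mantissa bits and force the exponent to 0, giving m in [1.0, 2.0).
    m = bits_to_float32((bits & 0x007FFFFF) | (127 << 23))
    s = (m - 1.0) / (m + 1.0)
    s2 = s * s
    # ln(m) = 2 * atanh(s), truncated after the s**7 term.
    ln_m = 2.0 * s * (1.0 + s2 * (1.0 / 3.0 + s2 * (1.0 / 5.0 + s2 * (1.0 / 7.0))))
    return (exp_field - 127) + ln_m / LN2
```

For example, `fast_exp2(3.0)` builds the bit pattern for 2^3 and multiplies it by a polynomial value of 1.0, while `fast_log2(10.0)` splits 10.0 into an exponent of 3 and a mantissa of 1.25. A SIMD version would do the same steps with integer shifts and masks on vector registers.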
Normally you do want some FP operations, so I don't think it would be worth trying to use only 16-bit integer operations to avoid unpacking to float32, even for exp or log, which are somewhat special and intimately connected with floating point's significand * 2^exponent format, unlike sin/cos/tan or other transcendental functions.
So I think your best bet would normally still be to start by converting fp16 to fp32 if you don't have instructions like AVX-512 FP16 that can do actual FP math on it. But you can gain performance from not needing as much precision, since implementing these functions normally involves a speed vs. precision tradeoff.
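As a small sketch of that widen, compute, narrow pattern, the following uses Python's struct half-precision format ("e") to unpack an fp16 bit pattern, evaluates a deliberately low-precision exp in wider arithmetic, and rounds the result back to fp16. The function names are hypothetical, Python doubles stand in for float32, the degree-4 Taylor polynomial is only an illustrative choice, and inputs are assumed finite with exp(x) small enough to fit in fp16 (x below about 11).

```python
import math
import struct

LN2 = math.log(2.0)
LOG2_E = 1.0 / LN2

def fp16_bits_to_float(h_bits):
    """Widen a 16-bit half-precision pattern to a Python float (exact)."""
    return struct.unpack("<e", struct.pack("<H", h_bits))[0]

def round_to_fp16(x):
    """Round a wider value to the nearest representable fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def cheap_exp(x):
    """Low-precision exp: range-reduce to |r| <= ln(2)/2, then a degree-4 polynomial."""
    n = round(x * LOG2_E)             # exp(x) = 2**n * exp(r)
    r = x - n * LN2
    poly = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0))))
    return math.ldexp(poly, n)        # scale by 2**n

def exp_fp16(h_bits):
    """fp16 in, fp16 out, with the arithmetic done in a wider format."""
    x = fp16_bits_to_float(h_bits)    # widen
    y = cheap_exp(x)                  # compute with reduced precision requirements
    return round_to_fp16(y)           # narrow

print(exp_fp16(0x3C00))  # 0x3C00 is fp16 1.0; prints 2.71875
```

The polynomial's error is well below the fp16 rounding step, so the final rounding to fp16 dominates the error, which is why a more accurate (and more expensive) polynomial would mostly be wasted work here.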