Memory bandwidth of a convolutional layer: verifying the numbers


In this article, the author mentions that:

Convolutions: for a convolution operation, the bandwidth requirements are usually lower, as the input map data can be used in several convolution operations in parallel, and the convolution weights are relatively small.

For example: a 13 x 13 pixel map in a 3x3 convolution operation from 192 input maps to 192 output maps (as, for example, in Alexnet layer 3) requires ~4 MB of weight data and ~0.1 MB of input data from memory. This may require about 3.2 GB/s to be performed on a 128 G-ops/s system with ~99% efficiency (SnowFlake Spring 2017 version). The bandwidth usage is low because the same input data is used to compute 192 outputs, albeit with different small weight matrices.

When I try to re-calculate these numbers, something seems to be missing:

Each output element of the conv layer requires 3 x 3 x 192 x 2 = 3,456 operations (counting multiplications and additions separately). The output map is 11 x 11 (assuming no padding), so one output channel needs 121 x 3,456 = 418,176 operations, and with 192 output channels the total is 80,289,792 operations. On a 128 G-ops/s system, the time required is 80,289,792 / (128 * 2^30) ≈ 0.000584 s.
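Here is a minimal Python sketch of my calculation above (my assumptions: a 3x3 kernel, 192 input and 192 output channels, an 11x11 output map from valid convolution, and a MAC counted as 2 ops; all variable names are mine):

    # Op count for one 3x3 conv layer, 192 -> 192 channels, 11x11 output
    kernel_ops = 3 * 3 * 192 * 2          # ops per output element = 3,456
    per_channel = 11 * 11 * kernel_ops    # ops per output channel = 418,176
    total_ops = 192 * per_channel         # total = 80,289,792
    time_s = total_ops / (128 * 2**30)    # on a 128 G-ops/s system
    print(total_ops, time_s)              # 80289792, ~5.84e-04 s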

I don't see how he arrived at the bandwidth requirement of 3.2 GB/s, and secondly, how the weight data comes out to 4 MB,

because the weight data is 3 x 3 x 192 x 192 x 4 bytes (4 bytes for FP32) = 1,327,104 bytes ≈ 1.26 MB, nowhere near 4 MB.
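Under those same assumptions, here is the bandwidth my numbers imply (a sketch of my own reasoning, not the article's method; I count only weight and input reads, ignore output writes, and assume FP32 activations with a 13x13 input map):

    # Implied bandwidth = (weights + input) / compute time
    weight_bytes = 3 * 3 * 192 * 192 * 4   # = 1,327,104 ≈ 1.26 MB, not ~4 MB
    input_bytes = 13 * 13 * 192 * 4        # = 129,792 ≈ 0.13 MB (matches ~0.1 MB)
    time_s = 80_289_792 / (128 * 2**30)    # compute time from the previous step
    bandwidth = (weight_bytes + input_bytes) / time_s
    print(f"{bandwidth / 1e9:.2f} GB/s")   # ≈ 2.49 GB/s, not 3.2 GB/s

So while the article's ~0.1 MB input figure checks out, I cannot reproduce either the ~4 MB weight figure or the 3.2 GB/s bandwidth figure.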
