I am looking at the OpenCL libraries out there and trying to get a complete grasp of each one. One library in particular is clBLAS. Its website states that it implements BLAS level 1, 2, and 3 routines. That is great, but ViennaCL also has BLAS routines and linear algebra solvers, supports OpenCL and CUDA backends, and is header-only. At the moment I don't see a reason to use clBLAS over ViennaCL. Does anyone have reasons why one would choose clBLAS instead?
Although similar, this is meant to be an extension of this previous question comparing VexCL, Thrust, and Boost.Compute.
 
                        
clBLAS is implemented by AMD, so one could hope that it would be faster on AMD hardware. That is usually the sole advantage of a vendor BLAS implementation. Unfortunately, this does not seem to be the case here.
In this talk, the ViennaCL authors report that, thanks to their autotuning framework, they either outperform clBLAS or show similar performance.
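To give a rough idea of what autotuning means in this context, here is a minimal CPU-only sketch in plain C++. It is not ViennaCL's actual framework and uses no ViennaCL or clBLAS calls. The names `blocked_gemm` and `autotune_block_size` are hypothetical, and the tile size stands in for whatever kernel parameters a real tuner would search over (work-group size, vector width, and so on). The idea is simply to time the same routine with several candidate parameters on the target hardware and keep the fastest.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Blocked (tiled) matrix multiply C = A * B for square n x n row-major matrices.
// `block` is the tunable parameter (hypothetical stand-in for a kernel tile size).
void blocked_gemm(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n, std::size_t block) {
    assert(block > 0);
    std::fill(C.begin(), C.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t jj = 0; jj < n; jj += block)
                // Multiply one tile; std::min handles tiles that overrun the edge.
                for (std::size_t i = ii; i < std::min(ii + block, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + block, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + block, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

// Time every candidate tile size once and return the fastest one.
// A real tuner would repeat each measurement and search a larger parameter space.
std::size_t autotune_block_size(std::size_t n,
                                const std::vector<std::size_t>& candidates) {
    assert(!candidates.empty());
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    using clock = std::chrono::steady_clock;
    std::size_t best = candidates.front();
    clock::duration best_time = clock::duration::max();
    for (std::size_t block : candidates) {
        const auto start = clock::now();
        blocked_gemm(A, B, C, n, block);
        const auto elapsed = clock::now() - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = block;
        }
    }
    return best;
}
```

Calling `autotune_block_size(256, {8, 16, 32, 64})` returns whichever of those tile sizes ran fastest on the machine at hand. The result can differ from one device to another, which is why a tuned generic library can end up matching or beating a vendor one.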