I would like to know the throughput, latency, and number of banks of Kepler's L1 cache (both the read-only 'texture' cache and the normal L1 cache).
In a CUDA program, different threads read the same data multiple times, and I need to know whether I'm bound by L1 throughput. I couldn't find this information in any of NVIDIA's documents; any help would be appreciated.
Edit: I'm using the K20 card.
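If the figures aren't documented, one common way to estimate load latency empirically is a pointer-chasing microbenchmark. The sketch below is a hypothetical example, not an NVIDIA tool: `build_chain`, `chase`, and `measure_latency` are illustrative names. It assumes a compute capability 3.5 device such as the K20 (compile with `-arch=sm_35`) so that `__ldg()` is available. It only approximates latency; throughput and bank behavior would need separate tests.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: next[i] = (i + stride) % n. When stride divides n,
// following next[] from index 0 visits n / stride elements spaced stride apart.
std::vector<unsigned int> build_chain(unsigned int n, unsigned int stride) {
    assert(n > 0);
    std::vector<unsigned int> next(n);
    for (unsigned int i = 0; i < n; ++i) next[i] = (i + stride) % n;
    return next;
}

// One thread follows the chain. Each load's address depends on the previous
// load, so the loads cannot overlap and cycles / hops approximates latency.
__global__ void chase(const unsigned int* __restrict__ next, int hops,
                      unsigned int* sink, long long* cycles) {
    unsigned int j = 0;
    for (int i = 0; i < hops; ++i) j = __ldg(&next[j]);  // warm-up pass
    long long start = clock64();
    for (int i = 0; i < hops; ++i) j = __ldg(&next[j]);  // timed pass
    long long stop = clock64();
    *sink = j;  // store the result so the loads are not optimized away
    *cycles = stop - start;
}

// Average cycles per dependent load, or -1.0 if a CUDA call fails.
double measure_latency(unsigned int n, unsigned int stride, int hops) {
    std::vector<unsigned int> h = build_chain(n, stride);
    unsigned int *d_next = nullptr, *d_sink = nullptr;
    long long *d_cycles = nullptr;
    long long cycles = 0;
    bool ok = cudaMalloc(&d_next, n * sizeof(unsigned int)) == cudaSuccess &&
              cudaMalloc(&d_sink, sizeof(unsigned int)) == cudaSuccess &&
              cudaMalloc(&d_cycles, sizeof(long long)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_next, h.data(), n * sizeof(unsigned int),
                   cudaMemcpyHostToDevice);
        chase<<<1, 1>>>(d_next, hops, d_sink, d_cycles);
        ok = cudaMemcpy(&cycles, d_cycles, sizeof(long long),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_next);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    return ok ? static_cast<double>(cycles) / hops : -1.0;
}
```

Sweeping `n` and `stride` and looking for jumps in the cycles-per-load figure is the usual way to see where the working set stops fitting in a given cache level; the measured number also includes a small amount of loop overhead.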
I don't know the number of banks in Kepler's L1 myself, but I don't think you need to care about the L1 cache here. From the Kepler Tuning Guide:
L1 caching in Kepler GPUs is reserved only for local memory accesses, such as register spills and stack data. Global loads are cached in L2 only (or in the Read-Only Data Cache)
http://docs.nvidia.com/cuda/kepler-tuning-guide/
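Following the guide, data that many threads read repeatedly can be routed through the read-only data cache rather than L1. The sketch below assumes a compute capability 3.5 device such as the K20 (compile with `-arch=sm_35`); `apply_filter` and `run_filter` are hypothetical names chosen for illustration. Marking the pointers `const ... __restrict__` lets the compiler pick the read-only path, and `__ldg()` requests it explicitly.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Every thread reads the same small coefficient table, and neighboring
// threads overlap on in[], i.e. the same data is read by many threads.
__global__ void apply_filter(const float* __restrict__ coeffs, int ncoeffs,
                             const float* __restrict__ in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < ncoeffs; ++k) {
        int idx = min(i + k, n - 1);  // clamp reads at the right edge
        // __ldg() loads through the read-only data cache (sm_35 and later).
        acc += __ldg(&coeffs[k]) * __ldg(&in[idx]);
    }
    out[i] = acc;
}

// Hypothetical host helper: copies data to the GPU, runs the kernel, and
// returns the output, or an empty vector if a CUDA call fails.
std::vector<float> run_filter(const std::vector<float>& coeffs,
                              const std::vector<float>& in) {
    int n = static_cast<int>(in.size());
    int nc = static_cast<int>(coeffs.size());
    assert(n > 0);  // a grid with zero blocks is an invalid launch
    float *d_c = nullptr, *d_in = nullptr, *d_out = nullptr;
    std::vector<float> out(n);
    bool ok = cudaMalloc(&d_c, nc * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_c, coeffs.data(), nc * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        apply_filter<<<blocks, threads>>>(d_c, nc, d_in, d_out, n);
        ok = cudaMemcpy(out.data(), d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_c);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? out : std::vector<float>();
}
```

For example, `run_filter(std::vector<float>(3, 1.0f), std::vector<float>(5, 1.0f))` should return five values of `3.0f`, since every clamped read of `in` sees `1.0f`.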