Suppose I use Nsight Systems to profile my program and create an SQLite 3 database, as follows:
nsys profile -o /path/to/db --export=sqlite /path/to/executable --arg1=val1 --arg2
What exactly do I do now to obtain the execution times of my various kernel invocations?
The CUPTI documentation (for CUDA 11.2) says:
And these are the names of two tables in the SQLite3 output database. Here's how to query them:
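If you would rather pull the per-launch kernel times into a script, here is a minimal sketch using Python's standard `sqlite3` module. It assumes the kernel table is `CUPTI_ACTIVITY_KIND_KERNEL` with integer `start` and `end` columns (assumed to be nanosecond timestamps) and a `shortName` column holding an id into a `StringIds(id, value)` string table. These names match nsys 2020.x exports as far as I know, but treat them as assumptions and check them against your own schema. The function name `kernel_durations` is hypothetical.

```python
import sqlite3

# Assumed schema (verify against your own export):
#   CUPTI_ACTIVITY_KIND_KERNEL(start INTEGER, end INTEGER, shortName INTEGER, ...)
#   StringIds(id INTEGER PRIMARY KEY, value TEXT)
# "end" is quoted because END is an SQL keyword.
KERNEL_QUERY = """
SELECT s.value           AS name,
       k.start           AS start_ns,
       k."end" - k.start AS duration_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
ORDER BY k.start
"""


def kernel_durations(db_path):
    """Return a list of (kernel_name, start_ns, duration_ns), one row per kernel launch."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(KERNEL_QUERY).fetchall()
    finally:
        conn.close()
```

Usage would be something like `kernel_durations("/path/to/db.sqlite")`, assuming nsys appended the `.sqlite` extension to the `-o` name. If several template instantiations share a short name, joining on `demangledName` instead of `shortName` (again, an assumed column name) keeps them apart.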
It is also educational to run:
and then enter
to get the SQL creation command for every table in the schema. With CUDA 11.2 and nsys 2020.4.3, that typically looks like the following:
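The same `CREATE TABLE` statements can also be read programmatically, because SQLite stores them in its built-in `sqlite_master` table. The sketch below is a rough equivalent of the shell's `.schema`, restricted to tables (`.schema` also prints indexes, views and triggers). The function name `table_schemas` is hypothetical.

```python
import sqlite3


def table_schemas(db_path):
    """Map each table name to its CREATE TABLE statement, sorted by table name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # dict() keeps the ORDER BY order, since dicts preserve insertion order.
    return dict(rows)
```

For example, `for sql in table_schemas("/path/to/db.sqlite").values(): print(sql + ";")` prints one statement per table, assuming the export landed at that path.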
You can then apply any SQL query to these tables (in SQLite's dialect, of course).
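As one illustration of such a query, here is a sketch that aggregates kernel time per kernel name (launch count, total and mean duration), similar in spirit to a per-kernel summary. It assumes `CUPTI_ACTIVITY_KIND_KERNEL` has `start`, `end` (nanoseconds, assumed) and `shortName` columns, with names resolved through `StringIds(id, value)`; verify these against your schema. The function name `kernel_summary` is hypothetical.

```python
import sqlite3

# Assumed schema: CUPTI_ACTIVITY_KIND_KERNEL(start, end, shortName) with nanosecond
# timestamps, and kernel names stored in StringIds(id, value).
SUMMARY_QUERY = """
SELECT s.value                AS name,
       COUNT(*)               AS launches,
       SUM(k."end" - k.start) AS total_ns,
       AVG(k."end" - k.start) AS avg_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
GROUP BY s.value
ORDER BY total_ns DESC
"""


def kernel_summary(db_path):
    """Return (name, launches, total_ns, avg_ns) per kernel name, most expensive first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(SUMMARY_QUERY).fetchall()
    finally:
        conn.close()
```

Grouping in SQL rather than in Python keeps the work inside SQLite, which matters for large traces with millions of launches.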