Unable to capture iterations on dlprof


I'm trying to profile inference of a tiny model with dlprof, but I can't capture per-iteration information when I let it run for multiple iterations. This is what the code does:

import argparse

import torch
import torch.nn as nn
import nvidia_dlprof_pytorch_nvtx


class SmallModel(nn.Module):

    def __init__(self):
        super(SmallModel, self).__init__()
        self.layer1 = nn.Linear(784, 512)
        self.layer2 = nn.Linear(512, 256)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        return x


# FP16 model and input on the GPU
model = SmallModel().cuda().half()
input_data = torch.randn(64, 784).cuda().half()

# Initialize DLProf's NVTX plugin for PyTorch
nvidia_dlprof_pytorch_nvtx.init(enable_function_stack=True)

parser = argparse.ArgumentParser("Nvidia Profiler")
parser.add_argument("--num_iter", dest='num_iter', help="no of iterations to perform", type=int)
args = parser.parse_args()

# Inference only: run num_iter forward passes while emitting NVTX ranges
with torch.no_grad():
    with torch.autograd.profiler.emit_nvtx():
        for i in range(args.num_iter):
            _ = model(input_data)

This is the command I'm running:

dlprof --mode=pytorch --key_node=LINEAR_1 -f true --reports=summary,detail,iteration --iter_start=5 --iter_stop=8 python profile_sample_model.py --num_iter 10

This is what the dlprof log generates:

Found 2 iterations using key_op "LINEAR_1"
Iterations: [12495162999, 12520617892]
Aggregating data over 1 iterations: iteration 1 start (12495162999 ns) to iteration 1 end (12520617892 ns)
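
To make the log easier to read, here is a small sketch that turns the two timestamps from the "Iterations:" line into start/end spans. It only does arithmetic on the numbers printed above; the helper name iteration_spans is hypothetical and not part of dlprof, and treating consecutive timestamps as span boundaries is an assumption based on the "start ... end" wording in the log.

```python
# Hypothetical helper (not a dlprof API): pair consecutive timestamps
# into (start_ns, end_ns, duration_ns) spans.
def iteration_spans(timestamps_ns):
    return [
        (start, end, end - start)
        for start, end in zip(timestamps_ns, timestamps_ns[1:])
    ]


# Timestamps copied from the "Iterations:" line of the log above
log_timestamps = [12495162999, 12520617892]
spans = iteration_spans(log_timestamps)

for index, (start, end, duration) in enumerate(spans, start=1):
    print(f"iteration {index}: {start} ns -> {end} ns ({duration / 1e6:.3f} ms)")
```

With the two timestamps from the log this yields a single span of 25454893 ns (about 25.455 ms), which matches the "Aggregating data over 1 iterations" line: only one span is available, far fewer than the 10 iterations the script ran.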

I want dlprof to capture iterations 5 through 8 individually. Instead, it skips aggregation until the first time it encounters the specified key_node and then aggregates the remaining 9 iterations as a single iteration. What am I doing wrong here? --iter_start=5 --iter_stop=8 doesn't seem to have any effect.

Really appreciate any guidance on this.
