Using the TensorBoard profiler with an Orbit controller?


I am using the official DeepLab repository, which uses Orbit to train the models.

I want to capture a profile so that I can analyse performance. To do that, I start a profiler server as the first step of my training script:

tf.profiler.experimental.server.start(6009)

I then use the Profile tab in TensorBoard to capture a profile that runs long enough to cover several training steps (I took the 'steps/sec' metric from my logs and used it to work out how long to profile for).
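The duration estimate was simple arithmetic, roughly as in the sketch below. The function name, the safety margin, and the example numbers are illustrative assumptions, not values from my actual run.

```python
import math


def profile_duration_ms(steps_to_cover, steps_per_sec, margin=1.5):
    """Estimate a capture duration in milliseconds.

    steps_to_cover: how many training steps the capture should span.
    steps_per_sec:  throughput read from the training logs.
    margin:         hypothetical safety factor so the capture does not
                    end just short of the last step.
    """
    if steps_per_sec <= 0:
        raise ValueError("steps_per_sec must be positive")
    seconds = steps_to_cover / steps_per_sec
    return math.ceil(seconds * margin * 1000)
```

For example, at 2 steps/sec, covering 10 steps with the default margin gives `profile_duration_ms(10, 2.0)`, i.e. 7500 ms, which is the value I would enter as the profiling duration in the capture dialog.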

The profile gets captured fine but I see this message at the top:

No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.
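As far as I understand, option (1) means wrapping each training step in `tf.profiler.experimental.Trace` with a `step_num` and `_r=1`. In a plain custom loop that would look something like the sketch below. The helper `run_instrumented_steps` and its arguments are hypothetical names for illustration only; this is not how Orbit or the DeepLab code is structured, and I do not know where the equivalent hook belongs when the loop is driven by an Orbit controller.

```python
import tensorflow as tf


def run_instrumented_steps(train_step, dataset, num_steps):
    """Run num_steps of train_step, marking each one as a profiler step.

    train_step and dataset are placeholders for whatever the real
    training script uses.
    """
    iterator = iter(dataset)
    results = []
    for step in range(num_steps):
        # A Trace carrying step_num and _r=1 is what the profiler reads
        # as a step marker. Fetching the batch inside the context keeps
        # input-pipeline time attributed to the same step.
        with tf.profiler.experimental.Trace("train", step_num=step, _r=1):
            batch = next(iterator)
            results.append(train_step(batch))
    return results
```

Outside an active profiling session the `Trace` context does nothing, so the loop itself behaves the same with or without it.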

What can I do to correct this behavior?

I am using TensorFlow 2.11.

I expect to be able to see separate metrics in the profiler for data fetching, the model's forward pass, and backpropagation.

