I'm seeing a discrepancy between the GPU memory utilization reported by nvidia-smi and what PyTorch itself reports. For example, the following snippet prints 2.0 MiB:
import torch

tensor = torch.randn(256, 256, device='cuda', dtype=torch.float16)  # 256*256 fp16 values = 128 KiB of data
snapshot = torch.cuda.memory._snapshot()  # internal view of the caching allocator's segments
total_size = snapshot['segments'][0]['total_size'] / 1024 / 1024
print(total_size, "MiB")
This is expected, since according to the PyTorch documentation the smallest memory chunk that the caching allocator requests is 2 MiB.
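The same 2 MiB also shows up through the public, documented counters, so it does not seem to be an artifact of the private _snapshot API. A minimal sketch of what I mean (memory_allocated reports what live tensors currently use, memory_reserved what the caching allocator holds):

import torch

tensor = torch.randn(256, 256, device='cuda', dtype=torch.float16)

# The tensor itself only needs 256*256*2 bytes = 128 KiB ...
print(torch.cuda.memory_allocated() / 1024 / 1024, "MiB allocated")
# ... but the caching allocator still reserves a full 2 MiB segment
print(torch.cuda.memory_reserved() / 1024 / 1024, "MiB reserved")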
However, when checking nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:41:00.0 Off |                  Off |
|  0%   34C    P8    14W / 450W |    391MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
It appears that 391 MiB have already been used. Where does this mismatch come from?
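For comparison, here is a small sketch of how I understand the driver-level numbers can be read from inside the same process; torch.cuda.mem_get_info wraps cudaMemGetInfo, so (total - free) should roughly match the GPU-wide Memory-Usage column in nvidia-smi rather than PyTorch's own counters:

import torch

torch.cuda.init()  # make sure the CUDA context exists before querying

# (free, total) in bytes as seen by the CUDA driver, across all processes
free, total = torch.cuda.mem_get_info()
print((total - free) / 1024 / 1024, "MiB used at the driver level")

# ...versus what the caching allocator has actually reserved in this process
print(torch.cuda.memory_reserved() / 1024 / 1024, "MiB reserved by PyTorch")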