I'm working on feature generation before I train a model in PyTorch. I wish to save my features as PyTorch tensors on disk for later use in training.
One of my features ("Feature A") is calculated on a CPU while another feature ("Feature B") must be calculated from that CPU on a GPU (some linear algebra stuff). I have an unusual limitation: on my university cluster, jobs which don't use GPUs have CPU memory limits of 1TB each while jobs which do use GPUs have CPU memory limits of 4GB with GPU memory limits of 48GB. Feature A and Feature B are each approximately 10GB.
Naturally, I want to first calculate Feature A using CPUs only then save Feature A to disk. In another job (this one with GPU access and thus the 4GB CPU memory limitation), I want to load Feature A directly to GPU, compute Feature B, then save Feature B to disk.
With Feature A computed and saved to disk, I've tried:
feaB = torch.load(feaAfile, map_location=torch.device('cuda'))
And yet I max-out my CPU memory. I've confirmed cuda is available.
In the PyTorch documentation I see that in loading tensors they "are first deserialized on the CPU..."
I wonder if there is any way to avoid a CPU memory implication when I want to load only onto the GPU? If the tensor must first be copied to the CPU, could I use some sort of 4GB buffer? Thanks so much in advance.
EDIT: per discussion in the comments, I no longer need to do this. But the question itself, of loading a tensor to the GPU without using CPU memory, remains unanswered so I'm leaving this question up.