During the training of my neural network model, I used PyTorch's DataLoader to speed up training. But instead of using a fixed batch size before updating the model's parameters, I have a list of different batch sizes that I want the data loader to use.
Example
```python
train_dataset = TensorDataset(x_train, y_train)              # x_train.shape == (8400, 4)
dataloader_train = DataLoader(train_dataset, batch_size=64)  # with a fixed batch size of 64
```
What I want is a data loader that can use a list of batch sizes that is dynamic (not fixed). How can I do that?
```python
list_batch_size = [30, 60, 110, ..., 231]  # this list's sum equals x_train.shape[0] (8400)
```
You can use a custom sampler (or batch sampler) for this.
Here's a quick proof-of-concept for a sampler that takes custom batch sizes as an argument and returns the corresponding batch indices:
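A minimal sketch of such a sampler (the class name `VariableBatchSampler` and its internals are illustrative, not the only way to write it):

```python
import torch
from torch.utils.data import Sampler

class VariableBatchSampler(Sampler):
    """Yield batches of indices whose sizes follow a user-supplied list."""

    def __init__(self, dataset_len: int, batch_sizes: list):
        # The batch sizes are expected to cover the whole dataset exactly.
        assert sum(batch_sizes) == dataset_len, "batch sizes must sum to the dataset length"
        self.dataset_len = dataset_len
        self.batch_sizes = batch_sizes

    def __len__(self):
        return len(self.batch_sizes)

    def __iter__(self):
        start = 0
        for size in self.batch_sizes:
            # e.g. [0, 1, ..., 29] for a first batch of size 30
            yield list(range(start, start + size))
            start += size
```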
You can instantiate the sampler and use it as the `sampler` argument when instantiating the `DataLoader`, e.g.:
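For instance (a sketch assuming the `VariableBatchSampler` above is in scope; the tensors stand in for the real training data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.randn(8400, 4)   # illustrative stand-in for the real features
y_train = torch.randn(8400, 1)   # illustrative stand-in for the real targets

train_dataset = TensorDataset(x_train, y_train)
list_batch_size = [30, 60, 110, 8200]  # illustrative sizes summing to len(train_dataset)

sampler = VariableBatchSampler(len(train_dataset), list_batch_size)
data_loader = DataLoader(train_dataset, sampler=sampler)  # batch_size defaults to 1

for x_batch, y_batch in data_loader:
    print(x_batch.shape)  # e.g. torch.Size([1, 30, 4]) -- note the extra leading dim
```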
Note that each element in the `data_loader` iterable would then contain one extra dimension for the batch (as the default value for `batch_size` is 1 in `DataLoader`); you can either use `squeeze(dim=0)` to get rid of the extra dim, or, better, pass the sampler as the `batch_sampler` argument:
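Again as a sketch, reusing the same illustrative names:

```python
data_loader = DataLoader(
    train_dataset,
    batch_sampler=VariableBatchSampler(len(train_dataset), list_batch_size),
)

for x_batch, y_batch in data_loader:
    print(x_batch.shape)  # torch.Size([30, 4]), torch.Size([60, 4]), ... -- no extra dim
```

With `batch_sampler`, each list of indices yielded by the sampler is treated as one batch directly, so the batches come out with shapes like `(30, 4)` and `(60, 4)` without any squeezing.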