I'm looking at this code for the generator network of a DCGAN, based on the original DCGAN paper:
import torch.nn as nn

z_size = 100
num_g_filters = 64
num_channels = 3  # assumed: RGB output, matching the 3 output channels in the diagram

netG = nn.Sequential(
    nn.ConvTranspose2d(
        in_channels=z_size,
        out_channels=num_g_filters * 8,
        kernel_size=4,
        stride=1,
        padding=0,
        bias=False
    ),
    nn.BatchNorm2d(num_features=num_g_filters * 8),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(
        in_channels=num_g_filters * 8,
        out_channels=num_g_filters * 4,
        kernel_size=4,
        stride=2,
        padding=1,
        bias=False
    ),
    nn.BatchNorm2d(num_features=num_g_filters * 4),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(
        in_channels=num_g_filters * 4,
        out_channels=num_g_filters * 2,
        kernel_size=4,
        stride=2,
        padding=1,
        bias=False
    ),
    nn.BatchNorm2d(num_features=num_g_filters * 2),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(
        in_channels=num_g_filters * 2,
        out_channels=num_g_filters,
        kernel_size=4,
        stride=2,
        padding=1,
        bias=False
    ),
    nn.BatchNorm2d(num_features=num_g_filters),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(
        in_channels=num_g_filters,
        out_channels=num_channels,
        kernel_size=4,
        stride=2,
        padding=1,
        bias=False
    ),
    nn.Tanh()
)
Where does the z_size vector get projected and reshaped from a length of 100 to 4 x 4 x 1024? The first transposed convolution takes in_channels = 100 and outputs out_channels = 512, seemingly skipping the project-and-reshape step. There also seems to be an additional transposed convolution, taking in_channels = 64 * 2 = 128 and outputting out_channels = 64, which I don't see in the diagram of the network. In the diagram, the last transposed convolution takes in 128 channels and outputs 3 channels.
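
To make the channel and size bookkeeping concrete, here is a minimal sketch that traces the shape after each transposed convolution. It uses the ConvTranspose2d output-size formula with PyTorch's defaults (dilation 1, no output padding). Two things are assumptions and not part of the snippet above: that z is fed in as a 100 x 1 x 1 tensor per sample, and that num_channels is 3. The helper names conv_transpose_out and trace_shapes are made up for illustration.

```python
# Shape trace for the generator, using the transposed-convolution size formula
# with dilation=1 and output_padding=0 (the PyTorch defaults).

def conv_transpose_out(size, kernel_size, stride, padding):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel_size

z_size = 100
num_g_filters = 64
num_channels = 3  # assumption: RGB output, as in the diagram

# (out_channels, kernel_size, stride, padding) for each ConvTranspose2d layer
layers = [
    (num_g_filters * 8, 4, 1, 0),
    (num_g_filters * 4, 4, 2, 1),
    (num_g_filters * 2, 4, 2, 1),
    (num_g_filters, 4, 2, 1),
    (num_channels, 4, 2, 1),
]

def trace_shapes(in_channels, in_size, layers):
    """Return the (channels, height, width) shape before and after each layer."""
    shapes = [(in_channels, in_size, in_size)]
    size = in_size
    for out_channels, k, s, p in layers:
        size = conv_transpose_out(size, k, s, p)
        shapes.append((out_channels, size, size))
    return shapes

# Assumption: each noise sample is shaped 100 x 1 x 1 before entering netG.
shapes = trace_shapes(z_size, 1, layers)
for shape in shapes:
    print(shape)
```

Under these assumptions the spatial size goes from 4 x 4 after the first layer to 64 x 64 after the last, and the first layer produces 512 channels, not the 1024 shown in the paper's figure.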