I am trying to compute Concept Activation Vectors using TCAV (Testing with Concept Activation Vectors, as described here) for different concepts for my classification model. So far I haven't found any code online for PyTorch models, so I have decided to rewrite it myself. The code I am trying to port is:
def compute_tcav(input_tensor, model, AA, layer_name, filter_indices, optimizer, seed_input=None, wrt_tensor=None, backprop_modifier=None, grad_modifier='absolute'):
    layer_AA = AA[layer_name]
    losses = [
        (ActivationMaximization(layer_AA, filter_indices), -1)
    ]
    opt = optimizer(input_tensor, losses, wrt_tensor=wrt_tensor, norm_grads=False)
    #grads = opt.minimize(seed_input=seed_input, max_iter=1, grad_modifier=grad_modifier, verbose=False)[1]
    return losses #utils.normalize(grads)[0]
source: https://github.com/maragraziani/iMIMIC-RCVs/blob/master/rcv_utils.py
This is what I have so far:
import torch

def ActivationMaximizationLoss(input_AA):
    # Mean activation of the layer output
    loss = torch.mean(input_AA)
    return loss

def compute_tcav_pytorch(model, layer_predictions):
    optimizer = torch.optim.SGD(model.parameters(), 1e-4)
    optimizer.zero_grad()
    # Stored activations of the chosen layer (NumPy array -> tensor)
    input_AA = torch.from_numpy(layer_predictions['_blocks.6._project_conv']) # middle
    input_AA.requires_grad = True
    loss = ActivationMaximizationLoss(input_AA)
    loss.backward()
    optimizer.step()
    # Gradient of the loss with respect to the stored activations
    img = input_AA.grad
    return img[0][0]
I am trying to maximise a layer's activation in the model and derive the TCAV vector from the resulting gradient.
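For reference, here is a minimal sketch of the quantity I understand TCAV to need: the gradient of a class logit with respect to a layer's activations, projected onto a concept vector. It assumes the activations are captured with a forward hook during a live forward pass rather than loaded from a NumPy array. The toy model, the helper name layer_gradient, and the random stand-in for the CAV are all hypothetical and only there for illustration; a real CAV would come from a linear classifier trained to separate concept activations from random ones.

```python
import torch
import torch.nn as nn


def layer_gradient(model, layer, inputs, class_index):
    """Return d(logit[class_index]) / d(layer output), one row per input."""
    captured = {}

    def hook(module, module_inputs, output):
        # Keep the activation tensor and ask autograd to retain its gradient.
        output.retain_grad()
        captured["act"] = output

    handle = layer.register_forward_hook(hook)
    try:
        model.zero_grad()
        logits = model(inputs)
        # Summing over the batch gives per-sample gradients in one backward pass.
        logits[:, class_index].sum().backward()
    finally:
        handle.remove()
    return captured["act"].grad.detach()


torch.manual_seed(0)

# Hypothetical toy classifier standing in for the real model.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
model.eval()

inputs = torch.randn(5, 4)
class_index = 1

# Gradient of the class logit with respect to the first layer's output.
grads = layer_gradient(model, model[0], inputs, class_index)

# Placeholder concept vector (an assumption, NOT a trained CAV).
cav = torch.randn(3)
cav = cav / cav.norm()

# Directional derivative of the logit along the concept direction.
directional = grads.flatten(start_dim=1) @ cav

# TCAV score: fraction of inputs whose directional derivative is positive.
score = (directional > 0).float().mean().item()
```

The hook is removed after the forward pass so that later calls to the model are not affected. For a convolutional layer the gradient has shape (batch, channels, height, width), which is why the sketch flattens everything after the batch dimension before taking the dot product with the CAV.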