Max Length error while using Huggingface Transformer model for SHAP Explanation


I am using SHAP to explain the output of a pretrained model. It works for documents with fewer than 1024 tokens, but fails with the error below when the input sequence is longer than 1024 tokens. The script I used to generate the explanation is as follows.

!pip install transformers[sentencepiece] datasets sacrebleu rouge_score py7zr -q
import numpy as np

import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

import shap

# Load the pretrained distilbart summarization model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-xsum-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-xsum-12-6").cuda()

# Take a single document from the XSum training split.
dataset = load_dataset("xsum", split="train")
s = dataset["document"][2:3]

# Build a SHAP explainer around the model/tokenizer pair and explain the document.
explainer = shap.Explainer(model, tokenizer)
shap_values = explainer(s)

The full output is:

Token indices sequence length is longer than the specified maximum sequence length for this model (1196 > 1024). Running this sequence through the model will result in indexing errors
You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
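The first message points at the cause: sshleifer/distilbart-xsum-12-6 accepts at most 1024 input tokens, and the selected XSum document tokenizes to 1196. As a sketch of one possible workaround (assuming it is acceptable to explain only the truncated text, and that this is not an official SHAP fix), each document could be cut down to the model's limit before it reaches the explainer, for example by round-tripping through the tokenizer with truncation=True:

# Hedged workaround sketch: truncate each document to the model's token
# limit before explaining it, so SHAP never feeds the model an
# over-length sequence. Only the truncated text is explained.
max_len = tokenizer.model_max_length  # 1024 for this checkpoint

s_truncated = [
    tokenizer.decode(
        tokenizer(doc, truncation=True, max_length=max_len)["input_ids"],
        skip_special_tokens=True,
    )
    for doc in s
]

shap_values = explainer(s_truncated)

The second message appears to be only a deprecation warning from transformers about how generation settings are configured, not the failure itself.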