I'm exploring how LIME works with NLP models, specifically how it identifies positive and negative words.
I have a collection of documents like this excerpt:
"The UN children’s agency says the 'world cannot stand by and watch' the suffering in Gaza.
'Intensifying conflict, malnutrition, and disease in Gaza are creating a deadly cycle that is threatening over 1.1 million children,' UNICEF said in a social media post.
At least 249 Palestinians have been killed and 510 wounded in the previous 24 hours in Gaza, the health ministry says."
My classification model labels each document as either "United Nation Related" or "NON-UN Related", denoted by label 1 (United Nation Related) or 0 (NON-UN Related).
Here's a snippet of the code implementation:
exp = explainer.explain_instance(X_test.values[i], clf.predict_proba, num_features=LIMEMaxFeatures)
lst = exp.as_list()
Initially, everything seems fine. However, an issue arises when employing LIME for certain documents classified as NON-UN Related.
Consider this example:
"Guterres invoked this responsibility, saying he believed the situation in Israel and the occupied Palestinian territories, 'may aggravate existing threats to the maintenance of international peace and security'."
Despite the clearly UN-related content, my model classifies this document as NON-UN Related. Using LIME to investigate the reasons behind this classification, I get the following results:
Word        Value
occupied   -0.130118107160623
situation  -0.284915997715762
Guterres    0.22668070156952
Gaza        0.144198872750898
My question concerns the word "Guterres", whose LIME value is POSITIVE. Does this mean that "Guterres" supports the predicted label 0 (NON-UN Related)? That is, does a higher value for "Guterres" signify stronger model confidence in labeling the document as NON-UN Related?
Or does the POSITIVE value for "Guterres" instead indicate a pull towards label 1 (United Nation Related), with a higher positive value meaning a stronger inclination towards labeling the document as United Nation Related?
I found the answer
Based on the documentation of LIME
https://lime-ml.readthedocs.io/en/latest/lime.html
the as_list function takes a label parameter, which defaults to label = 1. So positive values are evidence for label = 1 (United Nation Related), and negative values are evidence against label = 1, i.e. for label = 0. The positive weight on "Guterres" therefore pushes towards label 1, even though the model's overall prediction was label 0.