How do I calculate uncertainty for named entities?

I'm trying to apply uncertainty sampling (selecting the most uncertain samples) to my named-entity dataset, which is a list of sentences, each made up of tokens. To do this, I'm using modAL, which already implements this function, but I've modified it so that it works for my data.

First, using my CRF() model, I get a matrix with several probabilities for each token in each sentence, one per class. I want a single value per token, so I take 1 minus the token's maximum class probability as its uncertainty. I then want a single score per sentence, so I sum the token uncertainties of each sentence.
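To make the computation described above concrete, here is a minimal sketch with made-up numbers. The array `probs`, its shape `(n_sentences, max_len, n_classes)`, and the `mask` marking real tokens versus padding are assumptions for illustration, not output from the actual CRF() model.

```python
import numpy as np

# Hypothetical class probabilities: 2 sentences, 3 token slots, 2 classes.
# The last slot of sentence 0 is padding (assumed layout).
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [1.0, 0.0]],
    [[0.5, 0.5], [0.7, 0.3], [0.8, 0.2]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])  # 1 = real token, 0 = padding

# Per-token uncertainty: 1 - max class probability, zeroed on padding.
token_uncertainty = (1.0 - probs.max(axis=2)) * mask

# One score per sentence: sum over tokens, as described in the question.
sentence_sum = token_uncertainty.sum(axis=1)

# For comparison only: mean over real tokens, which does not grow with length.
sentence_mean = sentence_sum / mask.sum(axis=1)

print(sentence_sum)
print(sentence_mean)
```

Note that a plain sum tends to give longer sentences larger scores simply because they contain more tokens. Whether the sum or a per-token mean is the better ranking criterion here is something to check on the real data, not a conclusion this sketch can settle.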

```  

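import numpy as np

def uncertainty_sampling(self, x, n_instances: int = 1):
    # Per-token uncertainty (1 - max class probability), one value per token
    uncertainty = self.classifier_uncertainty(x)
    print(uncertainty)
    # Collapse the trailing axis, then sum over tokens: one score per sentence
    sequence_uncertainties = np.sum(uncertainty, axis=2)
    total_sequence_uncertainties = np.sum(sequence_uncertainties, axis=1)
    print(total_sequence_uncertainties)
    # Indices of the n_instances highest scores (unordered).
    # kth must be a valid index, so cap it at the last position.
    kth = min(n_instances, len(total_sequence_uncertainties)) - 1
    most_uncertain_indices = np.argpartition(-total_sequence_uncertainties, kth)[:n_instances]
    print(most_uncertain_indices)
    return most_uncertain_indices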

```
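The selection step based on `np.argpartition` can be checked in isolation. This is a minimal sketch with made-up scores; the helper name `top_k_indices` is hypothetical and not part of any library.

```python
import numpy as np

def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k largest scores, highest first."""
    k = min(k, len(scores))
    # After negation, argpartition puts the k smallest values
    # (the k largest scores) in the first k slots, in no particular order.
    idx = np.argpartition(-scores, k - 1)[:k]
    # Sort only those k candidates by descending score.
    return idx[np.argsort(-scores[idx])]

# Illustrative per-sentence scores (not taken from the real data).
scores = np.array([0.5, 2.0, 1.2, 0.1])
print(top_k_indices(scores, 2))
```

With these scores the two selected indices are 1 and 2, the sentences with the highest totals. Running the same check on a handful of real scores is a quick way to rule out the selection step as the source of the problem.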

It all seems to add up. However, when I compare my CRF() trained on a random sample with my CRF() trained with active learning, the random-sample CRF() performs better... So I think I've made a mistake in my code. Can you help me?

Here's an example of my data. This is the matrix called uncertainty, where each value = 1 - np.max(probabilities of each class):

[[[0.35675915]
  [0.3411394 ]
  [0.32315629]
  ...
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.37441699]
  [0.44393638]
  [0.31663258]
  ...
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.34106778]
  [0.23466529]
  [0.1312631 ]
  ...
  [0.        ]
  [0.        ]
  [0.        ]]

total_sequence_uncertainties (sum of the token uncertainties for each sentence): [3.21548365 9.10346426 3.68245306 ... 2.66100948 1.95914588 1.56346819]

Thanks a lot!
