I'm training BertForSequenceClassification for a binary classification task. My dataset has two labels: 'contains adverse effect' (1) and 'does not contain adverse effect' (0). In the dataset, all of the 1s come first and the 0s follow (the data isn't shuffled). For training I shuffle the data and get the logits. From what I understand, the logits are the unnormalized scores before softmax. An example logit is [-4.673831, 4.7095485]. Does the first value correspond to label 1 (contains AE) because it appears first in the dataset, or to label 0? Any help would be appreciated, thanks.
How does the BERT model select the label ordering?
1k views Asked by abhishekkuber At
The first value corresponds to label 0 and the second value corresponds to label 1. BertForSequenceClassification feeds the output of the pooler to a linear layer (after a dropout layer, which I will ignore in this answer).
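As a minimal sketch of what the pooler returns, the snippet below builds a tiny, randomly initialized BertModel from a BertConfig so that nothing has to be downloaded. The small sizes (hidden_size=32, one layer) and the token ids are assumptions made for illustration; with a pretrained bert-base checkpoint the hidden size would be 768.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny, randomly initialized BERT so no checkpoint is needed.
# A real run would load a pretrained model with hidden_size 768 instead.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)
model.eval()  # disable dropout

# Made-up token ids standing in for one tokenized sentence of 6 tokens.
input_ids = torch.tensor([[1, 7, 8, 9, 10, 2]])

with torch.no_grad():
    outputs = model(input_ids=input_ids)

pooled_output = outputs.pooler_output
print(pooled_output.shape)  # [batch_size, hidden_size]
```

Whatever the sequence length, the pooler output has one vector per input sequence, and that vector is what the classification head receives.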
The pooled_output is a tensor of shape [batch_size, hidden_size] and represents the contextualized (i.e. attention was applied) [CLS] token of your input sequences; more precisely, it is the final hidden state of [CLS] passed through a dense layer and a tanh. This tensor is fed to a linear layer to calculate the logits of your sequence. When we normalize these logits with a softmax, we can see which label the linear layer predicts. In my run it was label 1, but your result will differ since the linear layer is initialized randomly.
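The step from pooled output to logits to normalized scores can be sketched as follows. This is only an illustration: a random tensor stands in for a real pooled output, and the layer is a freshly initialized nn.Linear rather than a trained classification head, so the predicted index carries no meaning.

```python
import torch
from torch import nn

torch.manual_seed(0)

hidden_size, num_labels = 768, 2

# Assumption: random values stand in for a real pooled [CLS] representation.
pooled_output = torch.randn(1, hidden_size)

# Same kind of layer as a two-label classification head, randomly initialized.
classifier = nn.Linear(hidden_size, num_labels)

with torch.no_grad():
    logits = classifier(pooled_output)     # shape [1, 2]
    probs = torch.softmax(logits, dim=-1)  # each row sums to 1

print(logits)
print(probs)
print("predicted index:", probs.argmax(dim=-1).item())
```

Softmax keeps the order of the logits, so the index of the largest logit is also the index of the largest probability.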
The linear layer applies a linear transformation:
y = xA^T + b
You can already see that the linear layer is not aware of your labels. It 'only' has a weight matrix of size [2, 768] that produces logits of size [1, 2] (i.e. the first row of the matrix produces the first value and the second row the second).
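To make the row-to-logit correspondence concrete, here is a small sketch that recomputes the layer's output by hand. The input x is a random stand-in for the pooled output (an assumption), and the layer is randomly initialized.

```python
import torch
from torch import nn

torch.manual_seed(0)

classifier = nn.Linear(768, 2)
x = torch.randn(1, 768)  # assumption: stand-in for the pooled output

print(classifier.weight.shape)  # torch.Size([2, 768])
print(classifier.bias.shape)    # torch.Size([2])

with torch.no_grad():
    logits = classifier(x)
    # y = xA^T + b, written out explicitly.
    manual = x @ classifier.weight.T + classifier.bias
    # Row i of the weight matrix (plus bias i) alone produces logit i.
    logit_0 = x @ classifier.weight[0] + classifier.bias[0]
    logit_1 = x @ classifier.weight[1] + classifier.bias[1]

print(torch.allclose(logits, manual, atol=1e-4))
```

Nothing in the layer names a class. Position 0 and position 1 only acquire a meaning through the targets used during training.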
The BertForSequenceClassification model learns by applying a CrossEntropyLoss. This loss function produces a small loss when the logit of the expected class (label in your case) is clearly larger than the other logits, and a large loss otherwise. That means the CrossEntropyLoss is what lets your model learn that the first logit should be high when the input does not contain an adverse effect and low when it contains an adverse effect. The target label is used directly as the index of the logit that should be high, which is why label 0 maps to the first value and label 1 to the second. You can check this by computing the loss for the same logits with each of the two labels as the target.
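As a hedged illustration, the snippet below takes the example logits from the question and computes the cross-entropy loss once for each possible target. It uses torch.nn.functional.cross_entropy, which expects class indices as targets.

```python
import torch
import torch.nn.functional as F

# The example logits from the question: index 0 is low, index 1 is high.
logits = torch.tensor([[-4.673831, 4.7095485]])

# The target is a class index, i.e. the position of the logit that should win.
loss_if_label_0 = F.cross_entropy(logits, torch.tensor([0]))
loss_if_label_1 = F.cross_entropy(logits, torch.tensor([1]))

print(loss_if_label_0.item())  # large, roughly 9.38
print(loss_if_label_1.item())  # close to zero
```

The loss is tiny only when the target is 1, so a trained model that outputs these logits is predicting label 1 (contains adverse effect). The order in which the classes appear in the dataset plays no role.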