i am currently using GPT-3 and i am trying to compare its capabilities to related language models for my masters thesis. Unfortunatly GPT-3 is an API based application, so i am not really able to extract metrics such as perplexity.
Over the API i have acces to these three metrics and of course the models outputs:
- training_loss: loss on the training batch 
- training_sequence_accuracy: the percentage of completions in the training batch for which the model's predicted tokens matched the true completion tokens exactly. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 2/3 = 0.67 
- training_token_accuracy: the percentage of tokens in the training batch that were correctly predicted by the model. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 5/6 = 0.83 
Is there any possibility to calculate the perplexity of my model using python?
Thank you.