Purpose of the model
The goal is to build a small-scale LLM from scratch in order to understand the concepts involved in coding an LLM. It does not need to be as good as other LLMs.
Expected behavior
Because of resource constraints, the model is only expected to produce meaningful generations.
Problem
The current model does not achieve this. The generations are not meaningful with respect to either the input or the context, and I have not been able to identify what is causing the problem.
Model
My Model - llmwithtransformer
What I have tried
My first mistake was using the default Adam optimizer and cross-entropy loss directly. Since transformers require some modifications to these, I changed them with the help of GPT and the available resources from PyTorch and TensorFlow. This made a significant improvement: the model now generates random text that can be meaningful on its own, although its meaning does not align with the input or context at all. The text is still unrelated to the input, and I am stuck here.
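The post does not say which modifications were made to the optimizer and the loss. For illustration only, here is a minimal pure-Python sketch of three adjustments that are commonly used when training transformer language models: a learning rate with linear warmup followed by inverse-square-root decay, a cross-entropy that skips padding positions, and shifting the sequence by one token to build next-token targets. The function names, the `pad_id=0` convention, and the defaults `d_model=512` and `warmup_steps=4000` are assumptions made for this sketch, not taken from the model in question.

```python
import math


def warmup_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate with linear warmup, then inverse-square-root decay.

    d_model and warmup_steps are assumed defaults for illustration.
    """
    step = max(step, 1)  # avoid raising 0 to a negative power at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def masked_cross_entropy(logits, targets, pad_id=0):
    """Mean cross-entropy over positions whose target is not padding.

    logits:  list of per-position score lists (one score per vocabulary id)
    targets: list of target token ids, same length as logits
    pad_id:  assumed id of the padding token
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == pad_id:
            continue  # padding positions contribute nothing to the loss
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # equals -log softmax(scores)[target]
        count += 1
    return total / max(count, 1)


def shift_for_next_token(token_ids):
    """Split one sequence into (inputs, targets) for next-token prediction."""
    return token_ids[:-1], token_ids[1:]
```

If the targets are not shifted by one position relative to the inputs, or if padding tokens are counted in the loss, a model can still learn to emit fluent-looking text that has little to do with the prompt, so these are reasonable things to check first.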
What I am looking for
- The points where I have made mistakes, along with their corrections (not necessarily as code).
- Suggestions for improvement.