I would like to fine-tune a llama2-alpaca model called Bode.
I have a web-scraped dataset of questions and answers, and I would like to use it with SFTTrainer to fine-tune the model for this specific domain. However, I don't know how to format the dataset correctly for this model, because the Hugging Face documentation shows a prompt like this:
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
But on this model's own model card, they suggest something like:
Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que complete adequadamente o pedido.
### Instrução:
{instruction}
### Resposta:
So, is there a way to retrieve the prompt template the model uses through an API? If so, what modifications do I need to make to pass it to SFTTrainer?
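
To make the question concrete, here is a minimal sketch of the formatting step I have in mind, assuming the model-card template is the right one to use. The column names `question` and `answer` are hypothetical placeholders for my scraped data, the `</s>` end-of-sequence token is an assumption that should be checked against the model's tokenizer, and the exact line breaks in the template should be checked against the model card.

```python
# Prompt template copied from the model card (Alpaca style, in Portuguese).
# The single newlines mirror the template as quoted above; the exact
# whitespace is an assumption and should be checked against the model card.
PROMPT_TEMPLATE = (
    "Abaixo está uma instrução que descreve uma tarefa. "
    "Escreva uma resposta que complete adequadamente o pedido.\n"
    "### Instrução:\n"
    "{instruction}\n"
    "### Resposta:\n"
    "{response}"
)


def format_example(example: dict, eos_token: str = "</s>") -> str:
    """Render one question/answer pair as a single training string.

    Assumptions: 'question' and 'answer' are hypothetical column names,
    and '</s>' is assumed to be the tokenizer's end-of-sequence token.
    """
    text = PROMPT_TEMPLATE.format(
        instruction=example["question"].strip(),
        response=example["answer"].strip(),
    )
    # Append the end-of-sequence token so the model learns where to stop.
    return text + eos_token


# Hypothetical sample row, only for illustration.
sample = {
    "question": "O que é aprendizado de máquina?",
    "answer": "É um ramo da IA.",
}
print(format_example(sample))
```

My understanding is that a function like this could be mapped over the dataset to build a text column, or handed to SFTTrainer through its `formatting_func` argument, but I am not sure which input and output shape the trainer expects in my version of the library.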