What are the differences between adapter tuning and prefix tuning?


I am trying to understand the concept of adapter-tuning, prompt-tuning, and prefix-tuning in the context of few-shot learning.

It appears to me that I can apply prompt tuning to a black box language model.

I read that for prompt tuning the entire pre-trained language model is frozen. If that's the case, prompt tuning could be applied to an OpenAI model like GPT-3 or Codex.

How could I do prompt tuning with OpenAI Codex? I don't find any way so far.

How are these techniques different from the in-context examples given in few-shot learning?

Can anyone please guide me in the correct direction?


There are 2 answers

Best answer, by Exploring:

These are alternatives to fine-tuning a model. They are essentially solutions that sit between few-shot learning and full fine-tuning.

The other answer in this SO post is completely wrong: fine-tuning is not the same thing as prompt tuning or prefix tuning. These are entirely different techniques from fine-tuning.

Correct references for prompt tuning and prefix tuning are given below:

  • Prompt Tuning: k learnable parameters, i.e. continuous token embeddings, are prepended to the input, while the entire pre-trained language model stays frozen.

  • Prefix Tuning: for k positions prepended to the input, additional learnable weights for the keys and values are concatenated at every attention layer. This differs from prompt tuning, which learns only input vectors.
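To make the frozen-model point concrete, here is a minimal PyTorch sketch of the prompt-tuning idea, using a toy embedding layer as a stand-in for the pre-trained model; all sizes and names are illustrative, not from any real API:

```python
import torch
import torch.nn as nn

vocab_size, d_model, k = 100, 16, 5  # toy sizes; k learnable prompt positions

# Frozen pre-trained component (stand-in for a real LM's input embeddings)
embed = nn.Embedding(vocab_size, d_model)
for p in embed.parameters():
    p.requires_grad = False  # the entire pre-trained model stays frozen

# The ONLY trainable parameters: k continuous prompt embeddings
prompt = nn.Parameter(torch.randn(k, d_model) * 0.02)

input_ids = torch.tensor([[3, 14, 15, 9]])           # (batch=1, seq=4)
x = embed(input_ids)                                 # (1, 4, d_model)
x = torch.cat([prompt.expand(1, -1, -1), x], dim=1)  # (1, k+4, d_model)
# x now feeds into the frozen transformer; only `prompt` receives gradients
```

Note that this requires access to the model's input embedding layer and to gradients, which is why you cannot do it against a pure text-in/text-out API.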

Papers that introduced these techniques: Lester et al., "The Power of Scale for Parameter-Efficient Prompt Tuning" (prompt tuning), and Li & Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation" (prefix tuning).

Answer by mrk:

In my understanding, all three concepts mentioned are based on a pre-trained model, so in general they should work with the GPT model that underlies OpenAI Codex.

Adapter-tuning involves adding small, task-specific "adapter" modules to the pre-trained model, which can be trained on a few examples to improve performance on the specific task. This is especially interesting if you want to do task adaptation, in my opinion. The idea is to extend the model with additional small layers inserted inside each block, while the pre-trained weights stay frozen. You are touching theta, the model's parameters.
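A minimal sketch of the adapter idea (not specific to Codex or any library): a bottleneck module with a residual connection, trained while the surrounding pre-trained layers stay frozen. Sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, d_model: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # zero-init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pre-trained representation intact
        return h + self.up(torch.relu(self.down(h)))

adapter = Adapter(d_model=16, bottleneck=4)
h = torch.randn(2, 10, 16)  # (batch, seq, hidden) from a frozen layer
out = adapter(h)            # same shape; equals h exactly at initialization
```

The zero-initialized up-projection is a common trick: at the start of training the adapter is a no-op, so performance cannot degrade before the new parameters have learned anything.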

Prompt-tuning involves providing the model with a few examples of the desired output, along with a prompt indicating the task that the model should perform. You can also read up on this by searching for cues or priors. Intuitively, this can be understood as guiding the model explicitly. The idea is to add prior knowledge through the input. You are touching x, the input.

Prefix-tuning involves providing the model with a prefix that indicates the task it should perform. It is closely related to prompt tuning, but the learned prefix is injected at every attention layer rather than only at the input. The idea is still to add prior knowledge through the input. You are touching x.
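For contrast with prompt tuning, here is a toy sketch of the prefix-tuning mechanic from the accepted answer: learnable key/value vectors concatenated in front of the real keys and values inside one attention layer (a real model learns one such pair per layer). Shapes and names are illustrative:

```python
import torch
import torch.nn as nn

d_model, n_heads, k = 16, 4, 5  # k learned prefix positions
d_head = d_model // n_heads

# Learnable prefix keys/values for ONE attention layer
prefix_k = nn.Parameter(torch.randn(n_heads, k, d_head) * 0.02)
prefix_v = nn.Parameter(torch.randn(n_heads, k, d_head) * 0.02)

def attend(q, k_real, v_real):
    """Scaled dot-product attention with learned prefix keys/values."""
    b = q.size(0)
    # Concatenate the learned prefixes in front of the real keys/values
    K = torch.cat([prefix_k.expand(b, -1, -1, -1), k_real], dim=2)
    V = torch.cat([prefix_v.expand(b, -1, -1, -1), v_real], dim=2)
    scores = q @ K.transpose(-1, -2) / d_head ** 0.5
    return torch.softmax(scores, dim=-1) @ V

q = torch.randn(1, n_heads, 7, d_head)       # 7 query positions
k_real = torch.randn(1, n_heads, 7, d_head)  # frozen model's keys
v_real = torch.randn(1, n_heads, 7, d_head)  # frozen model's values
out = attend(q, k_real, v_real)              # (1, n_heads, 7, d_head)
```

Each query can now attend to k extra learned slots, which is how the prefix steers the frozen model's behavior at every layer, not just at the input.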

In their paper on OpenAI Codex they explain how they fine-tuned and adapted their GPT model to the GitHub data used for Copilot. Read it here.

And there is an open source project that tries to replicate OpenAI Codex, which gets pretty close to what you are trying to do, if I understood your comment correctly.