Some Terms for LLMs:

  • Foundation Models: Large pre-trained models that can be fine-tuned for specific tasks. Examples include GPT-3, BERT, and T5.
  • Tokens: Numerical representations of words or parts of words.
  • Embeddings: Dense vector representations of tokens that capture their semantic meaning.
  • Top-p (nucleus) sampling: Only the smallest set of tokens whose cumulative probability reaches the threshold p is kept for sampling. Higher -> more random.
  • Top-k sampling: Only the k candidates with the highest probabilities are considered for sampling. Higher -> more random.
  • Temperature: The level of randomness in selecting the next token from those candidates; higher temperatures flatten the probability distribution. Higher -> more random.
  • Context window: The number of tokens the model can consider at once. Longer context windows allow for more coherent and contextually relevant responses.
  • Max tokens: A limit on the total number of tokens in the input or output.
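The three sampling knobs above can be sketched in a few lines of pure Python. This is a minimal illustration over a toy logit list, not any library's actual decoding code; the function and variable names are assumptions for this example.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Pick a token index from raw logits using temperature, top-k, and top-p.

    Toy sketch: real decoders work on vocab-sized tensors, but the logic
    is the same.
    """
    # Temperature: divide logits before softmax. Lower -> sharper (less
    # random), higher -> flatter (more random).
    scaled = [l / temperature for l in logits]

    # Softmax to turn logits into probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    candidates = [(i, e / total) for i, e in enumerate(exps)]

    # Sort by probability, highest first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        candidates = candidates[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches the threshold p.
    if top_p is not None:
        kept, cum = [], 0.0
        for i, p in candidates:
            kept.append((i, p))
            cum += p
            if cum >= top_p:
                break
        candidates = kept

    # Renormalise over the surviving candidates and sample one.
    total = sum(p for _, p in candidates)
    r = random.random() * total
    for i, p in candidates:
        r -= p
        if r <= 0:
            return i
    return candidates[-1][0]
```

With `top_k=1` this degenerates to greedy decoding (always the argmax); loosening `top_k`/`top_p` or raising `temperature` lets lower-probability tokens through, which is why each knob reads "higher -> more random".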

Transfer Learning (Fine-tuning) with Transformers

  • We continue training the whole pre-trained model on additional task-specific data.
  • We often freeze specific layers and re-train others; for example, training a new tokenizer (and re-training the embedding layer) to adapt the model to a new language.
  • Add a layer on top of the pre-trained model:
    1. A few layers may be all that’s needed.
    2. Provide examples of prompts and desired completions.
    3. Adapt it to classification or other tasks.
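The freeze-the-backbone, train-a-new-head idea above can be sketched without any ML framework. This is a toy pure-Python illustration under stated assumptions: the "pretrained backbone" is a fixed 2x2 linear map standing in for frozen transformer layers, and the "head" is one trainable linear layer fitted with plain SGD; all names and data here are made up for the example.

```python
# Hypothetical frozen "pretrained backbone": a fixed linear map whose
# weights are never updated (stand-in for frozen transformer layers).
BACKBONE_W = [[0.5, -0.2],
              [0.1,  0.9]]

def backbone(x):
    # Forward pass through the frozen layer; no learning happens here.
    return [sum(w * xi for w, xi in zip(row, x)) for row in BACKBONE_W]

def predict(x, head):
    # New task head: a single linear layer on top of the backbone features.
    return sum(w * hi for w, hi in zip(head, backbone(x)))

def train_head(data, lr=0.5, epochs=1000):
    # Only the head's weights are updated -- this is the "freeze specific
    # layers and re-train others" step.
    head = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            h = backbone(x)                 # frozen features
            err = predict(x, head) - y      # squared-error gradient term
            head = [w - lr * err * hi for w, hi in zip(head, h)]
    return head

# Toy "fine-tuning" data: (input, target) pairs for a regression head.
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.2)]
head = train_head(data)
```

Training touches only the two head weights, so it converges in a fraction of the steps that re-training the backbone would take; this is the economics behind point 1 above (a few layers may be all that's needed).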