Some Terms for LLMs:
- Foundation Models: Large pre-trained models that can be fine-tuned for specific tasks. Examples include GPT-3, BERT, and T5.
- Tokens: Numerical representations of words or parts of words.
- Embeddings: Dense vector representations of tokens that capture their semantic meaning.
- Top-p sampling: Sample only from the smallest set of tokens whose cumulative probability reaches the threshold p. Higher -> more random.
- Top-k sampling: Only the k candidates with the highest probabilities are considered for sampling. Higher -> more random.
- Temperature: Controls randomness when selecting the next token from the remaining candidates; lower values sharpen the distribution (more deterministic), higher values flatten it (more random).
- Context window: The number of tokens the model can consider at once. Longer context windows allow for more coherent and contextually relevant responses.
- Max tokens: Limit for total number of tokens for input or output.
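The three sampling knobs above can be sketched in a few lines. This is a minimal illustration (the function name and the tiny logits array are made up for the example), not how any particular library implements decoding:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick the next token id from raw logits using temperature, top-k, and top-p."""
    rng = rng or np.random.default_rng(0)
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Top-k: zero out everything but the k most probable tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    # Top-p (nucleus): keep the smallest set whose cumulative mass reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        kept = np.zeros_like(probs)
        kept[keep] = probs[keep]
        probs = kept
    probs /= probs.sum()  # renormalize after filtering
    return rng.choice(len(probs), p=probs)
```

For example, with `temperature=0.01` or `top_k=1` the sampler becomes effectively greedy and always returns the highest-probability token.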
Transfer Learning (Fine-tuning) with Transformers
- We continue training the whole model on additional task-specific data.
- We often freeze specific layers and re-train others (e.g. train a new tokenizer and embedding layer to adapt the model to a new language).
- Add a layer on top of the pre-trained model:
- A few layers may be all that’s needed.
- Provide examples of prompts and desired completions.
- Adapt it to classification or other tasks.
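The freeze-and-add-a-head pattern above can be sketched with a few lines of PyTorch. The tiny "backbone" here is a stand-in for a real pre-trained model, and the 3-class head is an arbitrary example:

```python
import torch.nn as nn

# Stand-in for a pre-trained model body (in practice: a loaded transformer).
backbone = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64), nn.ReLU())

# Freeze the pre-trained layers so fine-tuning only updates the new head.
for param in backbone.parameters():
    param.requires_grad = False

# Add a new layer on top, e.g. a 3-class classification head.
head = nn.Linear(64, 3)
model = nn.Sequential(backbone, head)

# Only the head's parameters (weight + bias) will receive gradient updates.
trainable = [p for p in model.parameters() if p.requires_grad]
```

Passing only `trainable` to the optimizer is what makes the fine-tuning cheap: the backbone's weights stay fixed while the small head adapts to the new task.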