Some Terms for LLMs:

  • Foundation Models: Large pre-trained models that can be fine-tuned for specific tasks. Examples include GPT-3, BERT, and T5.
  • Tokens: Numerical representations of words or parts of words.
  • Embeddings: Dense vector representations of tokens that capture their semantic meaning.
  • Top-p (nucleus) sampling: Only the smallest set of tokens whose cumulative probability reaches the threshold p is kept for sampling. Higher -> more random.
  • Top-k sampling: Only the k candidates with the highest probabilities are considered for sampling. Higher -> more random.
  • Temperature: The level of randomness in selecting the next token from those candidates; higher temperatures flatten the probability distribution. Higher -> more random.
  • Context window: The number of tokens the model can consider at once. Longer context windows allow for more coherent and contextually relevant responses.
  • Max tokens: A limit on the total number of tokens in the input or output.
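The three sampling knobs above can be sketched in a few lines of pure Python. This is a minimal illustration over a toy logit list, not any library's actual decoding code; the function and variable names are assumptions for this example.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Pick a token index from raw logits using temperature, top-k, and top-p.

    Toy sketch: real decoders work on vocab-sized tensors, but the logic
    is the same.
    """
    # Temperature: divide logits before softmax. Lower -> sharper (less
    # random), higher -> flatter (more random).
    scaled = [l / temperature for l in logits]

    # Softmax to turn logits into probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    candidates = [(i, e / total) for i, e in enumerate(exps)]

    # Sort by probability, highest first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        candidates = candidates[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches the threshold p.
    if top_p is not None:
        kept, cum = [], 0.0
        for i, p in candidates:
            kept.append((i, p))
            cum += p
            if cum >= top_p:
                break
        candidates = kept

    # Renormalise over the surviving candidates and sample one.
    total = sum(p for _, p in candidates)
    r = random.random() * total
    for i, p in candidates:
        r -= p
        if r <= 0:
            return i
    return candidates[-1][0]
```

With `top_k=1` this degenerates to greedy decoding (always the argmax); loosening `top_k`/`top_p` or raising `temperature` lets lower-probability tokens through, which is why each knob reads "higher -> more random".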

Transfer Learning (Fine-tuning) with Transformers

  • We continue training the whole pre-trained model on additional task-specific data.
  • We often freeze specific layers and re-train others; for example, training a new tokenizer (and re-training the embedding layer) to adapt the model to a new language.
  • Add a layer on top of the pre-trained model:
    1. A few layers may be all that’s needed.
    2. Provide examples of prompts and desired completions.
    3. Adapt it to classification or other tasks.
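The freeze-the-backbone, train-a-new-head idea above can be sketched without any ML framework. This is a toy pure-Python illustration under stated assumptions: the "pretrained backbone" is a fixed 2x2 linear map standing in for frozen transformer layers, and the "head" is one trainable linear layer fitted with plain SGD; all names and data here are made up for the example.

```python
# Hypothetical frozen "pretrained backbone": a fixed linear map whose
# weights are never updated (stand-in for frozen transformer layers).
BACKBONE_W = [[0.5, -0.2],
              [0.1,  0.9]]

def backbone(x):
    # Forward pass through the frozen layer; no learning happens here.
    return [sum(w * xi for w, xi in zip(row, x)) for row in BACKBONE_W]

def predict(x, head):
    # New task head: a single linear layer on top of the backbone features.
    return sum(w * hi for w, hi in zip(head, backbone(x)))

def train_head(data, lr=0.5, epochs=1000):
    # Only the head's weights are updated -- this is the "freeze specific
    # layers and re-train others" step.
    head = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            h = backbone(x)                 # frozen features
            err = predict(x, head) - y      # squared-error gradient term
            head = [w - lr * err * hi for w, hi in zip(head, h)]
    return head

# Toy "fine-tuning" data: (input, target) pairs for a regression head.
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.2)]
head = train_head(data)
```

Training touches only the two head weights, so it converges in a fraction of the steps that re-training the backbone would take; this is the economics behind point 1 above (a few layers may be all that's needed).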