
How to customize foundation models


In this post, I will cover the different customization techniques for foundation models, along with their advantages and recommended use cases.

Here's the content for today’s issue:

  • When to tune a model?
  • Zero-Shot Prompting
  • One-Shot Prompting
  • Few-Shot Prompting
  • Fine-Tuning
  • Parameter-Efficient Fine-Tuning

Let's do this! 💪


When to tune a model?

Always start with prompt engineering, using the largest LLM suited for your task. This gives you a signal on whether LLMs can address the task at all. Experiment with prompt formats and labeled examples to find the best approach.

Fine-tuning may be motivated by:

  1. Improving performance over prompt engineering alone by training on ample labeled data.
  2. Reducing deployment costs by tuning a smaller model to match a larger model's performance.

The decision depends partly on the cost of acquiring labeled data for fine-tuning.


There are multiple techniques to customize a model, which I will describe in detail in the following sections.


TL;DR

Foundation models like GPT-4 or Llama 2 have shown impressive capabilities, but they still need customization for optimal performance on specific tasks. The main techniques for customizing foundation models are zero-shot, one-shot, and few-shot prompting, plus data-driven tuning (full fine-tuning and parameter-efficient fine-tuning). Here’s a table with a summary of those techniques:

Summary of Techniques to Customize Foundation Models

Technique            | Data required                | Recommended use case
---------------------|------------------------------|----------------------------------------------------
Zero-shot prompting  | None                         | Tasks a well-crafted prompt can fully describe
One-shot prompting   | One example                  | Conveying a nuanced style or format
Few-shot prompting   | A handful of examples        | Establishing an output pattern or structure
Fine-tuning          | Ample labeled data           | Deep specialization on your data and objectives
PEFT                 | Labeled data, modest compute | Near fine-tuning quality at a fraction of the cost

Zero-Shot Prompting

Zero-shot prompting involves providing a natural language prompt to the model to get it to generate the desired output, without any additional training data. This leverages the model's pre-training on a huge corpus of text data.

For example, to get a summary from GPT-4, we could provide this prompt:

"Provide a one-paragraph summary of the following passage: [insert passage here]"

The key is crafting prompts that clearly explain the task and format you want the model to follow. Well-designed prompts can work remarkably well without any extra training data.
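
To make this concrete, here is a minimal sketch of sending that zero-shot prompt through the OpenAI Python SDK. The model name and passage are placeholders, not recommendations; adapt them to your setup.

    # Zero-shot prompting: a single instruction, no examples or extra training data.
    # Assumes the OPENAI_API_KEY environment variable is set.
    from openai import OpenAI

    client = OpenAI()

    passage = "..."  # the text you want summarized

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whichever model you have access to
        messages=[{
            "role": "user",
            "content": f"Provide a one-paragraph summary of the following passage: {passage}",
        }],
    )
    print(response.choices[0].message.content)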


One-Shot Prompting

One-shot prompting involves providing a single example to the model along with a prompt, to demonstrate the desired behavior. This technique can teach complex behaviors with minimal data.

For instance, if I wanted the model to write marketing copy with an enthusiastic, energetic tone, I could provide this example passage:

"Wake up your workouts with the unbeatable energy of WorkoutFuel protein shakes! Our delicious flavors give you the kick you need to push your limits. Drink WorkoutFuel before your next session and feel the difference as you power through your reps. With WorkoutFuel, no workout is out of reach!"

This shows the model the type of over-the-top, high-energy style I want. I would then give it the prompt:

"Write marketing copy for WorkoutFuel protein shakes in an enthusiastic, punchy voice."

After seeing the example, the model can generate new copy with a similar tone, tailored to the given prompt. While not as robust as techniques that use more training data, one-shot prompting can convey a nuanced style from just one well-chosen demonstration.
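
As a rough sketch, one-shot prompting just means packing that single demonstration into the prompt itself. Here is one way to do it with the same SDK (the model name is again a placeholder):

    # One-shot prompting: a single example passage demonstrates the desired tone.
    from openai import OpenAI

    client = OpenAI()

    example = (
        "Wake up your workouts with the unbeatable energy of WorkoutFuel protein "
        "shakes! Our delicious flavors give you the kick you need to push your "
        "limits. Drink WorkoutFuel before your next session and feel the difference "
        "as you power through your reps. With WorkoutFuel, no workout is out of reach!"
    )

    prompt = (
        "Here is an example of the tone I want:\n\n"
        f"{example}\n\n"
        "Write marketing copy for WorkoutFuel protein shakes in an enthusiastic, "
        "punchy voice."
    )

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)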


Few-Shot Prompting

Few-shot prompting provides a model with just a few examples to establish the pattern you want it to follow. This primes the model before generating the final outputs.

For example, to generate meeting transcript summaries, you could provide 2-3 examples, then ask a model like Llama 2 to generate new summaries following that style. The few examples help establish the desired structure, language patterns, etc.

IBM watsonx.ai Prompt Lab helps apply multiple prompting techniques without coding

Few-shot learning requires more data than zero-shot prompting but is still very sample-efficient compared to traditional training approaches. Just a handful of examples can steer the model significantly.
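
With chat-style APIs, a common way to express few-shot prompts is as alternating user/assistant turns, so the model sees your worked examples as prior conversation. Here is a sketch, with placeholder transcripts and summaries standing in for real ones:

    # Few-shot prompting: two worked examples establish the summary pattern.
    from openai import OpenAI

    client = OpenAI()

    few_shot = [
        {"role": "user", "content": "Summarize this meeting transcript: <transcript 1>"},
        {"role": "assistant", "content": "<example summary 1>"},
        {"role": "user", "content": "Summarize this meeting transcript: <transcript 2>"},
        {"role": "assistant", "content": "<example summary 2>"},
    ]

    new_request = {"role": "user", "content": "Summarize this meeting transcript: <new transcript>"}

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; a model like Llama 2 works the same way
        messages=few_shot + [new_request],
    )
    print(response.choices[0].message.content)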


Data-Driven Tuning

For optimal customization, you can fine-tune foundation models on datasets specific to your task. This adjusts the model's weights to specialize it for your data distribution and objectives.

Scenarios where fine-tuning shines:

  • Customizing stylistic aspects like tone, voice, and formatting
  • Boosting reliability when a very specific output is needed
  • Addressing complex prompts the base model struggles with
  • Accommodating many niche edge cases in a tailored way
  • Mastering new skills not easily codified into a text prompt

For example, you could take a dataset of customer support emails and fine-tune GPT-3.5 on it. This would customize the model to generate high-quality responses tailored to your business needs.
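
As a sketch of what that workflow looks like with OpenAI's fine-tuning API: you upload a JSONL file of example conversations (here, a hypothetical support_emails.jsonl), then launch a tuning job on a tunable base model.

    # Fine-tuning sketch: upload training data, then start a fine-tuning job.
    # The file name is hypothetical; the JSONL must follow the provider's chat format.
    from openai import OpenAI

    client = OpenAI()

    training_file = client.files.create(
        file=open("support_emails.jsonl", "rb"),
        purpose="fine-tune",
    )

    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id)  # poll this job until it finishes, then use the resulting model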

Parameter-efficient fine-tuning (PEFT) of large-scale pre-trained models

A newer family of methods for fine-tuning large language models (LLMs) is called "delta tuning," another name for parameter-efficient fine-tuning (PEFT). Delta tuning updates only a small subset of the LLM's parameters while keeping the rest frozen. This makes it much faster and cheaper than traditional fine-tuning, which updates all of the LLM's parameters.

In empirical studies, PEFT achieved comparable or better performance than traditional fine-tuning, and it was also found to be more efficient and scalable.

Here are some of the key benefits of delta tuning:

  • Parameter efficiency: Only a small subset of the LLM's parameters is updated, making training much faster and cheaper than traditional fine-tuning.
  • Performance: Delta tuning achieves comparable or better performance than traditional fine-tuning on a wide range of NLP tasks.
  • Scalability: Delta tuning scales better than traditional fine-tuning, making it possible to fine-tune LLMs on larger datasets and more complex tasks.

Delta tuning is a promising new method for fine-tuning LLMs that has the potential to make LLMs more accessible and useful for a wider range of applications.

Fine-tuning vs Parameter-efficient fine-tuning (PEFT)

PEFT approaches deliver performance comparable to full fine-tuning at a fraction of the computational cost. PEFT is the preferred approach when you are constrained by labeled data and/or compute resources.

There are multiple PEFT techniques. The HuggingFace PEFT library supports several methods, such as:

  • Prefix Tuning: Prefix tuning uses soft prompts: a vector of free parameters is attached to the input embeddings and trained while the pre-trained LLM stays frozen. In prefix tuning, these vectors are added at every transformer layer.
  • Prompt Tuning: Prompt tuning is a simpler variant of prefix tuning where the trainable vector is prepended only at the input layer.
  • P-Tuning: P-Tuning is a variant of prompt tuning that automatically searches for and optimizes better prompts in a continuous space using an LSTM model. It has been empirically shown to work well across model scales (300M to 10B parameters).
  • LoRA: Low-Rank Adaptation (LoRA) adds pairs of rank-decomposition weight matrices (called update matrices) to existing weights and trains only those newly added weights.

PEFT techniques: Prefix Tuning, Prompt Tuning, P-Tuning, LoRA
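
To give a feel for how little you actually train with PEFT, here is a minimal LoRA sketch using the HuggingFace PEFT library. The base model and target modules are illustrative choices, not prescriptions; pick ones that match your architecture.

    # LoRA sketch: wrap a frozen base model with small trainable update matrices.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

    config = LoraConfig(
        r=8,                                  # rank of the update matrices
        lora_alpha=16,                        # scaling factor for the updates
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt (model-specific)
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of weights are trainable

You can then train this wrapped model with your usual training loop or the transformers Trainer; only the LoRA weights receive gradients.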

In summary, zero-shot and few-shot prompting enable fast customization with minimal data, while data-driven tuning provides deeper specialization when you have sufficient training examples.

Combining prompting with tuning can yield optimized foundation models for your use case. The key is choosing the right customization approach based on your goals and available data.

I'm excited to see the creative ways you will leverage these techniques to customize models for your needs! 🚀


Join the AI Bootcamp! 🤖

Pre-enroll in 🧠 The AI Bootcamp. It is FREE!

50 Videos: From basics to advanced concepts and demos. I explain everything step-by-step.

10 Practice Exercises: Learn by doing. Train and customize your first models. No technical experience is required.

Learn in Community: be part of the private group and collaborate with your peers.

Cheers!

Armand

