Fine-tuning pre-trained large language models like OpenAI’s GPT-3.5 Turbo adapts them to specific tasks, ensuring higher quality results and achieving reliable output formatting. The Snorkel Flow data development platform makes it easy for users to do so, making large language models like GPT-3.5 Turbo work better for their domain and enterprise requirements.

In this guide, we will explore how to fine-tune GPT-3.5 Turbo using Snorkel Flow in two ways:

  • A user-friendly interface for accessing and fine-tuning all available models.
  • A built-in Jupyter notebook for advanced training customization.

Let’s get started.

Why Fine-Tuning

Fine-tuning is crucial for several reasons:

  • Better performance: Use your data to train the model to perform better on your most important tasks.
  • Customized Output Formatting: Adapt the model to match the specific tone required for different use cases.
  • Improved response times: Customized models require fewer tokens in their prompts, allowing the model to arrive at an answer more quickly.
  • Cost optimization: Fine-tuned models achieve better results than generalized models. This reduces prompt engineering and delivers users an acceptable response in fewer attempts, thereby reducing costs.

The cost benefits of fine-tuning OpenAI models come with caveats. OpenAI charges users to build customized model versions and also charges eight times as much per token for users to access customized models. If fine-tuning allows data teams to reduce prompt lengths by 90% (very possible, depending on the prompting techniques used), the tradeoff is clear. Otherwise, the decision may require more nuance.

For detailed information on pricing and safety concerns, please refer to OpenAI’s official documentation. This blog is tailored for GPT-3.5-Turbo, one of OpenAI’s ChatCompletion, instruction-tuned models. Snorkel Flow includes access to all OpenAI models.

If you are interested in Completions models under GPT-3, you can check out this doc by OpenAI.

Steps to fine-tune using Snorkel Flow

To fine-tune GPT-3.5 in Snorkel Flow, follow these steps:

  1. Evaluate the base model on zero-shot learning to establish our baseline performance.
  2. Import the data set and fine-tune the model to adapt it to our targeted tasks and domain.
  3. Analyze predictions from the fine-tuned model to evaluate the impact that fine-tuning had on the model’s performance.

The following sections will investigate each step in detail.

Dataset: Amazon Reviews

This demonstration uses the Amazon Reviews dataset, accessible here. It features high-cardinality text classification with 64 classes, making it a valuable resource for experimenting with text classification.

Step 1: Evaluate the base model on zero-shot learning (ZSL)

Start by evaluating how GPT-3.5 Turbo performs on Amazon Reviews using zero-shot learning. This step establishes the model’s baseline performance and identifies areas that need improvement.

Let’s start with a simple prompt for this approach —a base prompt that describes the text fields and the classification task.

For each document, Snorkel Flow will insert the appropriate variables into the areas highlighted in green. The prompt interface allows you to experiment and refine your prompts as needed.

You can easily register your first prompt for evaluation in Snorkel Flow. The current ZSL prompt achieves 51% accuracy and 43 f1 macro score. This is decent for zero-shot prompting on a high-cardinality problem, but far from deployable.  


We could coax better results with prompting techniques, but that’s outside the scope of this post. While Snorkel Flow users must provide a good base prompt, they will not need to perform any complicated prompt engineering. The performance improvement achieved by fine-tuning will also outstrip those achievable by customizing prompt templates.

Step 2: Labeling and fine-tuning

After uploading data to Snorkel Flow, you can easily fine-tune a personalized version of GPT-3.5 Turbo through the UI and customize training parameters as needed.

You can perform this fine-tuning pipeline in two ways:

  • Using your current labeled data directly.
  • With a data set sharpened and refined through manual labeling or with Snorkel’s proprietary programmatic labeling tools.

For simplicity, we fine-tune GPT-3.5-Turbo on a subset of 300 data points and observe the model.

The Snorkel UI provides a simple, no-code interface for training purposes.


After fine-tuning with this small subset of data, we already see a jump in performance—with no prompt engineering and very little development effort.


The platform also accommodates advanced users with high customization requirements through the built-in Jupyter Hub interface. Snorkel Flow users can export the labeled data and fine-tune GPT-3.5 Turbo via OpenAI’s APIs in a standard notebook.


Snorkel also supplies customers with more comprehensive code notebooks with best practices and further customization where needed.

Step 3: Analyze Predictions and Iterate

If the model’s performance needs further enhancement after fine-tuning, you can use Snorkel’s analysis workflow to analyze the predictions and further improve the model through:

  • Hyperparameter tuning: Such as changing the batch size, learning rate, epochs, etc.
  • Adding more data: Increase the volume of training data through manual addition or by creating Labeling Functions (LFs) in Snorkel Flow. Then, users can fine-tune a new version of the model. Our error analysis tools help identify slices of data that require more attention, and Snorkel’s technology can help address those slices’ shortcomings.
  • Adding guardrails and post-process predictions: Ensure the model’s safety and reliability, and post-process the predictions for optimal results.

The Snorkel Flow platform offers all of these options natively.

If you would like a more detailed look at how you can iterate on prompts and data, fine-tune, and distill for a production use case, read this blog.

Snorkel Flow’s Clarity Matrix provides pointers on the accuracy of labeling functions.

Fine-tuning GPT-3.5 Turbo is easy on Snorkel Flow

In Snorkel Flow, users can evaluate the  GPT-3.5 Turbo base model’s performance, label data, fine-tune, and analyze predictions—all within the user interface. More advanced users can move their workflow into the on-platform notebooks to achieve any additional customization they need.

All of this makes Snorkel Flow a comprehensive platform for handling all aspects of LLM fine-tuning, enabling users to build specialized and high-quality models effectively. By leveraging Snorkel Flow, teams can ensure that their models are well-adapted to specific tasks and provide reliable and consistent results in real-world applications.

Learn more

If you'd like to learn how the Snorkel AI team can help you develop high-quality LLMs or deliver value to your organization from generative AI, contact us to get started. See what Snorkel can do to accelerate your data science and machine learning teams. Book a demo today.