Product

How to fine-tune GPT-3.5 Turbo in Snorkel Flow

October 13, 2023
6 min read

Fine-tuning pre-trained large language models like OpenAI’s GPT-3.5 Turbo adapts them to specific tasks, ensuring higher quality results and achieving reliable output formatting. The Snorkel Flow data development platform makes it easy for users to do so, making large language models like GPT-3.5 Turbo work better for their domain and enterprise requirements.

In this guide, we will explore how to fine-tune GPT-3.5 Turbo using Snorkel Flow in two ways:

  • A user-friendly interface for accessing and fine-tuning all available models.
  • A built-in Jupyter notebook for advanced training customization.

Let’s get started.

Why Fine-Tuning

Fine-tuning is crucial for several reasons:

  • Better performance: Use your data to train the model to perform better on your most important tasks.
  • Customized Output Formatting: Adapt the model to match the specific tone required for different use cases.
  • Improved response times: Customized models require fewer tokens in their prompts, allowing the model to arrive at an answer more quickly.
  • Cost optimization: Fine-tuned models achieve better results than generalized models. This reduces prompt engineering and delivers users an acceptable response in fewer attempts, thereby reducing costs.

The cost benefits of fine-tuning OpenAI models come with caveats. OpenAI charges users to build customized model versions and also charges eight times as much per token for users to access customized models. If fine-tuning allows data teams to reduce prompt lengths by 90% (very possible, depending on the prompting techniques used), the tradeoff is clear. Otherwise, the decision may require more nuance.

For detailed information on pricing and safety concerns, please refer to OpenAI’s official documentation. This blog is tailored for GPT-3.5-Turbo, one of OpenAI’s ChatCompletion, instruction-tuned models. Snorkel Flow includes access to all OpenAI models.

If you are interested in Completions models under GPT-3, you can check out this doc by OpenAI.

Steps to fine-tune using Snorkel Flow

To fine-tune GPT-3.5 in Snorkel Flow, follow these steps:

  1. Evaluate the base model on zero-shot learning to establish our baseline performance.
  2. Import the data set and fine-tune the model to adapt it to our targeted tasks and domain.
  3. Analyze predictions from the fine-tuned model to evaluate the impact that fine-tuning had on the model’s performance.

The following sections will investigate each step in detail.

Dataset: Amazon Reviews

This demonstration uses the Amazon Reviews dataset, accessible here. It features high-cardinality text classification with 64 classes, making it a valuable resource for experimenting with text classification.

Step 1: Evaluate the base model on zero-shot learning (ZSL)

Start by evaluating how GPT-3.5 Turbo performs on Amazon Reviews using zero-shot learning. This step establishes the model’s baseline performance and identifies areas that need improvement.

Let’s start with a simple prompt for this approach —a base prompt that describes the text fields and the classification task.

Image5
For each document, Snorkel Flow will insert the appropriate variables into the areas highlighted in green. The prompt interface allows you to experiment and refine your prompts as needed.

You can easily register your first prompt for evaluation in Snorkel Flow. The current ZSL prompt achieves 51% accuracy and 43 f1 macro score. This is decent for zero-shot prompting on a high-cardinality problem, but far from deployable.  

Image1

We could coax better results with prompting techniques, but that’s outside the scope of this post. While Snorkel Flow users must provide a good base prompt, they will not need to perform any complicated prompt engineering. The performance improvement achieved by fine-tuning will also outstrip those achievable by customizing prompt templates.

Step 2: Labeling and fine-tuning

After uploading data to Snorkel Flow, you can easily fine-tune a personalized version of GPT-3.5 Turbo through the UI and customize training parameters as needed.

You can perform this fine-tuning pipeline in two ways:

  • Using your current labeled data directly.
  • With a data set sharpened and refined through manual labeling or with Snorkel’s proprietary programmatic labeling tools.

For simplicity, we fine-tune GPT-3.5-Turbo on a subset of 300 data points and observe the model.

The Snorkel UI provides a simple, no-code interface for training purposes.

Image2

After fine-tuning with this small subset of data, we already see a jump in performance—with no prompt engineering and very little development effort.

Image3

The platform also accommodates advanced users with high customization requirements through the built-in Jupyter Hub interface. Snorkel Flow users can export the labeled data and fine-tune GPT-3.5 Turbo via OpenAI’s APIs in a standard notebook.

Image6

Snorkel also supplies customers with more comprehensive code notebooks with best practices and further customization where needed.

Step 3: Analyze Predictions and Iterate

If the model’s performance needs further enhancement after fine-tuning, you can use Snorkel’s analysis workflow to analyze the predictions and further improve the model through:

  • Hyperparameter tuning: Such as changing the batch size, learning rate, epochs, etc.
  • Adding more data: Increase the volume of training data through manual addition or by creating Labeling Functions (LFs) in Snorkel Flow. Then, users can fine-tune a new version of the model. Our error analysis tools help identify slices of data that require more attention, and Snorkel’s technology can help address those slices’ shortcomings.
  • Adding guardrails and post-process predictions: Ensure the model’s safety and reliability, and post-process the predictions for optimal results.

The Snorkel Flow platform offers all of these options natively.

If you would like a more detailed look at how you can iterate on prompts and data, fine-tune, and distill for a production use case, read this blog.

Image4
Snorkel Flow’s Clarity Matrix provides pointers on the accuracy of labeling functions.

Fine-tuning GPT-3.5 Turbo is easy on Snorkel Flow

In Snorkel Flow, users can evaluate the  GPT-3.5 Turbo base model’s performance, label data, fine-tune, and analyze predictions—all within the user interface. More advanced users can move their workflow into the on-platform notebooks to achieve any additional customization they need.

All of this makes Snorkel Flow a comprehensive platform for handling all aspects of LLM fine-tuning, enabling users to build specialized and high-quality models effectively. By leveraging Snorkel Flow, teams can ensure that their models are well-adapted to specific tasks and provide reliable and consistent results in real-world applications.

Ready to accelerate AI development?

Deploy production AI and ML applications 10-100x faster with Snorkel’s experts, using our proprietary technology.

Request a demo

Share this article
Hoang Tran portrayed.
Hoang Tran
Senior Machine Learning Engineer

Hoang Tran is a Senior Machine Learning Engineer at Snorkel AI, where he leverages his expertise to drive advancements in AI technologies. He also serves as a Lecturer at VietAI, sharing his knowledge and mentoring aspiring AI professionals. Previously, Hoang worked as an Artificial Intelligence Researcher at Fujitsu and co-founded Vizly, focusing on innovative AI solutions. He also contributed as a Machine Learning Engineer at Pictory.

Hoang holds a Bachelor’s degree in Computer Science from Minerva University, providing a solid foundation for his contributions to the field of artificial intelligence and machine learning.

Connect with Hoang to discuss AI research, machine learning projects, or opportunities in education and technology.

Recommended articles

View all articles
agentic-in-action
The Standard for Agents You Can Trust: Lessons from the Federal Front Lines
In the first installment of Agentic in Action — a series about real AI deployments, not demos — Snorkel AI’s Kevin Olivieri sat down with three people who have spent their careers where trust isn’t optional: Chris Sniffen, Federal Applied AI Lead at Snorkel AI; John Hickey, President of August Schell; and Mike Baca, CIO of August Schell. The conversation focused on
June 5, 2026
Snorkel Team
collab-gym-thumbnail
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
At our latest Snorkel AI Reading Group, Yijia Shao (Stanford NLP) stopped by our San Francisco office to present Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration. As LLM agents get better at automating tasks on their own, a large class of real-world problems still needs a human in the loop – for their preferences, their domain expertise, or simply for control.
June 4, 2026
Alexis Sobel
Image
Benchtalks #2: The future of coding benchmarks
For our second Benchtalks, the series dedicated to the researchers building the measurement toolkits that frontier labs hill-climb on, Snorkel AI co-founder Vincent Sunn Chen sat down with John Yang, a Stanford PhD student and creator of the SWE-bench franchise, SWE-smith, CodeClash, and most recently ProgramBench. Highlights More on ProgramBench: See the benchmark and the upcoming leaderboard at programbench.com. More from John Yang: Publications and writing at john-b-yang.github.io. Snorkel
June 3, 2026
Vincent Sunn Chen
Image

Join our newsletter

For expert advice, the latest research, and exclusive events.
By submitting this form, I acknowledge I will receive email updates from Snorkel AI, and I agree to the Terms of Use and acknowledge that my information will be used in accordance with the Privacy Policy.