Over the past decade, the evolution of AI has been deeply marked by the rise of large language models (LLMs), notably powerful innovations such as OpenAI’s GPT series, Meta’s Llama 2, and Google’s PaLM 2. These models have significantly altered our perception of AI technology, stirring intrigue about their capabilities across various industries.

Yet, like newly recruited employees, these models often have a broad but shallow knowledge base. They can perform general tasks effectively, but they fall short when it comes to executing specific tasks tailored to an organization’s needs. Therefore, there is a clear need for fine-tuning these models to enhance their performance in the enterprise environment.

I recently gave a talk titled “How to fine-tune and customize LLMs” at Snorkel AI’s Enterprise LLM Virtual Summit. I recommend watching the full talk, but I’ll summarize its main points here.

Understanding the limitations of large language models

Imagine hiring a new employee. They have a general understanding of their job but lack company-specific knowledge. They can perform straightforward tasks but need carefully tailored training to handle specialized tasks within the organization.

LLMs fit a similar mold. They can grasp and handle complex concepts, making them capable of executing tasks like information extraction or even powering chatbots. But without the right fine-tuning—aka tailored training—they fail to deliver fully on company-specific or industry-specific demands. Their off-the-shelf performance will rarely meet an enterprise’s needs for production deployment.

But, with a little adaptation, LLM performance can exceed that bar and begin generating enterprise value.

Approaches to adapting LLMs

So how do we fine-tune these LLMs? Let’s explore three primary approaches:

  1. Full fine-tuning: This approach is akin to retraining the entire large language model. However, it requires ample data and computational resources. Think of this as taking a competent person off the street and then sending them to college in preparation for a job.
  2. Parameter-efficient fine-tuning (PEFT): Think of this as giving your employees additional training. Techniques such as LoRA leave the underlying structure and weights of the model untouched; instead, we add new, smaller, and more efficient adapter layers and train only those (see the sketch after this list).
  3. Distillation: This approach trains a smaller model to replicate the behavior and decisions of a larger one, which makes the entire process less data-intensive. This is akin to asking a veteran employee to train a new hire on a very specific task.
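
To make the PEFT option concrete, here is a minimal sketch using Hugging Face’s open-source peft library. The base model and hyperparameters are illustrative assumptions, not a recommended configuration:

```python
# Minimal LoRA sketch with Hugging Face's peft library.
# The base model and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Loading Llama 2 requires accepting Meta's license on the Hugging Face Hub.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA freezes the original weights and trains small low-rank adapter
# matrices injected into the attention projections.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling applied to adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which submodules get adapters
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The wrapped model can then be fine-tuned with a standard training loop. Because only the adapter weights update, a single frozen base model can even be shared across many task-specific adapters.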

Each of these has its place. What makes sense for your enterprise will vary based on your needs and available resources, but cost-efficient, deployment-ready models will usually involve distillation or PEFT.

The key to successful enterprise LLM adoption

At the heart of successful AI model training, we find high-quality, task-specific data: your enterprise’s unique linguistic DNA. In a world where pre-trained models and training tooling are widely available and well supported (thanks to cloud providers and fantastic resources like Hugging Face), your data becomes your key differentiator.

Successful training demands that data teams label, curate, and prepare data for the training process. This can be done through a variety of platforms, but Snorkel Flow provides a one-stop solution for error analysis, targeted labeling, and collaboration with internal subject matter experts, enabling users to iterate on preexisting models and continually improve their training data.
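
Snorkel Flow is a commercial platform, but its open-source ancestor, the snorkel library, illustrates the core idea of programmatic labeling. The task, labels, and heuristics below are hypothetical:

```python
# A sketch of programmatic labeling with the open-source snorkel library.
# The task (ticket triage), labels, and heuristics are hypothetical.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier

ABSTAIN, NOT_URGENT, URGENT = -1, 0, 1

@labeling_function()
def lf_keyword_urgent(x):
    # A simple keyword heuristic a subject matter expert might contribute.
    return URGENT if "immediately" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_keyword_routine(x):
    return NOT_URGENT if "no rush" in x.text.lower() else ABSTAIN

df_train = pd.DataFrame(
    {"text": ["Please respond immediately.", "No rush on this one."]}
)
applier = PandasLFApplier(lfs=[lf_keyword_urgent, lf_keyword_routine])
L_train = applier.apply(df=df_train)  # one weak label per function per example
```

No single labeling function needs to be highly accurate; their overlapping, sometimes conflicting votes are denoised and merged downstream, as the case study below shows.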

Case study for LLM fine-tuning

Real-world applications of these approaches have yielded tangible results.

We started one customer project by using clever prompting with GPT-4 and achieved an F1 score of 68.4. This improved on the 49.2 F1 score we achieved with zero-shot prompting alone, but fell well short of the performance bar the client set for production deployment.
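
To illustrate what “clever prompting” for label generation can look like, here is a hypothetical sketch against the pre-1.0 openai Python client. The task and prompt wording are illustrative assumptions, not the prompts from this project:

```python
# A hypothetical engineered prompt for generating weak labels with GPT-4,
# using the pre-1.0 openai Python client (it reads OPENAI_API_KEY from the
# environment). The task and wording are illustrative assumptions.
import openai

PROMPT = (
    "You are labeling customer support tickets.\n"
    "Classify the ticket below as URGENT or NOT_URGENT.\n"
    "Answer with exactly one word.\n\n"
    "Ticket: {ticket}"
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": PROMPT.format(ticket="My server is down!")}
    ],
    temperature=0,  # deterministic output makes labels easier to audit
)
weak_label = response["choices"][0]["message"]["content"].strip()
```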

We then used GPT-4 together with Snorkel Flow’s programmatic labeling suite. We provided engineered prompts for various aspects of each document to generate data labels. However, GPT-4, even when carefully prompted, still made some errors. To address this, we used the platform to write labeling functions that supplied additional labels for the error-prone slices of the data. Snorkel Flow then merged the GPT-4 labels with the programmatically generated labels for those slices, producing a final probabilistic dataset. In effect, GPT-4 “taught” a smaller DistilBERT model, with the curated data serving as an enhanced curriculum (a sketch of the merge-and-distill step follows below).
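
In open-source terms, the merge-and-distill step might look roughly like the sketch below, with GPT-4’s outputs treated as one more column of weak labels. The data, label matrix, and training details are toy assumptions, not the actual project pipeline:

```python
# A sketch of merging weak labels into probabilistic targets, then training
# DistilBERT on them. All data and hyperparameters are toy assumptions.
import numpy as np
import torch
import torch.nn.functional as F
from snorkel.labeling.model import LabelModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Toy stand-ins: in practice, the texts and weak-label matrix come from the
# labeling step above, with GPT-4's answers occupying one column.
texts = [
    "Please respond immediately.",
    "No rush on this one.",
    "The server is down, help now.",
    "Whenever you get a chance.",
]
L_train = np.array([[1, -1], [-1, 0], [1, 1], [-1, 0]])  # -1 means abstain

# Denoise and merge the weak labels into probabilistic targets.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=42)
probs = label_model.predict_proba(L=L_train)  # shape: (n_examples, 2)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def soft_cross_entropy(logits, target_probs):
    # Cross-entropy against soft (probabilistic) targets instead of hard labels.
    return -(target_probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
targets = torch.tensor(probs, dtype=torch.float32)

# One illustrative training step; a real run would loop over batches and epochs.
student.train()
optimizer.zero_grad()
loss = soft_cross_entropy(student(**batch).logits, targets)
loss.backward()
optimizer.step()
```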

This approach achieved an F1 score of 84, a clear demonstration of the effectiveness of the distillation process.

The future of large language models in your enterprise

Looking ahead, fine-tuning will play an increasingly critical role in AI deployment. Enterprises will likely rely on multiple smaller models (each tailored to handle specific tasks), rather than a single LLM. These specialized smaller models offer better control and efficiency, promising an exciting era of AI-empowered enterprise solutions.

LLMs are incredibly powerful, and they possess the potential to revolutionize AI technology in various industries. They will rarely—if ever—achieve business objectives out of the box. But, with programmatic labeling, fine-tuning, and distillation, LLMs can power robust pipelines that tackle important, high-complexity enterprise use cases with reliable accuracy.

Learn more

If you'd like to learn how the Snorkel AI team can help you develop high-quality LLMs or deliver value to your organization from generative AI, contact us to get started. See what Snorkel can do to accelerate your data science and machine learning teams. Book a demo today.