Snorkel AI held its Enterprise LLM Virtual Summit on October 26, 2023, drawing an engaged crowd of more than 1,000 attendees across three hours and eight sessions that featured 11 speakers. Luminaries from Snorkel AI, Contextual AI, Google, Meta, Stanford University, and Together AI addressed the current reality of large language models (LLMs) in the enterprise, and where they can go in the near future.
The day’s main takeaway? Generic LLMs are becoming the new baseline for modern enterprise AI, and regardless of where you are in adopting LLMs, AI data development is essential to deliver the performance and reliability enterprises expect for their high-value, mission-critical AI use cases.
Here are a few recommendations on how enterprises can develop and deploy LLMs with confidence.
- Instead of relying exclusively on a single data development technique, combine approaches such as prompting, RAG, and fine-tuning for the best results.
- Focus on improving data quality and transforming manual data development processes into programmatic operations to scale fine-tuning.
- LLMs are large and expensive to operate in production. Instead of deploying them directly, distill them into smaller, specialized models using data-centric workflows.
A brief overview of each session, along with its recording, follows.
Fireside chat with Percy Liang and Alex Ratner
Snorkel AI Co-founder and CEO Alex Ratner chatted with Percy Liang, Co-founder of Together AI and Director at the Stanford Center for Research on Foundation Models (CRFM). The pair discussed the application and utility of foundation models (a term Liang’s group coined) as well as the paradigm shift that the models represent. Foundation models (FMs), the two noted, allow data scientists to train large models on enormous datasets for use in downstream applications.
Liang also discussed his group’s HELM project (Holistic Evaluation of Language Models), which he and his collaborators designed as a standardized way to evaluate language or multimodal models. Standardized evaluation, the pair said, plays an increasingly important role in model development and customization; it enables developers to find, understand, and reinforce individual models’ weaknesses.
Fireside chat with Douwe Kiela and Alex Ratner
In a second fireside chat, Alex spoke with Douwe Kiela, CEO of Contextual AI and an author of the original paper on retrieval augmented generation (RAG). Their conversation touched on the applications and misconceptions of RAG, the future of AI in the enterprise, and the roles of data and evaluation in improving AI systems.
The pair discussed how enterprises need more specialized, fine-tuned models for more complex tasks. They also emphasized the importance of data in getting the most out of AI systems—suggesting that accurate and specific data curation and development can significantly improve system performance. Kiela closed the discussion by predicting that researchers will make significant breakthroughs in multimodal models in the near future.
Accelerating production-ready GenAI
Vincent Chen, director of product (technical) at Snorkel AI, and Joe Spisak, director of product management (generative AI) for Meta, discussed progress, challenges, and opportunities with AI and machine learning in the enterprise. They highlighted the importance of model quality and cost and acknowledged that off-the-shelf AI solutions often fall short of enterprises’ needs—necessitating a trend toward specialist models.
Their conversation also touched on the need for proper evaluation, curation, and adaptation of models, the importance of an open-source community, and the trend of validate-then-build in the development cycle. Despite progress, they said the field faces meaningful challenges, including a lack of clear evaluation tools, undefined performance benchmarks, and high deployment costs.
Art of data development for LLMs
Paroma Varma, co-founder at Snorkel AI, and Ali Arsenjani, director of AI/ML partner engineering at Google, discussed the role of data in the development and implementation of LLMs. These models don’t work out of the box, they noted, particularly for enterprise applications. Customization is crucial, and injecting unique organizational data (their ‘secret sauce’) into these models can help tailor them for specific use cases.
The conversation touched on how much time data scientists spend preparing data, and on the responsibility that comes with using AI. But Ali and Paroma focused heavily on the need for specialized models for different verticals and sub-verticals.
How to fine-tune and customize LLMs
Hoang Tran, ML Engineer at Snorkel AI, outlined how he saw LLMs creating value in enterprise environments. While LLMs have broad capabilities, he said, they often lack the depth of knowledge needed for specific organizational tasks. Fine-tuning and customization—through full fine-tuning, parameter-efficient fine-tuning, or distillation—can overcome these shortcomings.
The success of these models, he said, largely depends on high-quality, task-specific data. Echoing others at the summit, he suggested that the future of AI in enterprises may tend toward a preference for many smaller, task-specific models over a single LLM.
Scalable AI data development for instruction tuning: A case study with RedPajama
Chris Glaze, a research scientist at Snorkel AI, told the story of how his team made fine-tuning LLMs more efficient and scalable using machine learning.
The team used programmatic labeling on Snorkel Flow to rapidly develop two ‘guiding’ models. The first categorizes instructions, while the second assesses the quality of responses. Using these models, the team quickly winnowed a pool of 20,000 prompt-and-response pairs down to the best 10,000, then fine-tuned their own version of the RedPajama LLM on the result. Human test subjects preferred it over the baseline model in every measured category.
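The curation step can be pictured as a simple score-and-filter loop. The sketch below is purely illustrative: the `score_pair` heuristic stands in for the team's actual trained response-quality model, which is not public, and all names are hypothetical.

```python
# Toy sketch: score each prompt/response pair with a quality model,
# then keep only the highest-scoring pairs for fine-tuning.

def score_pair(prompt: str, response: str) -> float:
    """Stand-in quality heuristic; a real pipeline would call a
    trained quality-classification model instead."""
    if not response.strip():
        return 0.0
    # Reward topical overlap with the prompt and non-trivial length.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap + min(len(response.split()), 50) / 50


def curate(pairs: list[tuple[str, str]], keep: int) -> list[tuple[str, str]]:
    """Rank pairs by quality score and keep the top `keep`."""
    ranked = sorted(pairs, key=lambda p: score_pair(*p), reverse=True)
    return ranked[:keep]


pairs = [
    ("Explain photosynthesis.", "Plants convert light into chemical energy."),
    ("Explain photosynthesis.", ""),
    ("Summarize the report.", "The report covers quarterly revenue trends."),
    ("Summarize the report.", "ok"),
]
best = curate(pairs, keep=2)
```

In the real workflow the pool was 20,000 pairs cut to 10,000; the principle is the same, with a learned model replacing the heuristic.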
Better not Bigger: Distilling LLMs into specialized models
Jason Fries, a research scientist at Snorkel AI and Stanford University, discussed the challenges of deploying LLMs and presented two variations of one solution: distillation.
The first, called “distilling step-by-step,” emerged from a collaboration between researchers at Snorkel AI and Google Research. This approach prompts an LLM to answer a question and to explain the reasoning behind its answer. Data scientists then use both the answer and the rationale to train a smaller model. In experiments, this allowed researchers to train models on much less data while maintaining similar performance.
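The core move in distilling step-by-step is turning one LLM output into two supervised targets for the small model: predict the answer, and generate the rationale. The sketch below assumes the LLM's answers and rationales have already been collected; the field names and task prefixes are illustrative, not the paper's exact format.

```python
# Toy sketch of assembling multitask training examples for
# distilling step-by-step: one target for the answer, one for
# the rationale, distinguished by a task prefix.
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str     # the LLM's predicted answer
    rationale: str  # the LLM's explanation of its reasoning


def to_training_pairs(ex: Example) -> list[tuple[str, str]]:
    """Produce (input, target) pairs for the small model: one for
    answer prediction, one for rationale generation."""
    return [
        (f"[label] {ex.question}", ex.answer),
        (f"[rationale] {ex.question}", ex.rationale),
    ]


ex = Example(
    question="Is 17 a prime number?",
    answer="yes",
    rationale="17 has no divisors other than 1 and itself.",
)
training_pairs = to_training_pairs(ex)
```

The rationale target gives the small model extra training signal per example, which is why the approach can match performance with less data.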
Jason also showed how the Snorkel Flow data development platform allows users to effectively distill the expertise of multiple LLMs into a deployable, small-format model.
Live Q&A with select panelists
In a lively session, eight of the day’s panelists (Dr. Ali Arsenjani, Joe Spisak, Alex Ratner, Paroma Varma, Jason Fries, Vincent Chen, Chris Glaze, and Hoang Tran) returned to respond live to questions curated from the audience.
The audience’s questions ranged from how much data developers really need to fine-tune LLMs, to how companies can ensure the security of their proprietary data while training LLMs, to how the panelists keep up to date with the fast-moving field. The panelists agreed that the best way to stay current with LLMs is to have smart friends engaged in the field.
If you'd like to learn how the Snorkel AI team can help you develop high-quality LLMs or deliver value from generative AI, contact us to get started. See what Snorkel can do to accelerate your data science and machine learning teams. Book a demo today.