Research Archives

Which is better, retrieval augmentation (RAG) or fine-tuning? Both.

Professionals in the data science space often debate whether RAG or fine-tuning yields the better result. The answer is “both.”

Data-Centric AI, Fine-Tuning, Foundation Models, LLMs, NLP, RAG

Hoang Tran

September 20, 2023

Former U.S. Chief Data Scientist on past and future of data science

Past U.S. Chief Data Scientist DJ Patil talked with Snorkel AI CEO Alex Ratner on topics including the origin of the title “data scientist.”

Data Development, Data-Centric AI, Foundation Models, LLMs, MLOps, NLP, Partners, Public Sector

Team Snorkel

September 12, 2023

4 new papers show foundation models can build on themselves

The surest way to improve foundation models is through more and better data, but Snorkel researchers showed FMs can learn from themselves.

Data-Centric AI, Fine-Tuning, Foundation Models, NLP

Fred Sala

August 31, 2023

Accelerating predictive task time to value with generative AI

Generative AI can write poems, recite common knowledge, and extract information. GenAI can also help quickly build predictive pipelines.

Annotation, Data Development, Data Labeling, Fine-Tuning, Foundation Models, LLMs, NLP, Retail & Ecommerce

Bradley Fowler

August 17, 2023

Getting better performance from foundation models (with less data)

Annotation, Data Development, Foundation Models, NLP

Fred Sala

August 4, 2023

Data fuels enterprise AI value: 6 takeaways from the Gartner Hype Cycle for Artificial Intelligence, 2023

GenAI may be the most transformative technology of the past decade, but data is where enterprises are able to realize real value from AI today.

Data Development, Data Labeling, Data-Centric AI, Fine-Tuning, Foundation Models, NLP

Matt Casey

August 2, 2023

How we built better GenAI with programmatic data development

We used weak supervision to programmatically curate instruction tuning data for open-source LLMs to build a better GenAI.

Data Development, Data Labeling, Data-Centric AI, Evaluation, Fine-Tuning, Foundation Models, LLMs, NLP, Product Releases

Chris Glaze

July 19, 2023

The future of large language models is faster and more robust

Snorkel and affiliated academic labs have been hard at work reducing how computationally expensive large language models are.

Data Development, Data-Centric AI, Evaluation, Foundation Models, NLP

Fred Sala

June 29, 2023

LLMs high priority for enterprise data science, but concerns remain

Enterprises—especially the world’s largest—are excited to use large language models, but they want to fine-tune them on proprietary data.

Data Labeling, Fine-Tuning, Foundation Models, LLMs, NLP

Matt Casey

June 23, 2023

How MLCommons is democratizing data with public datasets

Peter Mattson, Google senior staff engineer and president of MLCommons.org, explained MLCommons at The Future of Data-Centric AI in 2022.

Data Development, Data Labeling, Data-Centric AI, Evaluation, Foundation Models, LLMs, NLP

Team Snorkel

May 31, 2023

Large language models: their history, capabilities and limitations

Large language models have enormous potential. But what are they? Where did they come from? And how can you make them work better?

Fine-Tuning, Foundation Models, LLMs, NLP

Matt Casey

May 25, 2023

Stanford professor on data-centric AI for healthcare and medicine

Stanford assistant professor James Zou presents “Responsible Data-Centric AI for Healthcare and Medicine” at The Future of Data-Centric AI.

Computer vision, Data Labeling, Data-Centric AI, Evaluation, Fine-Tuning, Healthcare

Team Snorkel

May 18, 2023

Poster presenters compete to win desktop GPU

Snorkel AI has accepted the first batch of applications for its first annual virtual poster competition. But there’s still time to add yours to the mix.

Data Development, Data Labeling, Data-Centric AI, NLP

Matt Casey

May 9, 2023

Use your data to build your AI moat: The Future of Data-Centric AI 2023

Join us on June 7-8 to learn how to use your data to build your AI moat at The Future of Data-Centric AI 2023 free virtual conference.

Data Development, Data Labeling, Data-Centric AI, Foundation Models, LLMs, NLP, Synthetic Data

Devang Sachdev

May 4, 2023

Out of distribution blindness: why to fix it and how energy can help

Sharon Li is an assistant professor at the University of Wisconsin-Madison. She presented “Detecting Data Distributional Shift: Challenges and Opportunities” at Snorkel AI’s The Future of Data-Centric AI Summit in 2022. The talk covered a novel approach for handling out-of-distribution objects.

Computer vision, Data-Centric AI, Evaluation, Fine-Tuning

Team Snorkel

May 3, 2023
