All articles on
Data development

Alfred: Data labeling with foundation models and weak supervision

Introducing Alfred: an open-source tool for combining foundation models with weak supervision for faster development of academic data sets.

August 27, 2024

New GenAI features, data annotation: Snorkel Flow 2024.R2

This release features new GenAI tools and Multi-Schema Annotation, as well as new enterprise security tools and an updated home page.

August 7, 2024

How data slices transform enterprise LLM evaluation

Enterprises must evaluate LLM performance for production deployment. Custom, automated eval + data slices present the best path to production.

August 1, 2024

Meta’s Llama 3.1 405B is the new Mr. Miyagi, now what?

Meta’s Llama 3.1 405B rivals GPT-4o in benchmarks, offering powerful AI capabilities. Despite high costs, it can enhance LLM adoption through fine-tuning, distillation, and as an AI judge.

July 25, 2024

Meta’s new Llama 3.1 models are here! Are you ready for it?

Meta released Llama 3.1 405B today, signaling a new era of open-source AI. The model is ready to use on Snorkel Flow.

July 23, 2024

Data-centric AI with Snorkel and MinIO

High-performing AI systems require more than a well-designed model. They also require properly constructed training and testing data.

Weak supervision for non-categorical applications + superalignment

We need more labeled data than ever, so we have explored weak supervision for non-categorical applications—with notable results.

Changho Shin
July 2, 2024

Vision language models: how LLMs boost image classification

Vision language models demonstrate impressive image classification capabilities, but LLMs can help improve their performance. Learn how.

June 12, 2024

How Bonito helps fine-tune specialized LLMs faster than ever

Fine-tuning specialized LLMs demands a lot of time and money. We developed Bonito to make this process faster, cheaper, and easier.

May 28, 2024

Accelerating AI development in manufacturing with Snorkel Flow and AWS SageMaker

The manufacturing industry has experienced a massive influx of data. Snorkel AI and AWS SageMaker can make that data actionable.

The art of data development for Enterprise LLMs

Snorkel’s Paroma Varma and Google’s Ali Arsenjani discuss the role of data in the development and implementation of LLMs.

April 16, 2024

How Snorkel topped the AlpacaEval leaderboard (and why we’re not there anymore)

Snorkel AI placed a model at the top of the AlpacaEval leaderboard. Here’s how we built it, and how it changed AlpacaEval’s metrics.

Hoang Tran
April 9, 2024

CRFM’s HELM and enterprise LLM evaluation beyond accuracy

As Snorkel AI prepares to build better enterprise LLM evaluations, we spoke with Yifan Mai from Stanford’s CRFM HELM project.

Vivek Krishnamurthy
April 3, 2024

Here’s how Snorkel Flow + Google AI built an enterprise-ready model in a day

Google and Snorkel AI customized PaLM 2 using domain expertise and data development to improve performance by 38 F1 points in a matter of hours.

March 19, 2024

How Skill-it! enables faster, better LLM training

Humans learn tasks better when taught in a logical order. So do LLMs. Researchers developed a method, called “Skill-it!”, that exploits this tendency.

March 12, 2024