On-demand webinar

Speakers

Rebekah Westerlind

Software Engineer
Snorkel AI

Rebekah Westerlind is a full-stack software engineer at Snorkel AI on the product engineering team. She graduated from Cornell University in 2022 with degrees in Computer Science and Operations Research & Information Engineering. Driven by a desire to always be learning, Rebekah loves jumping in on new projects and surrounding herself with experts.

Vincent Sunn Chen

Research Fellow & Founding Team
Snorkel AI

Vincent Sunn Chen is a Research Fellow on the founding team at Snorkel AI. His work centers on systems for high-quality AI evaluation and data development with experts in the loop. He currently leads the Open Benchmarks Grants, a $3M commitment to funding benchmarks and infrastructure for frontier agents. Prior to Snorkel, Vincent was a researcher at the Stanford AI Lab, where he studied the foundations of data-centric AI systems.

How to evaluate LLM accuracy for domain-specific use cases

LLM evaluation is critical for generative AI in the enterprise, but measuring how well an LLM answers questions or performs tasks is difficult. Thus, LLM evaluations must go beyond standard measures of “correctness” to include a more nuanced and granular view of quality.

In practice, enterprise LLM evaluations (e.g., OSS benchmarks) often come up short because they’re slow, expensive, subjective, and incomplete. They leave AI initiatives blocked because there is no clear path to production quality.

In this webinar, Vincent Sunn Chen, Founding Engineer at Snorkel AI, and Rebekah Westerlind, Software Engineer at Snorkel AI, discuss the importance of LLM evaluation, highlight common challenges and approaches, and explain the core concepts behind Snorkel AI’s approach to data-centric LLM evaluation.

Watch this on-demand webinar to learn more about:

The nuances of LLM evaluation
How to evaluate LLM response accuracy at scale
How to identify where additional LLM fine-tuning is needed

Schedule

Tuesday, March 12, 2024