
How Genentech extracted information for clinical trial analytics with Snorkel Flow

February 26, 2022

Genentech, a global biotech leader and member of the Roche Group, used Snorkel Flow to extract critical information from lengthy clinical trial protocol (CTP) PDF documents. Its team built AI applications that used named entity recognition (NER), entity linking, text extraction, and classification models to determine inclusion/exclusion criteria and to analyze Schedules of Assessments. Genentech’s team achieved 95-99% model accuracy by using Snorkel Flow.

Unlocking the value of clinical trial protocol data

Scientists at Genentech and other life sciences companies write and carry out thousands of Clinical Trial Protocols (CTPs) every year. These CTPs are complex documents that describe the plan for a clinical trial, including the objectives, the methodologies, and the population for the trial. They contain a lot of useful information that study design teams can reuse to reduce trial times and costs, increase recruitment of diverse patient populations, and reduce the patient dropout rate. When study teams have access to this data, the net outcome is a lower cost of drug development.

Clinical Trial Protocol (CTP) PDF documents

Genentech replaced months of manual data labeling using Snorkel Flow

Genentech started by using Snorkel Flow to build an AI application to extract 21 CMS Chronic Condition Entities from internal and external clinical trial protocols. Their application consisted of a named entity recognition (NER) model, an entity linking model, and a rules-based relationship extractor. Genentech leveraged programmatic labeling and a data-centric AI development approach to yield accurate inclusion/exclusion criteria that clinical scientists and study design teams used for analysis and data-informed protocol design.
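To make the idea of programmatic labeling concrete, here is a minimal plain-Python sketch. It is not Snorkel Flow's API and not Genentech's actual pipeline: the labeling functions, the keyword list, and the `majority_vote` helper are all hypothetical, and a simple majority vote stands in for whatever label aggregation the platform performs. Each labeling function either votes on whether a criterion sentence mentions a chronic condition or abstains.

```python
import re
from collections import Counter

# Label values for this sketch (hypothetical, not from the article).
ABSTAIN, NO_CONDITION, CONDITION = -1, 0, 1

# Hypothetical keyword list; the article does not enumerate the 21 CMS conditions.
CONDITION_TERMS = ("diabetes", "heart failure", "chronic kidney disease")


def lf_condition_keyword(text):
    """Vote CONDITION when a known condition term appears in the sentence."""
    lowered = text.lower()
    return CONDITION if any(term in lowered for term in CONDITION_TERMS) else ABSTAIN


def lf_history_of(text):
    """Vote CONDITION on phrasing that usually introduces a diagnosis."""
    return CONDITION if re.search(r"\b(history|diagnosis) of\b", text, re.I) else ABSTAIN


def lf_administrative(text):
    """Vote NO_CONDITION on administrative criteria such as consent or age."""
    pattern = r"\b(informed consent|years of age)\b"
    return NO_CONDITION if re.search(pattern, text, re.I) else ABSTAIN


LABELING_FUNCTIONS = [lf_condition_keyword, lf_history_of, lf_administrative]


def majority_vote(text):
    """Combine the non-abstaining votes; abstain on no votes or on a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for sentence in (
        "History of type 2 diabetes",
        "Signed informed consent form",
        "Able to comply with the study protocol",
    ):
        print(sentence, "->", majority_vote(sentence))
```

The appeal of this style is that a labeling mistake is fixed by editing or adding a function and relabeling the whole corpus, rather than by re-annotating documents one at a time.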

Inclusion/exclusion (I/E) criteria AI application pipeline

The data science team at Genentech built their end-to-end application pipeline in a few weeks, achieving 98% accuracy with the help of Snorkel’s guided error analysis.

“Snorkel Flow made the [clinical trial analytics] pipeline development a drag-and-drop experience.” Michael D’Andrea, Principal Data Scientist, Genentech

Genentech also used Snorkel Flow to build an AI application that estimated participant burden from CTPs. Their AI application identified and extracted procedure names from Schedule of Assessments tables and classified each into one of eight categories. Because their data was labeled programmatically in Snorkel Flow, they were able to quickly adapt to changes in their label schema. The output data was used to harmonize terminology for clinical trial protocols across the organization.
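The following sketch illustrates why programmatic labels make schema changes cheap. It is an assumption-laden toy, not Genentech's classifier: the category names, keywords, and the `label_procedure` and `label_all` helpers are hypothetical (the article does not name the eight categories). The schema lives in data, so revising it means editing a mapping and rerunning the labeler.

```python
# Hypothetical schema: category name -> keywords that signal it.
SCHEMA = {
    "imaging": ("mri", "ct scan", "x-ray"),
    "lab": ("blood draw", "urinalysis", "hematology"),
    "questionnaire": ("questionnaire", "survey"),
}


def label_procedure(name, schema):
    """Return the single matching category, or None if zero or several match."""
    lowered = name.lower()
    matches = [
        category
        for category, keywords in schema.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matches[0] if len(matches) == 1 else None


def label_all(names, schema):
    """Label every procedure name under the given schema."""
    return {name: label_procedure(name, schema) for name in names}


# A schema revision: split "imaging" into two finer categories.
REVISED_SCHEMA = {k: v for k, v in SCHEMA.items() if k != "imaging"}
REVISED_SCHEMA["mri"] = ("mri",)
REVISED_SCHEMA["radiographic"] = ("ct scan", "x-ray")

if __name__ == "__main__":
    procedures = ["Brain MRI", "Chest X-ray", "Hematology panel", "Physical exam"]
    print(label_all(procedures, SCHEMA))
    # Relabeling under the new schema is a single call, with no manual re-annotation.
    print(label_all(procedures, REVISED_SCHEMA))
```

With hand-labeled data, the same schema split would require revisiting every example previously tagged with the old category.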

Schedule of Assessments (SoA) AI application pipeline


By deploying AI applications built with Snorkel Flow, Genentech estimates that they can increase recruitment for diverse populations and reduce clinical trial times, costs, and patient dropout rates. The combination of these outcomes will help Genentech dramatically reduce drug development costs and increase the number of drugs in their development pipeline, leading to more cures.

Results with Snorkel Flow

This work was presented at the Future of Data-centric AI event hosted by Snorkel AI. Dive deeper into how Genentech used Snorkel Flow to build clinical trial analysis pipelines in this article.
