The Power of
Programmatic Labeling
Snorkel AI's technology is based on years of research represented in 40+ publications around programmatic labeling, weak supervision, and broader ML techniques.
40+ Peer-Reviewed Publications
Conventional Approaches —
The Problem with Legacy AI
Black box models or APIs
Black box models or APIs ignore the nuances of your data and objectives, and offer no way to customize, adapt, or audit their behavior.
Rules-based approaches
Rules-based approaches often generalize worse than ML models on complex, unstructured data, and they do not adapt easily to data drift or changing objectives.
Hand-labeled ML
Hand-labeled ML is notoriously expensive and slow, especially when subject matter experts are required, and it leaves little room to iterate, adapt, audit, or stay privacy compliant.
New Approach to AI —
Programmatic Labeling
Snorkel introduces a radically new approach that enables users to programmatically label massive amounts of training data by writing “labeling functions”. This approach has advanced the state of AI, but like any new paradigm it has also introduced new challenges, which Team Snorkel has spent over half a decade researching. The result of this work is the Snorkel Flow platform.
The Snorkel Framework —
Weak Supervision
Snorkel’s framework is based on weak supervision, a classical but newly resurgent set of techniques proven in research as well as in hundreds of production deployments. The key idea in weak supervision is to train machine learning models using labels that are cheaper to produce but potentially less accurate, or “noisier”, than the “ground truth” labels provided by groups of expert annotators. Such noisy or so-called weak labels are easier to acquire in massive quantities, and the added volume often results in higher-quality models overall.
Snorkel Flow extends and subsumes years of weak supervision research with the concept of a “labeling function”. In Snorkel Flow, users write labeling functions which capture heuristics from domain knowledge of the data, and can leverage existing resources such as input from models, expert systems, and knowledge bases.
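To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It does not use the Snorkel Flow API; the spam-detection task, the label values, the function names, and the phrase list are all hypothetical and chosen only for illustration. Each function encodes one heuristic and either votes for a label or abstains.

```python
import re

# Hypothetical label values for an illustrative spam-detection task.
# ABSTAIN means the labeling function has no opinion on this example.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Stand-in for an external resource such as a knowledge base (assumed contents).
KNOWN_SPAM_PHRASES = {"free money", "click here"}


def lf_contains_link(text):
    """Heuristic: messages that contain a URL are likely spam."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN


def lf_short_message(text):
    """Heuristic: very short messages are usually legitimate."""
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_known_phrase(text):
    """Heuristic: look the message up against a list of known spam phrases."""
    lowered = text.lower()
    return SPAM if any(p in lowered for p in KNOWN_SPAM_PHRASES) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_known_phrase]


def apply_lfs(texts):
    """Build a label matrix: one row per text, one column per labeling function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]


label_matrix = apply_lfs([
    "Click here for free money http://example.com",
    "see you soon",
    "The quarterly report is attached for your review today",
])
```

In this sketch the first message collects two spam votes, the second a single ham vote, and the third is left unlabeled because every function abstains. Partial coverage like this is normal: each function only needs to be useful on the slice of data it recognizes.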
Weak labels typically overlap and conflict, vary in accuracy and dataset coverage, and may even hide latent dependencies and correlations. Snorkel automatically models and combines their outputs using a generative model that looks for patterns of agreement and disagreement, then uses the resulting probabilistic labels to train a discriminative model. Team Snorkel has spent years on the theoretically and empirically grounded research that forms the foundation of Snorkel Flow, and continues to integrate the latest state-of-the-art advances.
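The step of turning conflicting votes into probabilistic labels can be illustrated with a deliberately simplified sketch. This is not Snorkel's generative model: it assumes the accuracy of each labeling function is already known (a real label model would estimate it from the patterns of agreement and disagreement), and it assumes the functions are conditionally independent given the true label. The function name `combine`, the accuracy values, and the example votes are hypothetical.

```python
import math

ABSTAIN = -1  # a labeling function that abstains casts no vote


def combine(votes, accuracies, n_classes=2):
    """Turn one row of votes into class probabilities.

    Simplifying assumptions (not Snorkel's actual model): accuracies are
    given rather than learned, lie strictly between 0 and 1, and the
    labeling functions are independent given the true label.
    """
    log_scores = [0.0] * n_classes
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        for c in range(n_classes):
            # Probability of seeing this vote if the true class were c.
            p = acc if c == vote else (1.0 - acc) / (n_classes - 1)
            log_scores[c] += math.log(p)
    # Normalize the log scores into probabilities (softmax).
    top = max(log_scores)
    exp_scores = [math.exp(s - top) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Assumed accuracies for three hypothetical labeling functions.
assumed_accuracies = [0.9, 0.6, 0.7]

# Example votes: agreement, conflict, and a row where every function abstains.
votes_matrix = [
    [1, ABSTAIN, 1],
    [0, 1, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]

probabilistic_labels = [combine(row, assumed_accuracies) for row in votes_matrix]
```

When two functions agree, the combined label is more confident than either vote alone; when they conflict, the more accurate function dominates without fully overriding the other; and when all abstain, the result stays uniform. A downstream discriminative model can then be trained on these soft labels, for example by weighting its loss with the probabilities.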
Community —