Resource library
Explore our complete library of resources including blogs, benchmarks, research papers, and more.
Introducing Snorkel, a new system for quickly creating, managing, and modeling training datasets.
Automating data augmentation by learning a generative sequence model over user-specified transformation functions.
Proposing a structure estimation method that is 100x faster than a maximum likelihood approach for training data.
Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision…
Introducing SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly.
A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but it assumes that a single accuracy parameter is sufficient to…


This paper presents a flexible interface layer to write labeling functions based on experience.
A paradigm for labeling training datasets programmatically rather than by hand.
Introducing DDLite, an interactive development framework for data programming.