This work demonstrates how organizational resources, in the form of aggregate statistics, knowledge bases, and existing services, can be used to connect new and existing data modalities.
The Role of Massively Multi-Task and Weak Supervision in Software 2.0
Outlining a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems.
Scene Graph Prediction With Limited Labels
This paper introduces a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using only a few labeled examples.
A Machine-Compiled Database of Genome-Wide Association Studies
Describing GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms.
Training Classifiers with Natural Language Explanations
Introducing BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision.
Software 2.0 and Snorkel: Beyond Hand-Labeled Data
This paper describes Snorkel, a system that enables users to help shape, create, and manage training data for Software 2.0 stacks.
Fonduer: Knowledge Base Construction From Richly Formatted Data
Introducing Fonduer, a machine-learning-based KBC system for richly formatted data.
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
Introducing Socratic learning, a paradigm that uses feedback from a discriminative model to automatically identify latent subsets in training data.
Data Programming With DDLite: Putting Humans in a Different Part of the Loop
Introducing DDLite, an interactive development framework for data programming.