Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision
Liger combines foundation models with weak supervision frameworks. It improves on existing weak supervision techniques by partitioning the embedding space and extending source votes within that space, resulting in improved performance on six benchmark NLP and video tasks.
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming
This paper presents Nemo, an interactive system that improves the overall productivity of weak supervision (WS) learning pipelines by an average of 20% compared to the prevailing WS approach.
A Survey on Programmatic Weak Supervision
This paper presents a comprehensive survey of recent advances in Programmatic Weak Supervision (PWS) and discusses related approaches for tackling scenarios with limited labeled data.
Dataset Debt in Biomedical Language Modeling
This paper finds that only 13% of biomedical datasets are available via programmatic access and 30% lack documentation on licensing and permitted reuse, highlighting the dataset debt in biomedical NLP.
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
PromptSource is a system that provides a templating language, an interface, and a set of guidelines to create, share, and use natural language prompts to train and query language models.
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
In this paper, accepted at ICLR 2022, Chris and team at Stanford outline a new, principled evaluation framework for comparing slice detection methods, then introduce a new technique, motivated by their findings, that outperforms existing methods by double-digit margins.
TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data
This paper describes TAGLETS, a system built to study techniques for automatically exploiting all three types of data (labeled, unlabeled, and auxiliary) and creating high-quality, servable classifiers.
Ontology-driven weak supervision for clinical entity classification in electronic health records
This paper presents Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules.