Snorkel AI is a Gartner Cool Vendor for Data-Centric AI.
PromptSource is a system that provides a templating language, an interface, and a set of guidelines to create, share, and use natural language prompts to train and query language models.
In this paper, accepted at ICLR 2022, Chris and team at Stanford outline a principled evaluation framework for comparing slice detection methods, then introduce a new technique, motivated by their findings, that outperforms existing methods by double-digit margins.
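This paper describes TAGLETS, a system built to study techniques for automatically exploiting all three types of available data (limited labeled data for the target task, unlabeled data, and auxiliary labeled data from other tasks) to create high-quality, servable classifiers.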
This paper presents Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules.
This paper proposes a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees.
This paper shows how a data-centric approach, generating high-quality training data at massive scale, can improve a model's zero-shot abilities.
This paper extends the scope of usable sources in weak supervision (WS) by formulating Weak Indirect Supervision (WIS), a new research problem: automatically synthesizing training labels from indirect supervision sources whose output label spaces differ from that of the target task.
This paper introduces the Structured State Space sequence model (S4), which uses a new parameterization of the state space model to improve the handling of long-range dependencies, both theoretically and empirically.