This paper proposes a source-aware variant of the influence function that measures the influence of individual components in the Programmatic Weak Supervision pipeline. It can be used for several purposes, such as understanding incorrect predictions, identifying mislabeling by sources, and improving the end model’s generalization performance.
BigBIO is a community library of biomedical NLP datasets that facilitates meta-dataset curation and enables zero-shot evaluation of biomedical prompts and multi-task learning.
This work proposes and theoretically justifies a model that fuses weak supervision with generative adversarial networks to improve both the estimation of unobserved labels and data augmentation, outperforming baseline weak supervision models on multiclass image classification datasets.
Compositional soft prompting is a parameter-efficient technique that improves the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) by tuning learnable vocabulary tokens, and it outperforms existing methods on benchmark datasets.
Liger, a combination of foundation models and weak supervision frameworks, improves existing weak supervision techniques by partitioning the embedding space and extending source votes within it, resulting in improved performance on six benchmark NLP and video tasks.
This paper presents Nemo, an interactive system that improves the overall productivity of weak supervision (WS) learning pipelines by an average of 20% compared to the prevailing WS approach.
This paper presents a comprehensive survey of recent advances in Programmatic Weak Supervision (PWS), and discusses related approaches to tackle limited labeled data scenarios.
This paper finds that only 13% of biomedical datasets are available via programmatic access and 30% lack documentation on licensing and permitted reuse, highlighting the dataset debt in biomedical NLP.