Forager: Rapid Data Exploration for Rapid Model Development, Machine Learning Whiteboard (MLW) Open-source Series


We started our machine learning whiteboard (MLW) series earlier this year as an open-invite space to brainstorm ideas and discuss the latest papers, techniques, and workflows in the AI space. We emphasize an informal environment that is open to everyone interested in learning about machine learning.

In this episode, Fait Poms, a Ph.D. student at Stanford who has worked with us on our data exploration interfaces, focuses on the question: how quickly can an ML practitioner, graduate student, or ML engineer train a reasonable machine learning model for a new computer vision task? The episode also dives into three exciting papers, summarized below.

This episode is part of the #MLwhiteboard video series hosted by Snorkel AI.

Paper Abstracts:

Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs. We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate is particularly challenging. 

Our key insight is that simultaneous calibration and importance sampling enable accurate estimates even in the low-sample regime (< 300 samples). Critically, we also derive an accurate single-trial estimator of the variance of our method and demonstrate that this estimator is empirically accurate at low sample counts, enabling a practitioner to know how well they can trust a given low-sample estimate. When validating state-of-the-art semi-supervised models on ImageNet and iNaturalist2017, our method achieves the same estimates of model performance with up to 10x fewer labels than competing approaches. In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
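
To make the importance-sampling idea concrete, here is a minimal Python sketch of estimating F1 from a small number of labels drawn non-uniformly from an unlabeled pool. It is not the authors' estimator: the proposal distribution (square root of an already-calibrated score plus a small floor), the function names `proposal` and `estimate_f1`, and the `oracle` callback standing in for a human labeler are all illustrative assumptions. The core mechanic is that each labeled item is weighted by the inverse of its sampling probability, so the weighted true-positive, false-positive, and false-negative counts are unbiased estimates of the full-pool totals (up to a shared constant), and F1 = 2TP / (2TP + FP + FN) is computed from them.

```python
import math
import random


def proposal(scores, floor=0.01):
    """Hypothetical proposal distribution: sample roughly in proportion to
    sqrt(calibrated score), plus a small floor so every item can be drawn."""
    raw = [math.sqrt(s) + floor for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def estimate_f1(preds, scores, oracle, n_labels, seed=0):
    """Importance-sampled F1 estimate (illustrative sketch).

    preds:    the classifier's binary prediction for every item in the pool
    scores:   calibrated probability that each item is positive
    oracle:   function i -> true label, standing in for a human annotator
    n_labels: labeling budget (number of draws, with replacement)
    """
    rng = random.Random(seed)
    q = proposal(scores)
    idx = rng.choices(range(len(preds)), weights=q, k=n_labels)
    tp = fp = fn = 0.0
    for i in idx:
        # Inverse-probability weight; the shared 1/n_labels factor is dropped
        # because it cancels in the F1 ratio.
        w = 1.0 / q[i]
        y = oracle(i)
        if preds[i] and y:
            tp += w
        elif preds[i] and not y:
            fp += w
        elif y:
            fn += w
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else float("nan")


if __name__ == "__main__":
    # Synthetic pool: 10 rare positives among 1,000 items.
    labels = [i < 10 for i in range(1000)]
    preds = [i < 8 or 10 <= i < 14 for i in range(1000)]  # TP=8, FP=4, FN=2
    scores = [0.9 if y else (0.5 if p else 0.01) for y, p in zip(labels, preds)]
    est = estimate_f1(preds, scores, lambda i: labels[i], n_labels=300)
    print(f"estimated F1 from 300 labels: {est:.3f} (true F1: {16 / 22:.3f})")
```

Because rare positives get a much higher sampling probability than the bulk of easy negatives, most of the labeling budget lands on the items that actually move the F1 estimate.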

Learning Rare Category Classifiers on a Tight Labeling Budget

Many real-world ML deployments face the challenge of training a rare category model with a small labeling budget. In these settings, there is often access to large amounts of unlabeled data, so it is attractive to consider semi-supervised or active learning approaches to reduce human labeling effort. However, prior approaches make two assumptions that often do not hold in practice: (a) one has access to a modest amount of labeled data to bootstrap learning, and (b) every image belongs to a common category of interest.

In this paper, we consider the scenario where we start with as few as five labeled positives of a rare category and a large amount of unlabeled data, 99.9% of which is negative. We propose an active semi-supervised method for building accurate models in this challenging setting. Our method leverages two key ideas: (a) use human and machine effort where each is most effective, with human labels used to identify “needle-in-a-haystack” positives and machine-generated pseudo-labels used to identify negatives; and (b) adapt recently proposed representation learning techniques for handling extremely imbalanced human-labeled data to iteratively train models with noisy machine-labeled data. We compare our approach with prior active learning and semi-supervised approaches, demonstrating significant improvements in accuracy per unit labeling effort, particularly on a tight labeling budget.
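
The first idea, dividing work between humans and the machine, can be sketched as a single labeling round. This is a hypothetical illustration rather than the paper's procedure: the function name `assign_labels`, the choice to send the highest-scoring items to annotators, and the `neg_fraction` cutoff for pseudo-labeling the lowest-scoring items as negatives are assumptions made for clarity. The representation-learning step in idea (b) is not shown.

```python
def assign_labels(scores, human_label, human_budget, neg_fraction=0.5):
    """One round of splitting labeling effort between humans and the machine.

    scores:       current model scores for each unlabeled item (higher = more
                  likely positive)
    human_label:  function i -> bool, standing in for a human annotator
    human_budget: number of items a human labels this round
    neg_fraction: hypothetical fraction of the remaining items, taken from the
                  lowest-scoring end, that are pseudo-labeled negative
    Returns a dict mapping item index -> (label, source).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = {}
    # Human effort goes to the top-scoring items, where rare positives
    # ("needles in the haystack") are most likely to be found.
    for i in order[:human_budget]:
        labels[i] = (human_label(i), "human")
    # Machine effort: when ~99.9% of the pool is negative, the lowest-scoring
    # items are very likely negative, so they are labeled without a human.
    rest = order[human_budget:]
    n_neg = int(len(rest) * neg_fraction)
    for i in rest[len(rest) - n_neg:]:
        labels[i] = (False, "machine")
    return labels
```

In a full loop, the model would presumably be retrained on these labels and re-scored before the next round; the pseudo-labels are noisy, which is why idea (b) focuses on training robustly with them.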

Background Splitting: Finding Rare Classes in a Sea of Background

We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories. In these scenarios, almost all images belong to the background category in the dataset (>95% of the dataset is background). We demonstrate that both standard fine-tuning approaches and state-of-the-art approaches for training on imbalanced datasets do not produce accurate deep models in the presence of this extreme imbalance. Our key observation is that we can drastically reduce the extreme imbalance due to the background category by leveraging visual knowledge from an existing pre-trained model. Specifically, the background category is “split” into smaller and more coherent pseudo-categories during training using a pre-trained model. 
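
A quick way to see why splitting helps is to relabel each background image with a pre-trained model's top-1 class and measure how much of the dataset the largest class occupies. This hard relabeling is only a hypothetical illustration (the method itself uses an auxiliary loss on the pre-trained model's predictions rather than replacing labels), and the helper names and synthetic labels below are assumptions.

```python
from collections import Counter


def split_background(labels, pretrained_top1, background="background"):
    """Replace each background label with a pseudo-category derived from a
    pre-trained model's top-1 prediction; rare-category labels are kept."""
    return [
        f"bg:{pretrained_top1[i]}" if y == background else y
        for i, y in enumerate(labels)
    ]


def largest_class_fraction(labels):
    """Fraction of the dataset covered by the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    # Synthetic example: 2 rare positives, 98 background images, and a
    # hypothetical pre-trained model whose top-1 predictions span 10 classes.
    labels = ["rare_bird"] * 2 + ["background"] * 98
    top1 = [f"imagenet_{i % 10}" for i in range(100)]
    print(largest_class_fraction(labels))  # 0.98
    print(largest_class_fraction(split_background(labels, top1)))  # 0.1
```

Instead of one class covering 98% of the data, the largest pseudo-category now covers 10%, so the rare positives are no longer dwarfed by a single homogeneous label.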

We incorporate background splitting into an image classification model by adding an auxiliary loss that learns to mimic the predictions of the existing, pre-trained image classification model. Note that this process is automatic and requires no additional manual labels. The auxiliary loss regularizes the feature representation of the shared network trunk by requiring it to discriminate between previously homogeneous background instances, and it reduces overfitting to the small number of rare category positives. We also show that background splitting can be combined with other background imbalance methods to further improve performance. We evaluate our method on a modified version of the iNaturalist dataset where only a small subset of rare category labels is available during training (all other images are labeled as background). By jointly learning to recognize ImageNet categories and selected iNaturalist categories, our approach yields performance that is 42.3 mAP points higher than a fine-tuning baseline when 99.98% of the data is background, and 8.3 mAP points higher than state-of-the-art baselines when 98.30% of the data is background.
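
The auxiliary loss can be written as a per-example objective. The Python sketch below assumes two heads on the shared trunk (the trunk itself is omitted): a main head over background plus the rare categories, trained with ordinary cross-entropy, and an auxiliary head trained with soft-target cross-entropy against the pre-trained model's predicted class probabilities. The function names and the `aux_weight` balancing term are hypothetical, and the paper's exact loss formulation may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bg_split_loss(main_logits, target, aux_logits, teacher_probs, aux_weight=1.0):
    """Per-example loss: rare-category cross-entropy plus an auxiliary term
    that trains a second head to mimic the pre-trained model's predictions.

    main_logits:   logits over [background, rare_1, ..., rare_k]
    target:        index of the true label within main_logits
    aux_logits:    auxiliary-head logits over the pre-trained model's classes
    teacher_probs: the pre-trained model's softmax output for the same image
    aux_weight:    hypothetical weight balancing the two terms
    """
    main_loss = -log_softmax(main_logits)[target]
    aux_log_probs = log_softmax(aux_logits)
    # Soft-target cross-entropy: sum over classes c of -p_teacher(c) * log p_aux(c).
    # Background images with different teacher outputs get different targets,
    # which is what "splits" the background inside the shared features.
    aux_loss = -sum(p * lp for p, lp in zip(teacher_probs, aux_log_probs))
    return main_loss + aux_weight * aux_loss
```

No extra human labels appear anywhere in the auxiliary term: its targets come entirely from the pre-trained model.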

Where to connect with Fait: Twitter


If you are interested in learning with us, consider joining us at our biweekly ML whiteboard.

To stay in touch with Snorkel AI, follow us on Twitter, LinkedIn, Facebook, YouTube, or Instagram. If you’re interested in joining the Snorkel team, we’re hiring! Please apply on our careers page.

