Fait Poms

Senior Applied Research Scientist, Snorkel AI

The latest from Fait

R2E2: low-latency path tracing of terabyte-scale scenes using thousands of cloud CPUs
In this paper, we explore the viability of path tracing massive scenes using a “supercomputer” constructed on-the-fly from thousands of small, serverless cloud computing nodes. We present R2E2 (Really Elastic Ray Engine), a scene decomposition-based parallel renderer that rapidly acquires thousands of cloud CPU cores, loads scene geometry from a pre-built scene BVH into the aggregate memory of these nodes in parallel, and performs full path-traced global illumination using an inter-node messaging service designed for communicating ray data. To balance ray tracing work across many nodes, R2E2 adopts a service-oriented design that statically replicates geometry and texture data from...
Research Paper

Oct 20, 2023
S. Fouladi, et al.
Beyond prompting: getting production quality LLM performance with Snorkel Flow
Blog
As enterprises look toward deploying LLM-powered, business-critical applications, they’re learning to use strategies beyond prompting.

Aug 09, 2023
Learning Rare Category Classifiers on a Tight Labeling Budget
Many real-world ML deployments face the challenge of training a rare category model with a small labeling budget. In these settings, there is often access to large amounts of unlabeled data, so it is attractive to consider semi-supervised or active learning approaches to reduce human labeling effort. However, prior approaches make two assumptions that often do not hold in practice: (a) one has access to a modest amount of labeled data to bootstrap learning, and (b) every image belongs to a common category of interest. In this paper, we consider the scenario where we start with as few as five labeled positives...
Research Paper

Oct 10, 2021
R.T. Mullapudi, et al.
Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs. We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate on is particularly challenging. Our key insight is that simultaneous calibration and importance sampling enables accurate estimates even in the low-sample regime (< 300 samples). Critically, we also derive an accurate single-trial estimator of the variance of our method and demonstrate that this estimator is empirically accurate at low sample counts, enabling a practitioner to...
Research Paper

Sep 13, 2021
F. Poms, et al.
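The abstract above relies on importance sampling to estimate F-scores when positives are rare. As a rough illustration of plain importance sampling only (not the paper's method, which also calibrates scores and derives a single-trial variance estimator), the sketch below draws the labeling sample from a proposal that favours high-scoring items, then reweights each draw so the F1 estimate targets the whole unlabeled pool. The function names, the uniform-mixture weight `eps`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def make_proposal(scores: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Hypothetical proposal: favour items the classifier scores highly.

    Mixing in a uniform component (weight eps) keeps every item's probability
    strictly positive, which importance sampling needs to stay unbiased.
    """
    n = len(scores)
    focused = scores / scores.sum()
    return (1.0 - eps) * focused + eps / n


def is_f1_estimate(preds, sampled_idx, sampled_labels, q) -> float:
    """Importance-weighted F1 over the whole pool from a small labeled sample.

    preds:          0/1 classifier predictions for every item in the pool.
    sampled_idx:    indices drawn (with replacement) from the proposal q.
    sampled_labels: ground-truth 0/1 labels collected for those draws.
    q:              proposal probability of every item in the pool.
    """
    n = len(preds)
    w = (1.0 / n) / q[sampled_idx]  # ratio of uniform target to proposal
    p = preds[sampled_idx]
    y = np.asarray(sampled_labels)
    tp = np.sum(w * ((p == 1) & (y == 1)))
    fp = np.sum(w * ((p == 1) & (y == 0)))
    fn = np.sum(w * ((p == 0) & (y == 1)))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 10_000
    labels = (rng.random(N) < 0.01).astype(int)   # ~1% rare positives (synthetic)
    scores = 0.4 * labels + 0.6 * rng.random(N)   # noisy synthetic classifier scores
    preds = (scores >= 0.5).astype(int)
    q = make_proposal(scores)
    idx = rng.choice(N, size=300, p=q)            # labeling budget of 300 draws
    est = is_f1_estimate(preds, idx, labels[idx], q)
    tp = np.sum((preds == 1) & (labels == 1))
    true_f1 = 2 * tp / (2 * tp + np.sum(preds != labels))  # 2TP / (2TP + FP + FN)
    print(f"estimated F1 = {est:.3f}, true F1 = {true_f1:.3f}")
```

Because the proposal concentrates labels on likely positives, a 300-label budget sees far more rare positives than a uniform sample would; the weights `w` undo that sampling bias.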
MANDOLINE: Model Evaluation under Distribution Shift
Machine learning models are often deployed in different settings than they were trained and validated on, posing a challenge to practitioners who wish to predict how well the deployed model will perform on a target distribution. If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as importance weighting can be applied to estimate performance on the target. However, importance weighting struggles when the source and target distributions have non-overlapping support or are high-dimensional. Taking inspiration from fields such as epidemiology and polling, we develop MANDOLINE,...
Research Paper

Jul 01, 2021
M. Chen, et al.
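The abstract names standard importance weighting as the usual way to estimate target performance from labeled source data plus unlabeled target data. Below is a minimal sketch of that baseline (not MANDOLINE itself), assuming each example has already been assigned to one of a few discrete groups so that density ratios can be estimated by counting. The grouping and the function names are hypothetical.

```python
import numpy as np


def group_density_ratios(source_groups, target_groups, n_groups):
    """Estimate w(g) = p_target(g) / p_source(g) from group frequencies.

    Assumes a hypothetical, user-supplied assignment of every example to one
    of n_groups discrete groups. Groups never seen in the source get ratio 0.
    """
    src = np.bincount(source_groups, minlength=n_groups) / len(source_groups)
    tgt = np.bincount(target_groups, minlength=n_groups) / len(target_groups)
    ratios = np.zeros(n_groups)
    seen = src > 0
    ratios[seen] = tgt[seen] / src[seen]
    return ratios


def importance_weighted_accuracy(correct, source_groups, target_groups, n_groups):
    """Estimate target accuracy as the mean of w(g_i) * correct_i over source data.

    correct: 0/1 array, whether the model was right on each labeled source example.
    """
    w = group_density_ratios(source_groups, target_groups, n_groups)[source_groups]
    return float(np.sum(w * correct) / len(correct))


if __name__ == "__main__":
    src = np.array([0, 0, 0, 1])       # source: mostly group 0
    tgt = np.array([0, 1, 1, 1])       # target: mostly group 1
    correct = np.array([1, 1, 1, 0])   # right on group 0, wrong on group 1
    print(importance_weighted_accuracy(correct, src, tgt, n_groups=2))
```

Group 1 makes up three quarters of the target but only a quarter of the source, so its errors are upweighted by 3 and the estimate is about 0.25 rather than the source accuracy of 0.75. If a target group never appears in the source, its ratio cannot be estimated and is set to zero here, so that part of the target is silently ignored: the non-overlapping-support failure the abstract points out.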
Background Splitting: Finding Rare Classes in a Sea of Background
We focus on the problem of training deep image classification models for a small number of extremely rare categories. In this common, real-world scenario, almost all images belong to the background category in the dataset. We find that state-of-the-art approaches for training on imbalanced datasets do not produce accurate deep models in this regime. Our solution is to split the large, visually diverse background into many smaller, visually similar categories during training. We implement this idea by extending an image classification model with an additional auxiliary loss that learns to mimic the predictions of a pre-existing classification model on the...
Research Paper

Jan 01, 2021
R.T. Mullapudi, et al.
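The abstract describes extending the classifier with an auxiliary loss that learns to mimic a pre-existing model's predictions. One plausible form, sketched here in NumPy as an assumption (the paper's exact loss and weighting may differ), is the usual cross-entropy on a head over the rare classes plus background, added to a weighted soft-target cross-entropy between an extra head and the pre-existing model's output probabilities. The names, shapes, and the `aux_weight` default are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def background_split_loss(main_logits, labels, aux_logits, teacher_probs,
                          aux_weight=0.1) -> float:
    """Main cross-entropy plus an auxiliary 'mimic the pre-existing model' term.

    main_logits:   (B, C) scores for the rare classes plus one background class.
    labels:        (B,) integer targets for the main task.
    aux_logits:    (B, K) scores from an extra head over the K classes of a
                   pre-existing classification model.
    teacher_probs: (B, K) that pre-existing model's softmax outputs.
    aux_weight:    hypothetical trade-off coefficient.
    """
    main_lp = log_softmax(main_logits)
    main_ce = -main_lp[np.arange(len(labels)), labels].mean()
    aux_lp = log_softmax(aux_logits)
    aux_ce = -(teacher_probs * aux_lp).sum(axis=-1).mean()  # soft-target cross-entropy
    return float(main_ce + aux_weight * aux_ce)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, C, K = 8, 3, 1000   # batch size, rare classes + background, teacher classes
    teacher = rng.dirichlet(np.ones(K), size=B)  # stand-in teacher probabilities
    loss = background_split_loss(
        rng.normal(size=(B, C)), rng.integers(0, C, size=B),
        rng.normal(size=(B, K)), teacher,
    )
    print(f"combined loss = {loss:.3f}")
```

Roughly, the teacher's many fine-grained classes give the shared features finer distinctions to learn inside the huge background class, which is the intuition behind splitting it.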
Train and You’ll Miss It: Interactive Model Iteration With Weak Supervision…
This paper provides a series of results studying how performance scales with changes in source coverage, source accuracy, and the Lipschitzness of label distributions in the embedding space, and compares this rate with that of standard weak supervision.
Research Paper

Nov 13, 2020
M. Chen, et al.
