Paroma Varma

Co-Founder and Head of Research, Snorkel AI

Paroma Varma is the co-founder and Head of Research at Snorkel AI, and earned her doctorate in electrical engineering from Stanford University. Her research focused on democratizing machine learning for domain experts who lack access to large datasets necessary for training intricate models, thus making complex AI technologies more accessible and impactful for a broader audience. She applied these methods in diverse fields such as medical imaging and autonomous driving.

The latest from Paroma

Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes
Fine-tuning Large Language Models (LLMs) typically relies on large quantities of high-quality annotated data, or questions with well-defined ground truth answers in the case of Reinforcement Learning with Verifiable Rewards (RLVR). While previous work has explored the benefits to model reasoning capabilities by scaling both data and compute used for RLVR, these results lack applicability in many real-world settings where annotated data and accessible compute may be scarce. In this work, we present a comprehensive empirical study of open-source Small Language Model (SLM) performance after RLVR in low data regimes. Across three novel datasets covering number counting problems, graph reasoning,...
Research Paper
Accepted to MLSys 2026

RIFT: A Rubric Failure Mode Taxonomy and Automated Diagnostics
Rubric-based evaluation is widely used in LLM benchmarks and training pipelines for open-ended, less verifiable tasks. While prior work has demonstrated the effectiveness of rubrics using downstream signals such as reinforcement learning outcomes, there remains no principled way to diagnose rubric quality issues from such aggregated or downstream signals alone. To address this gap, we introduce RIFT: RubrIc Failure mode Taxonomy, a taxonomy for systematically characterizing failure modes in rubric composition and design. RIFT consists of eight failure modes organized into three high-level categories: Reliability Failures, Content Validity Failures, and Consequential Validity Failures. RIFT is developed using grounded theory by...
Research Paper
Accepted to ICLR Brazil 2026

Automating benchmark design
The rapid progress and widespread deployment of LLMs and LLM-powered agents have outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but these quickly become saturated. In contrast, dynamic benchmarks evolve alongside the models they evaluate, but are expensive to create and continuously update. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate the process of dynamic benchmark design. BeTaL works by parameterizing key design choices in base benchmark templates and uses LLMs to reason through the resulting parameter space...
Research Paper

Walking safely before building flying saucer seatbelts: introducing Enterprise Alignment
Blog

Snorkel takes a step on the path to enterprise superalignment with new data development workflows for enterprise alignment

Here’s how Snorkel Flow + Google AI built an enterprise-ready model in a day
Blog

Google and Snorkel AI customized PaLM 2 using domain expertise and data development to improve performance by 38 F1 points in a matter of hours.

Mar 19, 2024
DEEM’22: Data Management for End-to-End Machine Learning
The DEEM’22 workshop (Data Management for End-to-End Machine Learning) is held on Sunday June 12th, in conjunction with SIGMOD/PODS 2022. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal to discuss the arising data management issues in ML application scenarios. The workshop solicits regular research papers (10 pages) describing preliminary and ongoing research results, including industrial experience reports of end-to-end ML deployments, related to DEEM topics. In addition, DEEM 2022 establishes a new paper category for reports on applications and tools (4 pages) as a forum for sharing interesting...
Research Paper

Oct 20, 2023
M. Boehm, et al.
Parameterizing neural power spectra into periodic and aperiodic components
Electrophysiological signals exhibit both periodic and aperiodic properties. Periodic oscillations have been linked to numerous physiological, cognitive, behavioral and disease states. Emerging evidence demonstrates that the aperiodic component has putative physiological interpretations and that it dynamically changes with age, task demands and cognitive states. Electrophysiological neural activity is typically analyzed using canonically defined frequency bands, without consideration of the aperiodic (1/f-like) component. We show that standard analytic approaches can conflate periodic parameters (center frequency, power, bandwidth) with aperiodic ones (offset, exponent), compromising physiological interpretations. To overcome these limitations, we introduce an algorithm to parameterize neural power spectra as a combination...
Research Paper

Nov 23, 2020
T. Donoghue, et al.
Cardiac Imaging of Aortic Valve Area From 34 287 UK Biobank Participants Reveals Novel Genetic Associations and Shared Genetic Comorbidity With Multiple Disease Phenotypes
Background: The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods: From a sample of 34 287 white British ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac magnetic resonance imaging sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening, to identify genetic comorbidities. Results: A genome-wide association study of aortic valve area in these UK Biobank participants showed 3 significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, P=1.8×10⁻⁹), rs35991305 (chr12:94191968, CRADD, P=3.4×10⁻⁸), and chr17:45013271:C:T...
Research Paper

Oct 30, 2020
A. Córdova-Palomera, et al.
Utilizing Weak Supervision to Infer Complex Objects in Autonomous Driving Data
While the detection and classification of simple objects encountered during autonomous driving sessions has been widely researched, the detection of complex objects and situations based on the combinations of objects in a scene remains relatively overlooked. This is especially difficult due to the cost of gathering labels for each complex scenario of interest before training a specialized model. To address this bottleneck of training data, we explore the applicability of weak supervision, or relying on higher level, noisier forms of supervision to label training data. Specifically, we use data programming, a paradigm that can learn the accuracy and dependency structure...
Research Paper

Dec 19, 2019
Z. Weng, et al.