SNORKEL DATA SERIES //

workplace agents //

GDPval+

Train & evaluate frontier agents on the professional work the economy runs on

GDPval+ is Snorkel's data series for training AI agents on a broad set of professional jobs across domains, roles, and industries, and for evaluating whether they can do that work.

Developed by Snorkel's AI Data Research Lab, GDPval+ delivers longer-horizon tasks, drawn from real workflows, that produce tangible deliverables such as documents, spreadsheets, or presentations. With domain expert-curated tasks across all 20 O*NET sectors and 100+ occupations, you can cover up to 100% of the U.S. labor market.

REQUEST DATA SAMPLES //


Sector coverage includes

Manufacturing: Plant operations, supply chain, and engineering deliverables

Professional, scientific & technical services: Legal, market research, microbiology, and information security work products

Health care & social assistance: Clinical formulations, authorization packages, and care-team workflows

Educational services: Curriculum design, assessment, and instructional materials

Construction: Project planning, site inspection, and compliance deliverables

Other services: Repair, personal services, and civic-organization workflows

Public administration: Government and emergency-management deliverables

Retail trade: Retail operations, merchandising, and customer-service workflows

Transportation & warehousing: Logistics planning, dispatch, and warehouse-operations tasks

Arts, entertainment & recreation: Creative production and venue-operations workflows

Plus 10 more sectors covering the rest of the U.S. digital labor market.

GDPval+ is intentionally calibrated to stress-test state-of-the-art agents

Built for frontier model evaluation and training:

Empirical difficulty tiers measured against current frontier models, not author judgment
A frontier tier where today's leading models score below 20%
Every task graded against a weighted, expert-authored rubric
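
As an illustration of weighted rubric grading, the sketch below scores one deliverable as the share of rubric weight it earns. The `Criterion` fields, the example criteria, and the pass/fail treatment of each criterion are assumptions made for this example, not the GDPval+ rubric schema.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line (hypothetical schema, for illustration only)."""
    description: str
    weight: float  # relative importance assigned by the rubric author
    met: bool      # whether the grader judged the criterion satisfied


def rubric_score(criteria: list[Criterion]) -> float:
    """Return the fraction of total rubric weight earned, in [0, 1]."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total


example = [
    Criterion("Totals in the spreadsheet reconcile", 3.0, True),
    Criterion("Executive summary fits on one page", 1.0, False),
    Criterion("Sources are cited", 1.0, True),
]
score = rubric_score(example)  # 4.0 of 5.0 weight earned
```

Weighting lets a rubric author make a missed high-stakes criterion cost more than a missed formatting detail.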

If your agent succeeds here, it can do the work of an industry professional.

Why the Snorkel Data Series

High-volume quarterly drops

Multi-layer quality pipeline

Unified execution environment

Direct roadmap influence

Expert-led validation

Every task is built and validated through a multi-layer quality pipeline.

Expert review

Expert contributors author every task; subject-matter experts review each one against acceptance criteria and check its metadata for accuracy.

Programmatic checks

Automated validation checks that each task is unique, meets minimum resource requirements, and has a rubric of sufficient quality.

Difficulty validation

Task difficulty labels are validated against observed accuracy from a panel of frontier models.
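
One way to picture this check is as a function from observed panel accuracy to a tier, which is then compared with the label on the task. The sketch below is hypothetical: the tier names other than "frontier", the 50% cut point, the model names, and the use of a simple mean across the panel are all assumptions. The 20% figure is borrowed from the frontier-tier description and applied per task only for illustration.

```python
from statistics import mean


def difficulty_tier(panel_scores: dict[str, float],
                    frontier_cutoff: float = 0.20,  # assumed per-task use of the 20% figure
                    hard_cutoff: float = 0.50) -> str:  # hypothetical cut point
    """Map per-model accuracy on one task to an empirical difficulty tier."""
    if not panel_scores:
        raise ValueError("need at least one model score")
    avg = mean(panel_scores.values())
    if avg < frontier_cutoff:
        return "frontier"
    if avg < hard_cutoff:
        return "hard"
    return "standard"


def label_is_valid(label: str, panel_scores: dict[str, float]) -> bool:
    """A label passes only if it matches the tier implied by observed accuracy."""
    return label == difficulty_tier(panel_scores)


# Hypothetical panel results for a single task.
scores = {"model_a": 0.10, "model_b": 0.20}
tier = difficulty_tier(scores)
```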

Distribution guardrails

New submissions are accepted only if they maintain dataset balance across task types, difficulty levels, and categories.
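
A guardrail of this kind can be sketched as a share cap: a submission is accepted only if, after adding it, no single value of a tracked attribute exceeds a maximum share of the dataset. The cap value, the function name, and the single-attribute check are assumptions for this example, not a description of Snorkel's actual acceptance rule.

```python
from collections import Counter


def keeps_balance(existing: list[str], new_value: str,
                  max_share: float = 0.5) -> bool:
    """Return True if adding new_value keeps its share at or below max_share.

    `existing` holds one attribute value per accepted task, such as its
    difficulty level. `max_share` is a hypothetical cap.
    """
    counts = Counter(existing)
    counts[new_value] += 1
    return counts[new_value] / (len(existing) + 1) <= max_share


accepted_levels = ["hard", "hard", "easy"]
ok_easy = keeps_balance(accepted_levels, "easy")  # easy would be 2 of 4
ok_hard = keeps_balance(accepted_levels, "hard")  # hard would be 3 of 4
```

In practice the same check would be run once per tracked attribute (task type, difficulty level, category), and a submission would need to pass all of them.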

Train agents to do the work of industry professionals with the Snorkel Data Series

Talk to a researcher