SNORKEL DATA SERIES //

workplace agents //

GDPval+

Train & evaluate frontier agents on the professional work the economy runs on

GDPval+ is Snorkel’s data series for training AI agents and evaluating whether they can do a broad set of professional jobs across domains, roles, and industries.

Developed by Snorkel's AI Data Research Lab, GDPval+ delivers longer-horizon tasks, drawn from real workflows, that produce tangible deliverables such as a document, spreadsheet, or presentation. With tasks curated by domain experts across all 20 O*NET sectors and 100+ occupations, you can cover up to 100% of the U.S. labor market.


Sector coverage includes

  • Manufacturing: Plant operations, supply chain, and engineering deliverables
  • Professional, scientific & technical services: Legal, market research, microbiology, and information security work products
  • Health care & social assistance: Clinical formulations, authorization packages, and care-team workflows
  • Educational services: Curriculum design, assessment, and instructional materials
  • Construction: Project planning, site inspection, and compliance deliverables
  • Other services: Repair, personal services, and civic-organization workflows
  • Public administration: Government and emergency-management deliverables
  • Retail trade: Retail operations, merchandising, and customer-service workflows
  • Transportation & warehousing: Logistics planning, dispatch, and warehouse-operations tasks
  • Arts, entertainment & recreation: Creative production and venue-operations workflows

Plus 10 more sectors covering the rest of the U.S. digital labor market.

GDPval+ is intentionally calibrated to stress-test state-of-the-art agents

Built for frontier model evaluation and training:

  • Empirical difficulty tiers measured against current frontier models, not author judgment
  • A frontier tier where today's leading models score below 20%
  • Every task graded against a weighted, expert-authored rubric

If your agent succeeds here, it can do the work of an industry professional.
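
To make "graded against a weighted, expert-authored rubric" concrete, here is a minimal sketch of one way such a score could be computed. It is an illustration under stated assumptions, not Snorkel's grading code: the `Criterion` class, the `rubric_score` function, the pass/fail judgment per criterion, and the example criteria are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line item (hypothetical structure, not the GDPval+ schema)."""
    description: str
    weight: float   # relative importance assigned by the rubric author
    met: bool       # assumption: each criterion is judged pass/fail


def rubric_score(criteria):
    """Return the weighted fraction of criteria met, in the range [0, 1]."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total


# Illustrative rubric for a spreadsheet deliverable (made-up criteria).
example = [
    Criterion("Spreadsheet totals reconcile", 3.0, True),
    Criterion("Summary memo cites its sources", 1.0, False),
]
score = rubric_score(example)  # 3.0 earned out of 4.0 total weight
```

Weighting lets a rubric author mark some requirements as more important than others, so missing a minor formatting item costs less than missing a core requirement.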

Why the Snorkel Data Series

High-volume quarterly drops
Multi-layer quality pipeline
Unified execution environment
Direct roadmap influence

Expert-led validation

Every task is built and validated through a multi-layer quality pipeline.

01

Expert review

Expert contributors author every task; subject-matter experts review each one against acceptance criteria and metadata accuracy.

02

Programmatic checks

Automated validation ensures task uniqueness, minimum resource requirements, and rubric quality.
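
As a rough illustration of what automated checks of this kind might look like, the sketch below tests a task for uniqueness, a minimum number of attached resources, and basic rubric sanity. Everything here is an assumption made for the example: the task dictionary layout, the `fingerprint` and `passes_checks` helpers, the reading of "resource requirements" as attached files, and the default thresholds.

```python
import hashlib


def fingerprint(prompt):
    """Hash a prompt after normalizing case and whitespace (exact-duplicate check only)."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def passes_checks(task, seen_fingerprints, min_attachments=1, min_criteria=3):
    """Return True if a task passes all three hypothetical checks."""
    # Uniqueness: the normalized prompt has not been seen before.
    unique = fingerprint(task["prompt"]) not in seen_fingerprints
    # Resources: assumed here to mean attached reference files.
    enough_resources = len(task["attachments"]) >= min_attachments
    # Rubric quality: enough criteria, and every weight is positive.
    rubric = task["rubric"]
    rubric_ok = len(rubric) >= min_criteria and all(w > 0 for w in rubric.values())
    return unique and enough_resources and rubric_ok


task = {
    "prompt": "Draft a Q3 budget",
    "attachments": ["ledger.xlsx"],
    "rubric": {"totals reconcile": 3, "assumptions stated": 2, "clear layout": 1},
}
accepted = passes_checks(task, seen_fingerprints=set())
```

A hash of normalized text only catches near-verbatim repeats; detecting paraphrased duplicates would need a similarity measure, which this sketch does not attempt.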

03

Difficulty validation

Task difficulty labels are validated against observed accuracy from a panel of frontier models.
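
One way to picture this step is the sketch below, which derives a tier from the scores a model panel achieved on a task and compares it with the assigned label. Only the 20% frontier threshold comes from this page. The use of a mean across the panel, the `hard` and `standard` tier names, the 60% cutoff, and the function names are assumptions for illustration.

```python
from statistics import mean


def difficulty_tier(panel_scores, frontier_cutoff=0.20, hard_cutoff=0.60):
    """Map observed panel accuracy on a task to a tier name.

    panel_scores: dict of model name -> score in [0, 1] (hypothetical format).
    """
    if not panel_scores:
        raise ValueError("need at least one model score")
    avg = mean(panel_scores.values())  # assumption: tiers use the panel mean
    if avg < frontier_cutoff:
        return "frontier"
    if avg < hard_cutoff:  # hypothetical cutoff, not from the source
        return "hard"
    return "standard"


def label_is_validated(assigned_label, panel_scores):
    """True when the assigned label matches the empirically derived tier."""
    return assigned_label == difficulty_tier(panel_scores)


tier = difficulty_tier({"model_a": 0.10, "model_b": 0.15})
```

Deriving the label from measured scores, rather than from an author's estimate, is what makes the tiers empirical: a task an author considers hard is only labeled that way if the panel actually struggles with it.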

04

Distribution guardrails

New submissions are accepted only if they maintain dataset balance across task types, difficulty levels, and categories.
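
A balance guardrail of this kind could be sketched as a simple share cap: accept a submission only if, after adding it, its bucket does not exceed a maximum fraction of the dataset. The `keeps_balance` function, the single-attribute view of balance, and the 50% cap are hypothetical choices for the example, not the actual acceptance rule.

```python
from collections import Counter


def keeps_balance(existing_labels, candidate_label, max_share=0.5):
    """Return True if adding the candidate keeps its bucket at or under max_share.

    existing_labels: labels (e.g. difficulty levels) of tasks already accepted.
    max_share: hypothetical cap on any one bucket's fraction of the dataset.
    """
    counts = Counter(existing_labels)
    counts[candidate_label] += 1          # simulate accepting the candidate
    total = sum(counts.values())
    return counts[candidate_label] / total <= max_share


current = ["easy", "easy", "hard"]
accept_easy = keeps_balance(current, "easy")  # "easy" would be 3 of 4 tasks
accept_hard = keeps_balance(current, "hard")  # "hard" would be 2 of 4 tasks
```

The same check could be run once per attribute (task type, difficulty level, category), accepting a submission only if every attribute stays within its cap.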


Train agents to do the work of industry professionals with the Snorkel Data Series