We ran Opus 4.8 and GPT-5.5 against our GDPval+ dataset.

GDPval+ is a progressive curriculum of economically valuable workplace tasks, expert-authored across all 20 O*NET sectors and 100+ occupations.

Each datapoint pairs a professional prompt, reference files, a golden solution, and a weighted rubric, packaged as both a raw definition and a Harbor-ready evaluation. The dataset contains thousands of single-turn tasks, each taking a human an average of 5-10 hours to complete.
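To make the datapoint layout concrete, here is a minimal Python sketch of how one task and its weighted rubric could be represented and scored. It is an illustration only: the class names, field names, and the `weighted_score` helper are hypothetical and are not taken from the GDPval+ raw definition or the Harbor packaging, and the scoring rule (earned weight divided by total weight) is an assumption about how a weighted rubric might be aggregated.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    # Hypothetical shape: one grading criterion and its relative weight.
    criterion: str
    weight: float


@dataclass
class Datapoint:
    # Hypothetical field names mirroring the four parts named in the text.
    prompt: str
    reference_files: list[str]
    golden_solution: str
    rubric: list[RubricItem] = field(default_factory=list)


def weighted_score(rubric: list[RubricItem], satisfied: set[str]) -> float:
    """Return the fraction of total rubric weight earned (assumed aggregation rule)."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        raise ValueError("rubric has no weight to score against")
    earned = sum(item.weight for item in rubric if item.criterion in satisfied)
    return earned / total


# Made-up example task, for illustration only.
task = Datapoint(
    prompt="Draft a quarterly variance report from the attached ledger.",
    reference_files=["ledger.xlsx"],
    golden_solution="variance_report.docx",
    rubric=[
        RubricItem("totals reconcile with ledger", 3.0),
        RubricItem("report follows house template", 1.0),
    ],
)

score = weighted_score(task.rubric, {"totals reconcile with ledger"})
```

In this sketch a submission that satisfies only the heavier criterion earns 3 of 4 weight units. How a rubric score maps to a pass for Pass@1 is not specified in the text, so no threshold is shown.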

Opus 4.8: 21% Pass@1

GPT-5.5: 22% Pass@1

Train and evaluate agents on real, economically valuable work across dozens of professional occupations with GDPval+.
