Generative AI is at peak hype and poised to dive into the “Trough of Disillusionment,” according to the 2023 Gartner® Hype Cycle™ for Artificial Intelligence, while data labeling and annotation services are entering the “Plateau of Productivity.”
The level of excitement Gartner® identified for generative AI mirrors our own findings. A poll we conducted at our recent Future of Data-Centric AI conference found that most attendees hoped to launch a production use of large language models by the end of 2023.
But building a custom generative AI model—while achievable—is more work than business leaders might expect. It requires an amount of data labeling and curation that exceeds most companies’ internal capacities.
“Models, especially for generative AI, increasingly come from vendors rather than being delivered in-house. Data is becoming the main means for enterprises to get value from these pre-trained models.”
– Gartner® Hype Cycle™ for Artificial Intelligence, 2023
Learn how Snorkel AI can help you identify and tackle your business problems through data-centric AI. We are recognized in this report as a Sample Vendor for data-centric AI as well as data labeling and annotation.
Get your copy of the Gartner® Hype Cycle™ for Artificial Intelligence 2023 report today.
GenAI’s promise and struggle
Generative AI has been turning heads since the release of GPT-2 in 2019, but the technology splashed into the mainstream with the debut of ChatGPT in November 2022. Suddenly, non-technical users witnessed the LLM-backed chatbot’s ability to regurgitate knowledge, explain jokes and write poems. In the following months, GenAI announcements came quickly.
Gartner® noted that this elevated discussions about GenAI into the boardroom and changed how business leaders looked at the world.
“Generative AI has had an impact like no other technology in the past decade,” the report said.
As that excitement builds, Gartner® noted, a gap remains between the technology’s expected potential and its actual usage. Business leaders are beginning to learn that creating and using GenAI is not as easy as they would like it to be, and doing so successfully calls for a suite of other technologies and applications—including data-centric AI and data labeling.
The necessity of data labeling in the age of GenAI
Building a better GenAI starts with better data.
“When models are pretrained, data is the main means for customization and fine-tuning of the models,” Gartner® said.
Snorkel researchers recently demonstrated the power of data quality in collaboration with researchers at Together AI. The team used Snorkel Flow to label the original training data for the RedPajama large language model according to task type and quality. Then, they used those labels to curate the original training set for task balance and quality.
Once they fine-tuned the model with the curated data set, human testers preferred responses from the newly tuned model more often than those from its parent version in every major category.
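The curation step described above (label each example by task type and quality, then filter and rebalance) can be illustrated with a minimal sketch. This is not the team's actual pipeline: the field names `task` and `quality`, the 0.5 threshold, and the per-task cap are all hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict


def curate(examples, min_quality=0.5, per_task_cap=2, seed=0):
    """Drop low-quality examples, then cap how many each task type contributes.

    `quality` and `task` are assumed to be labels already attached to each
    example; the threshold and cap are illustrative, not recommended values.
    """
    rng = random.Random(seed)  # seeded so the sample is repeatable
    by_task = defaultdict(list)
    for ex in examples:
        if ex["quality"] >= min_quality:  # quality filter
            by_task[ex["task"]].append(ex)

    curated = []
    for task in sorted(by_task):  # fixed iteration order for repeatability
        group = by_task[task]
        rng.shuffle(group)
        curated.extend(group[:per_task_cap])  # task-balance cap
    return curated


# Hypothetical labeled training examples.
examples = [
    {"text": "Summarize the memo", "task": "summarization", "quality": 0.9},
    {"text": "Summarize the report", "task": "summarization", "quality": 0.8},
    {"text": "Summarize the email", "task": "summarization", "quality": 0.7},
    {"text": "sumarize thing", "task": "summarization", "quality": 0.2},
    {"text": "Who wrote Hamlet?", "task": "qa", "quality": 0.95},
    {"text": "who is", "task": "qa", "quality": 0.4},
    {"text": "List ideas for a team offsite", "task": "brainstorming", "quality": 0.6},
]

curated = curate(examples)
```

With these made-up inputs, the over-represented summarization task is trimmed to two examples and the low-quality rows are dropped, so no single task type dominates the fine-tuning set.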
At present, Gartner® estimates that 20% or less of its target audience has started using data-labeling and annotation services—services that Gartner® rated as high benefit.
The value of data labeling and annotation extends well beyond generative AI. Gartner® noted that these services remove a bottleneck in developing usable, high-value AI solutions. While human labeling services can fill a gap, they can be cumbersome and raise security concerns. Companies may also struggle to find crowd workers with the appropriate expertise to hand-label complex documents.
Companies like Snorkel—a Gartner® Sample Vendor in this report—allow enterprises to scale the value of their internal experts. For example, the Snorkel team working on the RedPajama project used two developers over the course of one day to label what could have taken hundreds of annotators weeks or months.
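To give a feel for how a handful of experts can label data at scale, here is a minimal sketch of programmatic labeling in plain Python. It does not use or represent Snorkel Flow's API. The labeling functions, the label names, and the support-ticket examples are hypothetical, and the simple majority vote is a stand-in for whatever aggregation a production system would use.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


# Each labeling function encodes one expert heuristic (all hypothetical).
def lf_mentions_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN


def lf_mentions_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN


def lf_mentions_password(text):
    return "account" if "password" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_refund, lf_mentions_invoice, lf_mentions_password]


def label(text):
    """Combine labeling-function votes by simple majority; abstain on ties."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting heuristics with equal support
    return top


tickets = [
    "Please refund the duplicate invoice",
    "I forgot my password",
    "Hello there",
]
labels = [label(t) for t in tickets]
```

The point of the design is that an expert writes each rule once and it then applies to every document, while abstaining on ties or unmatched text leaves the uncertain cases for human review.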
Early data-centric AI adopters to reap rewards
Gartner® also recognized Snorkel AI as a Sample Vendor for data-centric AI.
Data-centric AI is an approach that focuses on improving the quality of training data to build better AI systems. Our co-founder, Alex Ratner, has called this “data as a programming language.”
The idea is simple: if you can feed a sufficiently robust model a sufficient quantity of high-quality examples of inputs and desired outputs, the model will learn how to get from point A to point B.
The data-centric philosophy goes well beyond the point of training a model. At our Future of Data-Centric AI Virtual Conference, Nurtekin Savas, head of enterprise data science at Capital One, said he takes the idea of data-centric AI from the point of data creation to the point of data deletion. With this approach, every step along the data journey—including data storage and data registration—keeps in mind the idea that this data could eventually be used in a model.
That makes Savas a trailblazer by Gartner® standards. The firm rated the data-centric AI field as “embryonic,” with 5% or less of the target audience actively engaging with the discipline.
What are Gartner’s Hype Cycle™ Reports?
Gartner® Hype Cycles™ provide a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new opportunities. Gartner Hype Cycle methodology gives you a view of how a technology or application will evolve over time, providing a sound source of insight to manage its deployment within the context of your specific business goals.
Each Hype Cycle™ drills down into the five key phases of a technology’s life cycle.
- Innovation Trigger.
- Peak of Inflated Expectations.
- Trough of Disillusionment.
- Slope of Enlightenment.
- Plateau of Productivity.
Each Hype Cycle™ report focuses on a specific area of technological capabilities, such as data management, DevOps, or digital marketing, and assesses the importance and business impact of individual technologies in that area.
Conclusion
The Gartner® findings mirror our experiences working with data science teams at many of the world’s largest organizations:
- Generative AI is—and will continue to be—very important.
- Data is the best way to program models.
- Data quality matters.
Business leaders are beginning to see that creating and using GenAI is not as easy as they would like. Doing so successfully calls for a suite of other technologies and applications—including data-centric AI and data labeling.
Snorkel’s own recent explorations—from our poll of enterprise AI leaders to our experiment using programmatic labeling to improve the performance of the RedPajama LLM—have reinforced these findings.
Data-centric AI is in a nascent state, but it will be essential for companies that wish to outperform their competitors in the future.
Learn more by reading the complete Hype Cycle™ report. Get your copy here.
Hype Cycle™ for Artificial Intelligence, 2023
GARTNER® is a registered trademark and service mark, and Hype Cycle™ is a registered trademark, of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and both are used herein with permission. All rights reserved. Gartner® does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner® research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner® disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.