
How Schlumberger uses Snorkel Flow to enhance proactive well management

September 30, 2022

Schlumberger is the world’s leading provider of technology and services for the energy industry, operating in over 120 countries. The company provides well maintenance and analytics services to the world’s biggest oil companies, and it believes that large-scale data analysis and artificial intelligence/machine learning (AI/ML) will help it remain a market leader. One way it has pursued this is by building its own AI application with Snorkel Flow to automatically extract geological entities and critical field data from the many document structures and report types it receives from customers.

Providing proactive well maintenance with automated information extraction
Schlumberger (NYSE: SLB) is a technology company that partners with customers to access energy. The Software Technology Innovation Center (STIC), within the 85,000-person industry leader, is dedicated to using new AI/ML applications to support the company’s mission to improve the performance and sustainability of the global energy industry. One approach is to streamline information extraction from the critical field data that underpins Schlumberger’s large-scale analysis of business operations and its data-driven insights into their performance.

Challenge

The energy industry generates large volumes of reports, from daily drilling reports to well maintenance logs. Each document has its own structure and format, which makes it difficult for Schlumberger’s team to extract crucial information quickly. Automating the extraction of text from these unstructured PDFs using Named Entity Recognition (NER) would greatly accelerate the team’s progress toward delivering highly accurate, large-scale analysis.

Example of an NLP pipeline using unstructured data at Schlumberger

The team explored typical off-the-shelf ML models, but those models couldn’t identify the scientific terms specific to the Exploration and Production (E&P) industry. The team also tried to create a domain-specific training dataset using various labeling tools and scarce subject matter expert (SME) time, but labeling took 1 to 3 hours per document, which wasn’t scalable. Ultimately, the team needed to identify 18 different industry-specific entities and automatically associate data with those entities, but a few things stood in the way:

  • Rich information was buried within tabular and raw text in PDFs with varied formatting across reports from different companies.
  • Poor collaboration between domain experts and data scientists, who relied on cumbersome file sharing and ad hoc meetings.
  • Manually labeling training data was a bottleneck to building AI that could automate this effort.

“What would have taken us months to go through an iteration can happen in minutes now. Literally!”

Swaroop Kalasapur
Head of Schlumberger Technology Innovation Center, Schlumberger

Goal

Minimize the time subject matter experts (SMEs) spend labeling training data while ensuring that the system can adapt to new or changing document formats.

Solution

In just three days, Schlumberger built an AI application with Snorkel Flow that automatically extracts key scientific data from geological and field data reports and uses it to guide recommendations for better well management across multiple clients. Using a data-centric AI development lifecycle accelerated by programmatic labeling, Monisha Manoharan, a Senior Machine Learning Engineer at Schlumberger, and her team built a binary classification task that reached an 85% F1 score in those first three days working with the Snorkel Flow team.

After a few rounds of rapid iteration using Snorkel Flow’s model-guided error analysis and programmatic labeling, they improved the F1 score to 91.4%, which Monisha called “impressive compared to what we had achieved previously.”

The AI application Monisha and her team built with Snorkel Flow cut report processing time from 1 to 3 hours per report to just a few seconds. With the new application, they extracted several different entities from unstructured data, including well maintenance activity descriptions (textual), times of activity (numerical), and more. They also overcame the challenge of nonstandard reporting formats, successfully identifying entities across 15 different document structures.

  • ML solution generalized to a variety of document structures, including unseen PDF and tabular formats.
  • Improved collaboration between domain experts and data scientists across labeling, troubleshooting, and iteration.
  • Training data auto-labeled by capturing SME labeling expertise as labeling functions and applying them intelligently en masse.
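To make the idea of labeling functions concrete, here is a minimal Python sketch of programmatic labeling in general. It is not Snorkel Flow’s API or Schlumberger’s actual pipeline: the keywords, the function names (`lf_activity_keyword`, `lf_time_range`, `lf_header_row`), and the example report lines are all hypothetical. Each function encodes one heuristic and may abstain; their votes are combined with a simple majority vote, a simplification swapped in here, whereas the open-source Snorkel library learns a label model that weights each function by its estimated accuracy. The sketch also computes the F1 score, the metric quoted above.

```python
import re
from typing import Callable, List

# Label values; ABSTAIN lets a labeling function stay silent when unsure.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical labeling functions: each encodes one SME-style heuristic for
# deciding whether a report line describes a well maintenance activity.
def lf_activity_keyword(text: str) -> int:
    keywords = ("workover", "cement", "perforat", "pull tubing", "acidiz")
    return POSITIVE if any(k in text.lower() for k in keywords) else ABSTAIN

def lf_time_range(text: str) -> int:
    # Lines such as "06:00-09:30 ..." typically log an activity and its duration.
    pattern = r"\b\d{1,2}:\d{2}\s*-\s*\d{1,2}:\d{2}\b"
    return POSITIVE if re.search(pattern, text) else ABSTAIN

def lf_header_row(text: str) -> int:
    # Header and metadata lines are not activity descriptions.
    prefixes = ("well name", "date", "report no")
    return NEGATIVE if text.strip().lower().startswith(prefixes) else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_activity_keyword, lf_time_range, lf_header_row]

def majority_vote(text: str, lfs: List[Callable[[str], int]] = LFS) -> int:
    """Combine non-abstaining votes; ties and all-abstain stay unlabeled."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    pos, neg = votes.count(POSITIVE), votes.count(NEGATIVE)
    if pos == neg:  # covers both "no votes" and a tie
        return ABSTAIN
    return POSITIVE if pos > neg else NEGATIVE

def f1_score(gold: List[int], pred: List[int]) -> float:
    """Binary F1 for the POSITIVE class; abstains count as not positive."""
    tp = sum(1 for g, p in zip(gold, pred) if g == POSITIVE and p == POSITIVE)
    fp = sum(1 for g, p in zip(gold, pred) if g != POSITIVE and p == POSITIVE)
    fn = sum(1 for g, p in zip(gold, pred) if g == POSITIVE and p != POSITIVE)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Usage with hypothetical report lines and hand-assigned gold labels.
lines = [
    "Well Name: X-12",
    "06:00-09:30 Pull tubing and inspect string",
    "Cement squeeze performed on zone 3",
    "Weather clear, no incidents",
]
gold = [NEGATIVE, POSITIVE, POSITIVE, NEGATIVE]
pred = [majority_vote(t) for t in lines]
print(pred)                  # [0, 1, 1, -1]
print(f1_score(gold, pred))  # 1.0
```

The last line shows why labeling functions alone are not enough: no heuristic fires on it, so it stays unlabeled. In a programmatic labeling workflow, the weakly labeled data trains a model that can generalize beyond the heuristics, and error analysis points to where new labeling functions are needed.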

Not only did Monisha and the STIC team develop an AI-enhanced tool that helps Schlumberger extract key field and scientific data automatically, they also established a repeatable data-centric AI development lifecycle as a foundation for future data science development at Schlumberger.

“We created a binary classification task and we were able to reach an 85 F1 score in under three days… later improving that score to 91.4 which is highly impressive compared to what we had before.”

Monisha Manoharan
Senior Machine Learning Engineer, Schlumberger

Under 3 days to build a highly performant ML application

47% improved generalization over the previous rules-only approach

This work was presented at the Future of Data-centric AI event hosted by Snorkel AI. Watch this and many other sessions on demand at future.snorkel.ai.

Nick Harvey
Director of Product Marketing
