The world of social media moves fast, which poses a challenge for those who need to efficiently and accurately filter social media content. Snorkel AI recently worked with a large social media management company that faced just this kind of challenge.

They needed a more effective model for tagging profiles according to whether or not they linked to adult content. Their existing model fell short, and the tools at their disposal proved insufficient.

This was a serious concern. They had a significant partnership that hinged on improving their ability to accurately classify adult profiles.

Enter Snorkel AI. Our mission is to democratize AI by making it easier for enterprises to build and deploy machine learning models. We were ready to help.

I recently talked with Matt Casey, data science content lead at Snorkel AI, about this case. You can watch the full interview (embedded below), but I’ve summed up the main points here.

The content filtering challenge: hitting a ceiling with no way through

The client was in a challenging situation. Their existing model achieved a recall of about 85% in identifying adult-oriented profiles. An impending partnership demanded a model with a recall in the upper nineties.
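For readers less familiar with the metric, recall is the share of truly adult-oriented profiles that the model actually flags. The short sketch below shows the arithmetic; the counts are hypothetical and chosen only to reproduce the percentages mentioned in this post, not taken from the client's data.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that the model caught."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# Hypothetical counts: out of 100 adult-oriented profiles,
# a model that flags 85 misses 15, and one that flags 96 misses 4.
baseline = recall(true_positives=85, false_negatives=15)
improved = recall(true_positives=96, false_negatives=4)
print(baseline, improved)
```

At the baseline level, 15 of every 100 adult-oriented profiles would slip through unflagged, which is why a gap of roughly ten points mattered so much here.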

The platform they used before turning to Snorkel AI presented several roadblocks. First, it was not conducive to quick iterations, a key requirement given the client’s strict timeline and the fast-paced nature of social media content.

Second, their existing tool made the process of labeling new data cumbersome and slow. Compounding this problem, the client had no labeled data to begin with. Even if they had, their existing platform offered no easy way to incorporate labeled data into the existing model.

In short, they were stuck. They had a deadline looming, a goal to meet, and no clear path to meeting that goal before the clock ran out.

So, they reached out to us.

Snorkel AI’s solution: Snorkel Flow

We introduced the client to Snorkel Flow, our AI data development platform. The platform amplifies the impact of subject matter experts (SMEs) to scale and streamline the data labeling process.

Snorkel Flow’s programmatic labeling process starts with labeling functions—essentially programmable rules to label data. Snorkel Flow users can build labeling functions according to various data features—from continuous variable thresholds to vector embedding clusters. In this case, the client’s labeling functions were primarily substring-based, focusing on identifying specific keywords in the data.

This resulted in an unusually high number of labeling functions. By the end of the project, the client’s users had created 160 separate labeling functions. Some Snorkel Flow projects can use as few as ten labeling functions, but this keyword approach allowed them to cover many edge cases specific to adult content.  
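To make the idea of substring-based labeling functions concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API: the helper names (keyword_lf, majority_label), the keyword lists, and the label constants are all hypothetical, and a simple majority vote stands in for whatever aggregation the platform performs.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label constants; ABSTAIN means "this rule has no opinion".
ADULT, NOT_ADULT, ABSTAIN = 1, 0, -1

LabelingFunction = Callable[[str], int]


def keyword_lf(keywords: List[str], label: int) -> LabelingFunction:
    """Build a rule that votes `label` if any keyword occurs as a substring."""
    lowered = [k.lower() for k in keywords]

    def lf(profile_text: str) -> int:
        text = profile_text.lower()
        return label if any(k in text for k in lowered) else ABSTAIN

    return lf


# Illustrative keyword lists only; a real project would have many more rules.
labeling_functions = [
    keyword_lf(["18+", "nsfw"], ADULT),
    keyword_lf(["adults only"], ADULT),
    keyword_lf(["family friendly", "kids"], NOT_ADULT),
]


def majority_label(profile_text: str, lfs: List[LabelingFunction]) -> int:
    """Combine rule votes by simple majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(profile_text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


for text in ["NSFW content, 18+ only", "Family friendly recipes", "Travel photos"]:
    print(text, "->", majority_label(text, labeling_functions))
```

Because each rule is only a keyword list, covering a new edge case means adding or editing a rule and relabeling, rather than reviewing profiles one by one. Plain substring matching also fires inside longer words, which is one reason such rules tend to need refinement over time.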

The results: a content filtering model above target and on time

Before turning to Snorkel Flow, the customer estimated that the project would take six months. They would have had to manually label tens of thousands of profiles to lift model performance to the level needed. And they didn’t have six months to spare.

Instead, the client achieved a recall of 96% using Snorkel Flow in just three weeks. The quick turnaround was particularly impressive considering the absence of any labeled data at the outset, and particularly valuable because it allowed them to complete their project ahead of the deadline dictated by their pending partnership.

Ongoing success and future plans

The client continues to use our platform independently. They recently revalidated their model to account for data drift (which is significant in the world of social media and adult content) and found that it remained more accurate than they expected.

The client updated or removed a small number of labeling functions and exported a new version of the model to keep its recall high. I want to note that this would not be so easy with manual labeling. Reinvestigating the data and updating problematic labels could have taken human labelers several days—perhaps weeks—of cumulative labor. Our client completed this task in a couple of hours.

The outputs of this model have become central to the client’s data lake, powering downstream analytics and recommendation models. This model isn’t just a standalone solution: it’s a key piece that enables many other operations within the company.

Looking ahead, the client plans to expand their use of Snorkel Flow to other projects. We’re excited to continue supporting them in their machine learning journey.

Snorkel Flow: accelerating AI data development

This case study underscores the transformative power of machine learning in improving content filtering. Through Snorkel Flow, the client was able to drastically improve their adult content labeling model, meet their partnership requirements, and set the stage for future success.

As machine learning continues to evolve, we’re excited to see how it will further revolutionize content filtering and other critical business operations.

Gabe Smith: zero to 96% recall content filtering classifier in just 3 days!