CUSTOMER STORY

Wayfair accelerates product tagging with Snorkel Flow: 10x faster labeling

Industry: Online Retail
Solution: Product Tagging
10x faster development vs. manual labeling

+20 points accuracy compared to supplier baseline

+7% increase in a leading indicator for add-to-cart rate


Wayfair, the leading home goods and furniture e-commerce retailer, offers over 40M products to 22M customers. Wayfair relies on machine learning (ML) and product tagging to ensure customer searches result in relevant products. Tagging information like color, style, and pattern can make all the difference in a customer finding the right product at the right time. However, supplier-provided tags can be incomplete and inconsistent, and manually labeling products is time-intensive and difficult to scale.

Wayfair partnered with Snorkel, whose experts used our proprietary technology to create a data-centric AI development workflow that improved automated catalog tagging across Wayfair’s products. The teams also partnered on a computer vision use case that unlocked additional rich information. As a result, a team of Wayfair and Snorkel experts programmatically labeled product catalog data 10x faster than manual labeling. The resulting model performed 20 points better on average than models trained only on supplier-provided labels.


Using AI to supercharge online retail

Wayfair is a Boston-based e-commerce company specializing in home goods and furniture, serving ~22M customers and partnering with ~20K suppliers. To ensure that relevant products appear in customer searches (e.g., “blue outdoor pillows”), they rely on product tags (e.g., “blue,” “outdoor,” and “pillow”). With over 10,000 product tags across 40 million products, creating and managing labeled data is an enormous and time-consuming effort.

Wayfair built a data-centric AI development workflow with Snorkel Flow to help improve automated catalog tagging across their products.

Challenge

Wayfair uses machine learning (ML) across the business to improve search and customer experience and organize its rich product catalog, but they wanted to innovate further to keep pace with growing inventory and changing consumer behavior. They were also looking to launch a computer vision initiative to unlock information from millions of product images using visual information extraction (VIE).

However, creating sufficient training and validation sets became a major bottleneck. Supplier labels were often insufficient, inaccurate, or inconsistent. With limited standardization and the subjective nature of the manually labeled data, datasets were noisy and unreliable. Relying on suppliers for this information could also cause unnecessary friction in key relationships.

Manually labeling training data was prohibitively slow. Wayfair outsourced manual labeling using human-in-the-loop (HITL) data curation, but it could take 8 weeks to label just 6,000 images. Manually labeling the whole catalog would require person-years of effort, and HITL can struggle to keep pace with new inventory and evolving tags and searches.
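To put the labeling rate in perspective, here is a rough back-of-the-envelope extrapolation in Python. It uses only the figures quoted above (6,000 images in 8 weeks, a catalog of about 40 million products) and assumes, purely for illustration, one image per product and a constant throughput. Neither assumption comes from Wayfair.

```python
# Figures quoted in the text.
IMAGES_PER_BATCH = 6_000
WEEKS_PER_BATCH = 8

# Illustrative assumption: one image per product across a 40M-product catalog.
CATALOG_IMAGES = 40_000_000


def weeks_to_label(n_images, images_per_batch=IMAGES_PER_BATCH,
                   weeks_per_batch=WEEKS_PER_BATCH):
    """Weeks needed to label n_images at a constant throughput."""
    throughput = images_per_batch / weeks_per_batch  # images per week
    return n_images / throughput


weekly_throughput = IMAGES_PER_BATCH / WEEKS_PER_BATCH  # 750 images per week
catalog_weeks = weeks_to_label(CATALOG_IMAGES)
catalog_years = catalog_weeks / 52

print(f"{weekly_throughput:.0f} images/week, about {catalog_years:.0f} years for the catalog")
```

Under these assumptions a single pipeline running at that pace would need on the order of a thousand years, which is why manual labeling of the full catalog was not a realistic option.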

Rich information was buried within images and was challenging to extract and utilize. Wayfair has millions of product images that reveal the style, pattern, and theme to the viewer; however, they struggled to transform that information into tags that their ML could interpret.

Wayfair had experimented with an off-the-shelf foundation model, CLIP, for zero-shot image retrieval tasks. It worked well for about 5% of cases but made inaccurate high-confidence predictions and could not provide all necessary tags for their extensive product catalog. It was especially difficult to build datasets for high-value edge cases where sample data was limited.
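For readers unfamiliar with zero-shot tagging, the sketch below shows the general idea in plain Python: an image embedding is compared against one text-prompt embedding per tag, and the similarities are turned into probabilities. The vectors, tag names, and `scale` value are made up for illustration; a real setup would obtain embeddings from a model such as CLIP, and this is not Wayfair’s actual pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_tag(image_vec, prompt_vecs, scale=20.0):
    """Return (best_tag, probabilities) for one image against all prompts.

    `scale` is a hypothetical temperature-like factor applied before softmax.
    """
    logits = {tag: scale * cosine(image_vec, vec) for tag, vec in prompt_vecs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tag: math.exp(value - peak) for tag, value in logits.items()}
    total = sum(exps.values())
    probs = {tag: value / total for tag, value in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy embeddings standing in for text prompts such as "Chevron area rugs".
prompt_vecs = {
    "chevron": [0.9, 0.1, 0.0],
    "geometric": [0.7, 0.6, 0.1],
    "striped": [0.0, 0.2, 0.9],
}
image_vec = [0.8, 0.3, 0.05]

tag, probs = zero_shot_tag(image_vec, prompt_vecs)
print(tag, {name: round(p, 3) for name, p in probs.items()})
```

Because the softmax exaggerates small gaps in similarity as `scale` grows, a model like this can report a high probability for a tag even when two candidates are nearly tied, which is one plausible way confident but wrong predictions arise.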

Goal

Improve data labeling and reduce ML development hours with automated labeling to extract rich visual information accurately and with less manual intervention.

Solution

Programmatic labeling and computer vision for product tags

Wayfair partnered with Snorkel’s experts, who used our proprietary technology to speed the development of models designed to tackle key challenges in curating and managing their catalog data. These models help Wayfair reduce manual labeling costs and automate improvements across search, marketing, and customer experience.

Starting with millions of noisily labeled images, the team built 46 tag models within days instead of months. Armed with faster and more accurate modeling, Wayfair realized significant cost savings and major gains in search relevancy and key revenue predictors.

Auto-extracting visual information from product images

Wayfair was developing ML models to pull information from their product images and convert that data into tags used to return better, more accurate results to customer searches. To supply these algorithms with the best possible data to learn from, Wayfair collaborated with Snorkel’s experts, who used our proprietary technology to improve the quality of the training data and uncover image quality issues.

Together, the teams developed a workflow that incorporated data preprocessing, curation, and iterative development to extract and apply visual data to product labels. Armed with our proprietary technology, Snorkel’s experts cleaned data, removed outliers and duplicates, and quickly prepared training and evaluation datasets with strategic sampling and prompting.

A visualization of the image tagging pipeline Wayfair built in Snorkel Flow.
Snorkel’s computer vision workflow for data preprocessing and iterative model development. Source: Wayfair Blog

By leveraging foundation models to automatically label data and iteratively looping that data to explore and improve ML, Snorkel’s experts helped Wayfair unlock rich visual information from millions of product images—saving years of manual labeling effort and improving accuracy over supplier-provided labels. With this iterative, data-centric workflow, Snorkel’s experts could build models for Wayfair ten times faster than their previous manual HITL processes, achieving the same or greater accuracy.

Model-guided error analysis

Our proprietary technology enabled Snorkel’s experts and Wayfair’s data scientists to visually analyze the failure modes of the model on the validation data (e.g., “Geometric” pattern mistakenly tagged as “Chevron”). This allowed the joint team to obtain richer training data iteratively (e.g., adding geometric accent chairs as negative labels) to create better models.
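A minimal sketch of this kind of error analysis, assuming the validation set is available as two parallel lists of true and predicted tags. The data and the `top_confusions` helper are hypothetical and only illustrate the idea of ranking failure modes; Snorkel’s tooling does this visually.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (true, predicted) pairs where the model was wrong."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Hypothetical validation labels for a pattern-tag model.
y_true = ["Geometric", "Geometric", "Chevron", "Striped", "Geometric", "Chevron"]
y_pred = ["Chevron", "Chevron", "Chevron", "Striped", "Geometric", "Geometric"]

confusions = top_confusions(y_true, y_pred)
for (true_tag, predicted_tag), count in confusions:
    print(f"{true_tag} tagged as {predicted_tag}: {count}")
```

The most frequent pair points to where additional training examples, such as negative labels for the confused class, are likely to help most.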

Snorkel’s experts also generated higher-quality datasets, achieving a 20+ point accuracy boost over vendor-supplied labels. Improved datasets led to improved model performance, enabling customers to find the right products at the right time. Since engaging Snorkel, Wayfair has also seen a 7% increase in view-read—times when a customer clicks through to a product presented to them from a search. This is a leading indicator for add-to-cart rate and increased revenue.

Wayfair and Snorkel’s collaborative team overcame its challenges with:

  • Programmatic labeling: By feeding pre-trained foundation models relevant prompts (e.g., “Chevron area rugs”), Snorkel’s experts generated thousands of labels through labeling functions (LFs). They filtered, denoised, and refined those labels via weak supervision, and quickly built high-quality training sets for Wayfair’s ML.
  • Adaptability: Tags and customer searches constantly evolve, and it’s time-consuming and expensive to manually update models. Snorkel’s experts built iterative, data-centric workflows to add or modify tags quickly and at scale.
  • Validation-in-the-loop: Snorkel’s proprietary technology allowed data scientists and subject matter experts—such as category managers and suppliers—to correct model errors and adjust validation datasets quickly while minimizing the manual efforts of human agents.
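The programmatic labeling idea in the first point can be sketched in plain Python as follows. This is an illustrative toy, not Snorkel Flow’s API: the product fields, thresholds, and labeling functions are hypothetical, `model_score` stands in for a foundation model’s response to a prompt like “Chevron area rugs”, and a simple majority vote replaces the weak-supervision step that denoises the votes in practice.

```python
from collections import Counter

ABSTAIN = -1
NOT_CHEVRON, CHEVRON = 0, 1


def lf_title_mentions_chevron(product):
    """Vote CHEVRON when the product title contains the word."""
    return CHEVRON if "chevron" in product["title"].lower() else ABSTAIN


def lf_supplier_tag(product):
    """Trust the supplier's pattern tag when one is present."""
    tag = product.get("supplier_pattern")
    if tag is None:
        return ABSTAIN
    return CHEVRON if tag.lower() == "chevron" else NOT_CHEVRON


def lf_model_score(product, low=0.2, high=0.8):
    """Vote only when the (hypothetical) model score is clearly high or low."""
    score = product.get("model_score")
    if score is None:
        return ABSTAIN
    if score >= high:
        return CHEVRON
    if score <= low:
        return NOT_CHEVRON
    return ABSTAIN


LABELING_FUNCTIONS = [lf_title_mentions_chevron, lf_supplier_tag, lf_model_score]


def majority_vote(votes):
    """Combine votes; abstain when nobody voted or the top two labels tie."""
    ranked = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not ranked:
        return ABSTAIN
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_products(products):
    """Apply every labeling function to every product and combine the votes."""
    return [majority_vote([lf(p) for lf in LABELING_FUNCTIONS]) for p in products]


products = [
    {"title": "Chevron Area Rug 5x8", "supplier_pattern": "Chevron", "model_score": 0.93},
    {"title": "Modern Accent Chair", "supplier_pattern": "Geometric", "model_score": 0.10},
    {"title": "Wool Rug", "supplier_pattern": None, "model_score": 0.55},
    {"title": "Zigzag Rug", "supplier_pattern": "Chevron", "model_score": 0.15},
]
labels = label_products(products)
print(labels)
```

Products where the sources disagree or stay silent are left unlabeled rather than guessed, so they can be routed to the validation-in-the-loop review described above.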
