‘Data poisoning’ anti-AI theft tools emerge — but are they ethical?
Summary
Generative AI models raise concerns about intellectual property (IP) theft because they are trained on content scraped from the internet, often without the creator's consent. To combat this, technologists are developing tools such as digital watermarks and data-poisoning techniques to protect copyrighted material.
Data poisoning manipulates training data to introduce unexpected behaviors into an AI model, causing it to produce inaccurate responses and eroding users' trust in it. In theory, this protects the rights of content creators by discouraging foundation-model builders from scraping any material that might be poisoned.
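To make the mechanism concrete, here is a minimal, hypothetical sketch of the simplest form of poisoning, label flipping, applied to a toy classifier. The dataset, poisoning fractions, and model are all illustrative assumptions, not any specific tool's method; real anti-scraping tools apply far subtler perturbations to images or text, but the principle is the same: corrupted training data degrades the model that ingests it.

```python
# Toy illustration of data poisoning via label flipping.
# Assumptions: a small synthetic dataset and a logistic-regression
# classifier stand in for a scraped corpus and a foundation model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Generate a clean binary-classification dataset and hold out a test set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

def poison_labels(labels, fraction, rng):
    """Flip the labels of a random fraction of the training examples."""
    poisoned = labels.copy()
    n_flip = int(fraction * len(poisoned))
    idx = rng.choice(len(poisoned), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]  # binary labels: 0 <-> 1
    return poisoned

# Train on increasingly poisoned data and measure accuracy on clean data.
for fraction in (0.0, 0.1, 0.3):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, poison_labels(y_train, fraction, rng))
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"poisoned fraction={fraction:.0%}  test accuracy={acc:.3f}")
```

Even a modest fraction of flipped labels measurably lowers test accuracy, which is the deterrent effect poisoning tools rely on: model builders cannot trust scraped data they have not vetted.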
However, the ethics of using such tools are debated: while they can help protect IP, they can also be misused to harm AI systems. As the legal landscape evolves to address these questions, companies are exploring ways to use AI ethically while ensuring that intellectual property rights are protected.