The adoption of trustworthy AI and its successful integration into our country’s most critical systems are paramount to using AI applications to accelerate economic prosperity and strengthen national security. However, traditional approaches to developing AI applications suffer from a critical flaw that raises significant ethical and governance concerns. Specifically, AI today relies on massive, hand-labeled training datasets…
ML models will always have some level of bias. Rather than relying on black-box algorithms, how can we make the entire AI development workflow more auditable? How do we build applications in which bias can be easily detected and quickly managed? Today, most organizations focus their model governance efforts on evaluating model performance and the bias in model predictions. Data science…
The Future of Data-Centric AI Talk Series
Background
An AI system consists of two parts: the model (an algorithm or some code) and the data. The dominant paradigm in machine learning research has been for most data scientists, including myself, to download a fixed dataset and iterate on the model. That this has become conventional is a tribute to how successful this model-centric approach…
Using a data-centric approach to capture the best of rule-based systems and ML models for enterprise AI
One of the biggest challenges to making AI practical for the enterprise is keeping the AI application relevant (and therefore valuable) in the face of ever-changing input data and evolving business objectives. Practitioners typically use one of two approaches to build these AI applications:…
Proliferating web technology has contributed to information warfare in recent conflicts. Artificial intelligence (AI) can play a significant role in stemming disinformation campaigns and cyber-attacks, and in informing diplomacy, in the rapidly evolving situation in Ukraine. Snorkel AI is dedicated to supporting the national security community and other enterprise organizations with state-of-the-art AI technology. We see this as our responsibility in the…
Genentech, a global biotech leader and member of the Roche Group, leveraged Snorkel Flow to extract critical information from lengthy clinical trial protocol (CTP) PDF documents. They built AI applications that used NER, entity linking, text extraction, and classification models to determine inclusion/exclusion criteria and to analyze Schedules of Assessments. Genentech’s team achieved 95-99% model accuracy by using Snorkel…
The Future of Data-Centric AI Talk Series
Background
Michael D’Andrea is the Principal Data Scientist at Genentech. He earned his MBA from Cornell University and a master’s degree in Computing and Education from Columbia University. He currently works on using unstructured data sources for clinical trial analytics, and his team has partnered with the Stanford “AI For Health” initiative as…
The Future of Data-Centric AI Talk Series
Background
Roshni Malani received her PhD in Software Engineering from the University of California, San Diego. She previously worked on Siri at Apple and was a founding engineer for Google Photos. She gave a presentation at the Future of Data-Centric AI virtual conference in September 2021. Her presentation is below, lightly edited…
Enabling iterative development workflows with Snorkel Flow’s Application Studio
Consider this scenario: we’re AI engineers building a social media monitoring application to track the sentiment of Fortune 500 company mentions in the news.
ScienceTalks with Saam Motamedi
We at Snorkel AI have received many questions from data scientists and machine learning engineers who aspire to be founders: where should they start, and how should they begin their entrepreneurial journey? We genuinely believe that data scientists and machine learning engineers will build the next generation of mega-enterprises. Over the summer, we’ve recorded…
Frontend Development Best Practices for Working With Lots of Data
From Snorkel AI Engineering
As frontend engineers, we can easily run into limitations when scaling large applications. At Snorkel AI, our users often work with data that scales into the gigabytes in Snorkel Flow. We have built Snorkel Flow around two core…
Snorkel Flow LTS Release Summer ’21
By adopting Snorkel Flow, a data-centric AI development platform powered by programmatic labeling, our customers have changed how they build and deploy AI applications. We’ve seen our customers save tens of millions of dollars in manual labeling costs and person-years of time by applying weak supervision with Snorkel Flow. Over the last few months, we’ve been hard…
The how, what, and why of Snorkel’s programmatic data labeling approach and the state-of-the-art Snorkel Flow platform. The year was 2015. For the first time, machine learning (ML) had outperformed humans in the annual ImageNet challenge.
In this episode of Science Talks, Explosion AI’s Ines Montani sat down with Snorkel AI’s Braden Hancock to discuss her path into machine learning, key design decisions behind the popular spaCy library for industrial-strength NLP, the importance of bringing together different stakeholders in the ML development process, and more. This episode is part of the #ScienceTalks video series hosted by the Snorkel AI team. You…
We’ll analyze the major sources of error in each of the four steps of building AI applications: data labeling, feature engineering, model training, and model evaluation.