What is data annotation? Data annotation refers to the process of categorizing and labeling data for training datasets. This process plays a critical role in preparing data for machine learning models, as high-quality training data enables more accurate predictions and insights. In order for a training dataset to be usable, it must be categorized appropriately and annotated for a specific…
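As a purely hypothetical sketch, annotated training data often amounts to raw examples paired with the category labels a model is meant to learn; the texts and label names below are illustrative only.

```python
# Hypothetical example of annotated (labeled) training data:
# each raw text sample is paired with the category it belongs to.
labeled_examples = [
    {"text": "The card was charged twice for one purchase.", "label": "billing_issue"},
    {"text": "How do I reset my online banking password?", "label": "account_access"},
    {"text": "Please increase the credit limit on my account.", "label": "service_request"},
]
```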
Highlighting best practices for building and deploying AI models for financial document processing applications AI has massive potential in the financial industry. Building AI models to automate information extraction, fraud detection, and compliance monitoring can deliver faster, more efficient responses and free domain experts to focus on more meaningful work. Developing AI models is not just about having models…
The following post is based on a talk discussing the benefits of programmatic labeling for trustworthy AI, presented by Snorkel AI Co-founder and Head of Technology Braden Hancock as part of the Trustworthy AI: A Practical Roadmap for Government event this past April. If you would like to watch Braden’s presentation, we have included it…
If you have ever been amazed at how Google accurately finds the answer to your question from just a few keywords, you’ve witnessed the power of named entity recognition (NER). By quickly and accurately identifying entities such as the names of people, places, and organizations in a sea of unstructured articles, the search engine can figure out each article’s main topics and…
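As a rough illustration (not drawn from the post above), the open-source spaCy library can extract such entities in a few lines; the example sentence and model name are assumptions.

```python
# A minimal NER sketch using the open-source spaCy library.
# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai announced that Google is opening a new office in Zurich.")

# Each detected entity carries its text span and a type label,
# e.g. PERSON for people, ORG for organizations, GPE for places.
for ent in doc.ents:
    print(ent.text, ent.label_)
```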
Gregory Ihrie is the Chief Technology Officer for the FBI, responsible for technology, innovation, and strategy. He also leads the FBI’s efforts in advancing the bureau’s management, policy, and governance of AI systems. Ihrie chairs the FBI’s Scientific Working Group on Artificial Intelligence, as well as the Department of Justice’s AI Committee of Interest. He is one of three officers…
Browse these FAQs to find answers to commonly asked questions about Snorkel AI, Snorkel Flow, and data-centric AI development. Have more questions? Contact us. Programmatic labeling Use cases 1. What is a labeling function? A labeling function (LF) is an arbitrary function that takes in a data point and outputs a proposed label or abstains. The logic used to…
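For context, a minimal labeling-function sketch using the open-source snorkel Python library might look like the following; the SPAM/HAM label values, the heuristic, and the text column are illustrative assumptions rather than part of the FAQ above.

```python
# A minimal labeling-function sketch using the open-source `snorkel` library.
# The label values, heuristic, and `text` column are illustrative assumptions.
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

ABSTAIN = -1  # the LF declines to vote on this data point
HAM = 0
SPAM = 1

@labeling_function()
def lf_contains_link(x):
    # Heuristic: messages containing a URL are proposed as SPAM;
    # otherwise the function abstains rather than guessing.
    return SPAM if "http" in x.text.lower() else ABSTAIN

# Applying one or more LFs to unlabeled data yields a label matrix
# with one row per data point and one column per labeling function.
df_train = pd.DataFrame({"text": ["Click http://spam.example now!", "See you at lunch"]})
applier = PandasLFApplier(lfs=[lf_contains_link])
L_train = applier.apply(df=df_train)
print(L_train)  # e.g. [[1], [-1]]
```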
We’re experiencing a rapid AI revolution, with technologies ranging from autonomous cars to virtual assistants and robotic surgery being adopted faster than our government agencies can keep up with. Adding AI technologies to already complex systems makes them even harder to manage. The crucial adoption of trustworthy AI and its successful integration…
The founding team of Snorkel AI has spent over half a decade, first at the Stanford AI Lab and now at Snorkel AI, researching weak supervision (WS) and other techniques for breaking through the biggest bottleneck in AI: the lack of labeled training data. This research has resulted in the Snorkel research project and 150+ peer-reviewed publications. Snorkel’s technology, which applies weak…
Leveraging Snorkel Flow to extract critical data from annual reports (10-Ks) Introduction It can surprise those who have never logged into EDGAR how much information is available in the annual reports of public companies. You can find tactical details, like the names of senior leadership and top shareholders, as well as more strategic information, like earnings, risk factors, and company strategy and vision. Warren…
An introduction to AI in cybersecurity with real-world case studies from a Fortune 500 organization and a government agency Despite all the recent advances in artificial intelligence and machine learning (AI/ML) across a vast array of application areas and use cases, success with AI in cybersecurity remains elusive. The key component of building AI/ML applications is training data, which…
How data-centric AI speeds your end-to-end healthcare AI development and deployment Healthcare is a field awash in data, and managing it all is complicated and expensive. As an industry, healthcare benefits tremendously from the ongoing development of machine learning and data-centric AI. The potential benefits of AI integration in healthcare can be broken down into two categories:…
In our previous posts, we discussed how explainable AI is crucial to ensure the transparency and auditability of your AI deployments and how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we dive into making trustworthy and responsible AI possible with Snorkel Flow, the data-centric AI platform for government and federal agencies. Collaborative labeling and…
In our previous post, we discussed how trustworthy AI adoption and its successful integration into our country’s critical infrastructure and systems are paramount. In this post, we discuss how explainability in AI is crucial to ensure the transparency and auditability of your AI deployments. Outputs from trustworthy AI applications must be explainable in understandable terms based on the design and implementation of…
The adoption of trustworthy AI and its successful integration into our country’s most critical systems is paramount to achieving the goal of employing AI applications to accelerate economic prosperity and national security. However, traditional approaches to developing AI applications suffer from a critical flaw that leads to significant ethics and governance concerns. Specifically, AI today relies on massive, hand-labeled training datasets…
ML models will always have some level of bias. Rather than relying on black-box algorithms, how can we make the entire AI development workflow more auditable? How do we build applications where bias can be easily detected and quickly managed? Today, most organizations focus their model governance efforts on investigating model performance and bias in predictions. Data science…