Leveraging Snorkel Flow to extract critical data from annual reports (10-Ks)


Those who have never logged into EDGAR may be surprised by how much information is available in annual reports from public companies. You can find tactical details like the names of senior leadership and top shareholders, as well as more strategic information like earnings, risk factors, and the company’s strategy and vision. Warren Buffett is famous for relying on annual company filings as the bedrock of his investing decisions. People praise him for this habit largely because it’s rare: despite the information they contain, company reports like these are long and tedious, and they rarely make it easy to find the information that you’re looking for. Snorkel Flow’s data-centric AI approach to application development offers a much easier and more scalable way to build high-quality AI applications that extract and classify information from these types of documents.

My journey performing data extraction on 10-Ks

I was first exposed to these annual and quarterly reports (formally called 10-Ks and 10-Qs, respectively) in my first job as an economic consultant. I quickly learned that although there is a wealth of information to gather from these documents, each report often runs to hundreds of pages, which meant that I was in for a slog whenever we needed to extract company financial details or risk factors. Even worse, the economic consulting firm I worked at ended up having two people review each document in parallel to ensure we didn’t miss important details or make mistakes when copying out the crucial financial terms. We were billing by the hour and would have used a better solution if one existed. Although these types of workstreams come up frequently in economic consulting, investment banking, auditing, and other financial services areas, most people hold their noses and have subject matter experts review these documents.

Given that background, it shouldn’t come as a surprise that one of the areas where I’ve been most excited to work with financial services companies is making it easier to process these types of forms. This can take a lot of different shapes, and we’ll dive deeper into each of the areas I’ve laid out below in follow-up blog posts. At a high level, though, these are some of the most common and impactful ways that I’ve seen financial services firms leveraging Snorkel Flow to avoid having expensive subject matter experts constantly reading through 10-Ks.

Know your customer (KYC)

Banks often have hundreds of KYC analysts manually extracting information from 10-Ks and other document types, spending 30+ minutes reviewing each document. Regulatory agencies require financial institutions to investigate and understand the customers they’re supporting (thus the name ‘Know Your Customer’) so that they don’t inadvertently facilitate money laundering, human trafficking, or other crimes. This means that KYC teams at banks need to review a variety of information about their potential and existing customers, including annual company filings.

Multiple banks have used Snorkel Flow to make it easier to extract critical attributes from 10-Ks and other documents. For example, automatically pulling out the company name, board of directors, and assets lets these analysts and subject matter experts focus on the harder problems associated with KYC and anti-money laundering. Even more importantly, programmatic labeling lets these KYC teams collaborate with data science teams to quickly update label schemas and extraction applications for changing regulations or business requirements.

Obtaining crucial information from 10-Ks using data extraction with Snorkel Flow: a dataset of 10-K reports and the output of the machine learning model that automatically identifies attributes for a company.

Data extraction and classification of risk factors

One of the problems that I personally encountered in economic consulting was looking at risk factors. Understanding the headwinds that companies choose to call out – and how those change over time – can provide insights into the operations and future of a single company as well as a broader industry segment. In this case, Snorkel Flow enables subject matter experts and data scientists to quickly adapt to changing risk types (for example, adding in ESG risk factors) if needed while collaborating on end-to-end applications to identify and extract critical sentences or paragraphs from annual reports.


These applications then make it easy to identify changes in risk factors over time…

Generate alerts for specific types of risk categories that pop up…


Or make it much faster for subject matter experts to skim and jump to the critical parts of annual company reports.


Doing these types of analyses by hand requires huge amounts of manual effort from very experienced people. The automation provided by Snorkel Flow makes possible workflows that are normally blocked by hand annotation and manual document review.
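
To make the idea of rule-driven risk tagging a little more concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API, and the category names, keyword patterns, and function names are all hypothetical choices for illustration. It only shows how a handful of keyword rules can tag sentences with risk categories, and how adding a new category (such as ESG) can be a one-line change.

```python
import re

# Hypothetical keyword rules, one per risk category. These patterns are
# illustrative assumptions, not rules taken from any real application.
RISK_PATTERNS = {
    "cybersecurity": re.compile(r"\b(cyber\w*|data breach(es)?|ransomware)\b", re.I),
    "regulatory": re.compile(r"\b(regulat\w*|compliance|legislation)\b", re.I),
    # Supporting a new risk type means adding one more entry:
    "esg": re.compile(r"\b(climate|emissions|sustainab\w*)\b", re.I),
}


def tag_risk_sentence(sentence):
    """Return every risk category whose keyword rule matches the sentence."""
    return [name for name, pattern in RISK_PATTERNS.items() if pattern.search(sentence)]


def tag_risk_factors(text):
    """Split text into rough sentences and keep those with at least one tag."""
    tagged = []
    # Naive splitter: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tags = tag_risk_sentence(sentence)
        if tags:
            tagged.append((sentence, tags))
    return tagged


if __name__ == "__main__":
    sample = (
        "A cyberattack could disrupt our operations. "
        "We sell widgets in many countries. "
        "New climate regulations may increase our costs."
    )
    for sentence, tags in tag_risk_factors(sample):
        print(tags, "<-", sentence)
```

Keyword rules like these are noisy on their own. In practice they would serve as a starting point that subject matter experts and data scientists refine together, rather than as a finished classifier.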

Data extraction on other types of financial details

Finally, the third category of application includes more focused efforts that allow teams to extract details around company spending that may be tougher to find even if humans review the documents. For example, I often needed to pull out exposure to credit default swaps (back in 2009 when the financial crisis was getting into full swing). Sometimes those show up in tables, and other times, they show up as free-form text descriptions somewhere in the document. In other cases, pulling out exposure or spending on financial derivatives like interest rate swaps can help investment banking teams understand market dynamics and identify potential customers.

Interest rate swap application with Snorkel Flow. Obtaining crucial information from 10-Ks using data extraction with Snorkel Flow.
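
For the free-form text case described in this section, a rough sketch of the extraction step might look like the following. It is plain Python with regular expressions, not Snorkel Flow's API. The patterns, the `find_swap_mentions` helper, and the output fields are hypothetical names chosen for illustration. It finds sentences that mention swaps and pulls out any dollar amounts stated alongside them. Swap exposure that appears in tables would need separate handling.

```python
import re

# Illustrative patterns (assumptions): swap terminology and dollar amounts
# such as "$250 million" or "$1,250.5 billion".
SWAP_TERM = re.compile(r"\b(interest rate swaps?|credit default swaps?)\b", re.I)
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion)?", re.I)
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def find_swap_mentions(text):
    """Return one record per sentence that mentions a swap instrument."""
    mentions = []
    # Naive splitter: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        term = SWAP_TERM.search(sentence)
        if term is None:
            continue
        amounts = []
        for number, unit in AMOUNT.findall(sentence):
            value = float(number.replace(",", ""))
            # A missing unit means the figure is taken at face value.
            amounts.append(value * SCALE.get(unit.lower(), 1))
        mentions.append(
            {
                "instrument": term.group(1).lower(),
                "amounts_usd": amounts,
                "sentence": sentence,
            }
        )
    return mentions


if __name__ == "__main__":
    sample = (
        "We entered into interest rate swaps with a notional amount of "
        "$250 million. Revenue grew to $3.1 billion."
    )
    for mention in find_swap_mentions(sample):
        print(mention["instrument"], mention["amounts_usd"])
```

A rule this simple will miss amounts stated in a neighboring sentence and will pick up unrelated dollar figures that share a sentence with a swap mention, which is part of why this kind of extraction benefits from a trained model rather than regular expressions alone.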

Final thoughts

I wanted to provide some insight into one of the most common yet powerful ways companies leverage Snorkel Flow. Almost everyone who has to read 10-Ks for their day job (outside of perhaps Warren Buffett) would rather be spending their time doing almost anything else. Normally, this is nothing more than an idle fancy as you flip through another exhibit or click through to another page in a PDF. For me, though, the chance to help companies translate these dreams into reality is one of the most rewarding parts of introducing Snorkel Flow to these types of financial institutions.

To learn more about essential data extraction with Snorkel Flow, request a demo or follow us on Twitter, LinkedIn, Facebook, YouTube, or Instagram. If you’re interested in joining the Snorkel team, we’re hiring! Please apply on our careers page.