Frontend Development Best Practices for Working With Lots of Data From Snorkel AI Engineering

As a frontend engineer, it’s easy to run into limitations when scaling large applications. At Snorkel AI, our users often work with data that runs into the gigabytes when using Snorkel Flow. We built Snorkel Flow around two core ideas: first, that AI development should be data-centric, i.e., it should involve rapidly iterating on your model’s training data as the core activity; and second, that labeling this training data should be done programmatically, enabling users to label massive volumes at scales that wouldn’t be possible with a manual approach. Concretely, this means that Snorkel Flow’s frontend must render large-scale, highly dynamic content rapidly.

At Snorkel AI, we rely heavily on the Next-React-Node stack, which gives us ample benefits for building performant code. React’s reusability allows us to build applications quickly and manage state with ease. One downside of React when building large applications is balancing the number of re-renders against the number of complex React components displayed to the user. Snorkel Flow is multi-modal, supporting text, documents, time series, rich text, and other data types. As an example for this blog, our text analysis components are built from many sub-components that need to be rendered and then re-rendered whenever a user interacts with them. When scaled up to handle documents that are tens of thousands of lines long, this often means rendering so many components that the user interface (UI) slows down.

One of the methods we use to reduce performance issues on these huge documents is windowing (virtualization): we render only the content that’s visible to the user within the browser window and leave out content that’s beyond the scroll bar, rendering only blank boxes that approximate the total size of the data. In this blog post, we dive into what windowing is, how it works, why it’s crucial at web scale, and different production strategies to support a diverse array of datasets.

Windowing 101

Windowing [1], also referred to as virtualization or virtual scrolling, is a strategy used in web development to optimize the performance of elements rendered to the document object model (DOM) by selectively rendering only the elements that are in view.

Figure: Two representations of a web page showing a list of items, one with windowing (virtualization) implemented and one without.

The figure above shows two representations of a web page, one with windowing implemented and one without, on a list of items. In the implementation without windowing, each spreadsheet row is rendered even when it’s out of the user’s view. This means the browser has to hold that DOM information in memory even though the user has no way of viewing it. If we build this component in something like React, any time we want to re-render our spreadsheet, such as when a user interacts with it, we would also need to re-render all of these rows, even though the user only wants to see the content that’s actually visible.

In the second representation, we implement a windowing strategy in which we render only the items that are actually in the user’s viewport. As the user scrolls through this view, more rows come into view and are rendered, while rows that are out of view are not.

The windowing strategy is popular across web development: social media feeds, online spreadsheets, and mapping applications all use it. Outside of web development, the concept appears in many forms of computer graphics. Video game development uses something similar, where only the parts of the world visible to the player are actually rendered.

Your first window

To get started, let’s build a simple windowing solution to understand how it works on the technical side. We will create a function called windowList that takes in the element that we want to act as the window, the list of text items we want to window, and the height of each item in the list.

The snippet for this example uses two functions to render our text (windowList and renderText) and two functions to set up our space (setUpElement and setUpList). setUpElement creates a window that’s 200px wide and 200px tall and attaches it to the DOM. setUpList generates a list of 1000 elements. The results of both are passed into windowList.

windowList takes in the element that we want to act as a window, the list of items we want to render, and the height of an item in our list. Using that information, we create an element that matches the total size the list would have if it were rendered in full. This element, which we labeled scrollablePane, tricks the windowed element into letting us scroll up and down the parent window. We can then use the current scroll position, obtained by listening for scroll events on the parent, to render our content.

In renderText, we take the current scroll position to find a bounding box of what needs to be rendered. Next, we render only the elements between startIndex and stopIndex.

We can expand this solution to work in two directions as long as we know the user’s scroll position within the window.
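
As a rough sketch of the index math described in this section, assuming every item has the same fixed height, the functions below compute the scrollable pane's total height and the startIndex and stopIndex for a given scroll position. The names totalHeight, visibleRange, and visibleCells, their parameters, and the 20px item height in the usage note are illustrative assumptions and are not taken from the original snippet. The code deliberately leaves out the DOM wiring and covers only the arithmetic.

```typescript
// Hypothetical helpers for illustration; not the original windowList code.
interface VisibleRange {
  startIndex: number; // first item to render (inclusive)
  stopIndex: number; // last item to render (inclusive)
}

// Height the scrollable pane needs so the scroll bar behaves as if
// every item were rendered.
function totalHeight(itemCount: number, itemHeight: number): number {
  return itemCount * itemHeight;
}

// Map a scroll offset to the inclusive range of items that intersect
// the window. Assumes fixed-size items along the scrolled axis.
function visibleRange(
  scrollOffset: number,
  windowSize: number,
  itemSize: number,
  itemCount: number,
): VisibleRange {
  if (itemCount <= 0 || itemSize <= 0) {
    return { startIndex: 0, stopIndex: -1 }; // empty range
  }
  const first = Math.floor(scrollOffset / itemSize);
  const last = Math.ceil((scrollOffset + windowSize) / itemSize) - 1;
  return {
    startIndex: Math.min(itemCount - 1, Math.max(0, first)),
    stopIndex: Math.min(itemCount - 1, Math.max(0, last)),
  };
}

// Two-directional version: apply the same math once per axis.
function visibleCells(
  scroll: { top: number; left: number },
  windowSize: { width: number; height: number },
  cell: { width: number; height: number },
  grid: { rows: number; columns: number },
): { rows: VisibleRange; columns: VisibleRange } {
  return {
    rows: visibleRange(scroll.top, windowSize.height, cell.height, grid.rows),
    columns: visibleRange(scroll.left, windowSize.width, cell.width, grid.columns),
  };
}
```

With the 200px-tall window and 1000-item list from this section, and an assumed item height of 20px, visibleRange(0, 200, 20, 1000) covers items 0 through 9, so only 10 of the 1000 items need to be rendered, while totalHeight(1000, 20) sizes the scrollable pane at 20000px.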

Windowing with React

React’s rendering strategy works well with the windowing pattern because React already keeps a virtual representation of the DOM and selectively applies updates to the real one. Open source has powered most of our virtualization work at Snorkel AI: we use the React Virtualized library (react-virtualized) to jump-start our virtualization efforts.

Windowing Caveats

Windowing, while powerful, comes with some caveats you should look out for:

  • Windowing will break native browser text search – Since text is now rendered only inside the viewport, the browser’s native search tool can no longer find text that is outside it. You can address this by implementing your own text search. For applications where a search function is critical, you can enhance search with features like case sensitivity and regular expressions (going beyond what is offered by native browser search).
  • Keep accessibility in mind – Add a small buffer zone of rendered content outside the viewport to let screen readers know that more content is available. Moving to that text can then force a scroll, which allows the rendering process to continue. For extra clarity, apply the ARIA role “feed”, “list”, or “table” to the parent so screen readers know about this content.
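
To illustrate the search caveat, here is a minimal sketch of a custom text search that runs over the full list of lines rather than the rendered DOM, with optional case sensitivity and regular expressions. The names findMatches, SearchOptions, Match, and scrollTopForLine are hypothetical, and the sketch assumes the data is available in memory as an array of strings with fixed-height rows. It is not how Snorkel Flow implements search.

```typescript
// Hypothetical search helpers for illustration only.
interface SearchOptions {
  caseSensitive?: boolean; // default: case-insensitive
  useRegex?: boolean; // default: treat the query as literal text
}

interface Match {
  lineIndex: number; // index into the full list, not the rendered window
  start: number; // character offset of the match within the line
  length: number;
}

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Search every line of the data, including lines that are not rendered.
function findMatches(
  lines: string[],
  query: string,
  options: SearchOptions = {},
): Match[] {
  if (query === "") {
    return [];
  }
  const source = options.useRegex ? query : escapeRegExp(query);
  const flags = options.caseSensitive ? "g" : "gi";
  // Note: an invalid user-supplied regex makes this constructor throw.
  const pattern = new RegExp(source, flags);
  const matches: Match[] = [];
  lines.forEach((line, lineIndex) => {
    pattern.lastIndex = 0;
    let result: RegExpExecArray | null;
    while ((result = pattern.exec(line)) !== null) {
      if (result[0].length === 0) {
        pattern.lastIndex++; // skip empty matches to avoid looping forever
        continue;
      }
      matches.push({ lineIndex, start: result.index, length: result[0].length });
    }
  });
  return matches;
}

// Scroll offset that brings a matched line to the top of the window,
// assuming fixed-height rows.
function scrollTopForLine(lineIndex: number, itemHeight: number): number {
  return lineIndex * itemHeight;
}
```

Because the search returns line indices rather than DOM nodes, jumping to a result is a matter of setting the window's scroll position with something like scrollTopForLine and letting the windowing logic render the matching line.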

At Snorkel AI, we built our virtualization system with these caveats in mind. Building on top of react-virtualized, we created components that help users navigate and use these features as if they were part of a native user experience. Ultimately, how you build around a windowing component depends on you and the specifics of your application. Still, the suggestions above should be prioritized.
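
For the buffer zone suggested in the accessibility caveat, one possible approach is to widen the visible range by a few items on each side before rendering. The sketch below assumes an inclusive index range like the startIndex and stopIndex used earlier. The names withOverscan and itemAccessibilityAttributes are hypothetical. The aria-posinset and aria-setsize attributes are standard ARIA attributes for describing an item's position in a set that is only partially present in the DOM, but using them here is a suggestion that goes beyond the caveats listed above.

```typescript
// Hypothetical helpers for illustration only.
interface IndexRange {
  startIndex: number; // inclusive
  stopIndex: number; // inclusive
}

// Extend the visible range by `overscanCount` items on each side so a
// little content just outside the viewport is also rendered.
function withOverscan(
  visible: IndexRange,
  overscanCount: number,
  itemCount: number,
): IndexRange {
  if (itemCount <= 0 || visible.stopIndex < visible.startIndex) {
    return visible; // nothing to extend
  }
  return {
    startIndex: Math.max(0, visible.startIndex - overscanCount),
    stopIndex: Math.min(itemCount - 1, visible.stopIndex + overscanCount),
  };
}

interface ItemAccessibilityAttributes {
  role: string;
  "aria-posinset": number; // 1-based position within the full list
  "aria-setsize": number; // size of the full list, not the rendered part
}

// Attributes for one rendered item whose parent has the "list" role.
function itemAccessibilityAttributes(
  index: number,
  itemCount: number,
): ItemAccessibilityAttributes {
  return {
    role: "listitem",
    "aria-posinset": index + 1,
    "aria-setsize": itemCount,
  };
}
```

A larger overscanCount gives assistive technology and fast scrolling more rendered content to work with, at the cost of rendering more elements, so the value is a trade-off to tune per application.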

Frontend at Snorkel AI

Snorkel AI’s software engineers develop Snorkel Flow, the first truly data-centric AI platform that helps shape how enterprises deploy AI applications end-to-end. Snorkel Flow can handle vast amounts of data at a massive scale and extends well beyond data labeling. Wrangling data is part of what we do at Snorkel AI, and with data-centric development workflows for data ranging from images to HTML pages to text messages, we’re always working on creating interactive, intuitive, and powerful user experiences for our customers. We are always looking for engineers who bring fresh ideas across the stack. As an engineer, you will work on critical projects that help build out Snorkel Flow’s end-to-end AI development platform. Help us build the AI tools of tomorrow. We are hiring!

If you’re interested in staying in touch with Snorkel AI, follow us on Twitter, LinkedIn, Facebook, YouTube, or Instagram.