Foundation Models 101: a guide with essential FAQs
What are foundation models?
Foundation models are large AI models trained on enormous quantities of unlabeled data—usually through self-supervised learning. This process creates generalized models capable of a wide variety of tasks, such as image classification, natural language processing, and question-answering, with remarkable accuracy.
FMs excel at generative, human-in-the-loop tasks, such as writing marketing copy or creating detailed art from a simple prompt. These models serve as a fantastic starting point for enterprise applications.
That’s where the “foundation” in foundation models comes in. Data scientists can build upon generalized FMs and fine-tune custom versions with domain-specific or task-specific training data. This approach greatly enhances their domain- or task-specific performance, and could open new worlds of capabilities for organizations spanning many industries.
How do foundation models generate responses?
Foundation models underpin generative AI capabilities, from text generation to music creation to image generation. To do this, FMs use learned patterns and relationships to predict the next item or items in a sequence. In the case of text-generating models, that’s the next word or phrase. For image-generation models, that’s the next, less noisy version of an image. In either case, the model starts with a seed vector derived from a prompt.
Due to the way FMs choose the next word, phrase, or image feature, foundation models can generate an enormous number of unique responses from a single prompt. The models generate a probability distribution over all items that could follow the input and then choose the next output randomly from that distribution. This randomization is amplified by the models’ use of context; each time the model generates a probability distribution, it considers the last generated item—which means each prediction impacts every prediction that follows.
For example, let’s say we start a foundation model with a single token: “Where.” Any number of words could follow that token—from “dancing” to “pizza” to “excavators”—but variations on the verb “is” will be more common. Let’s say “is”, “are”, “was”, and “were” each has a probability of 0.1, and they’re stacked at the beginning of the distribution. Our model will randomly pick a value between zero and one. If that value is less than 0.4, it will select a variation of “is.” Assuming it picks “is,” the model will now generate a new probability distribution for what words could follow “Where is”—which will likely lean heavily toward possessive pronouns like “your” and “my.”
The model continues this way until it generates a response that it predicts to be complete. In this case, that might be “Where is my sweater? I left it here yesterday.”
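To make the sampling step above concrete, here is a minimal sketch in Python. The vocabulary and probabilities are invented for illustration; a real foundation model computes a distribution over tens of thousands of tokens and re-computes it after every word it generates.

```python
import random

def sample_next(distribution):
    """Pick the next token by drawing a random value in [0, 1) and walking
    the cumulative probability mass, as described above."""
    draw = random.random()
    cumulative = 0.0
    for token, probability in distribution.items():
        cumulative += probability
        if draw < cumulative:
            return token
    return token  # guard against floating-point rounding at the tail

# Toy distribution over words that might follow the prompt "Where"
# (probabilities sum to 1.0; real vocabularies are far larger)
next_word_probs = {
    "is": 0.1, "are": 0.1, "was": 0.1, "were": 0.1,
    "dancing": 0.01, "pizza": 0.01, "<other>": 0.58,
}
print("Where", sample_next(next_word_probs))
```

Because each draw is random and each new distribution is conditioned on everything generated so far, the same prompt can yield many different completions.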
What is self-supervised learning?
Self-supervised learning is a kind of machine learning that creates labels directly from the input data. For example, some large language models generate embedding values for words by showing the model a sentence with a missing word and asking the model to predict the missing word. Some image models use a similar approach, masking a portion of the image and then asking the model to predict what exists within the mask. In either case, the masked portion of the original data becomes the “label” and the non-masked portion becomes the input.
This differs from previous generations of machine learning architectures, which fell into two categories: supervised and unsupervised. Unsupervised learning requires no labels and identifies underlying patterns in the data. Model architectures that qualify as “supervised learning”—from traditional regression models to random forests to most neural networks—require labeled data for training.
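To make the masking idea concrete, here is a minimal sketch of how a single self-supervised training example can be created from raw text. Real models mask at the token level and repeat this over billions of sentences; this toy version works on whole words.

```python
import random

def make_masked_example(sentence, mask_token="[MASK]"):
    """Hide one word: the hidden word becomes the 'label',
    the masked sentence becomes the input."""
    words = sentence.split()
    position = random.randrange(len(words))
    label = words[position]
    words[position] = mask_token
    return " ".join(words), label

masked_input, label = make_masked_example("Where is my sweater I left it here yesterday")
print(masked_input, "->", label)
```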
FAQs (frequently asked questions) about foundation models
Foundation models are a new and fast-evolving area of AI, and many people have questions about them. We’ve collected and answered some of the most frequently asked questions below.
What is meant by foundation models?
A foundation model is a model trained on broad data that can be adapted to a wide range of downstream tasks. More importantly, it can be customized and fine-tuned to a user’s specific needs. That’s why researchers call them “foundation” models. You use them as a foundation to build upon.
What is the use of foundation models?
Foundation models serve a wide variety of purposes. Some generate text from text prompts; others generate text from image prompts, or images from text prompts. More importantly, data scientists can build upon them to create models optimized for specific domains or specific tasks.
What are the benefits of foundation models?
Foundation models have many benefits. Their large size and the breadth of the data they’re trained on allow them to acquire emergent behaviors that make them applicable to a wide variety of tasks. GPT-3, for example, can summarize text, write original stories and answer open-ended questions. Foundation models can also be adapted with fine-tuning to increase their accuracy on specific domains or specific tasks.
Is BERT a foundation model?
BERT is a foundation model. It was one of the first entries in the field and predated the coining of the term “foundation model” by several years. When it was first released, many benchmarks saw new high scores from models that used BERT as a starting point (rather than training a model from scratch).
What are popular foundation models?
Many mainstream articles about foundation models focus on the GPT family of text-to-text foundation models (GPT-3, GPT-3.5, GPT-4 and ChatGPT). Others focus on cross-modal text-to-image foundation models such as DALL-E and Stable Diffusion. But the most popular foundation models in business applications are probably BERT and derivatives of BERT (RoBERTa, DistilBERT, ALBERT, etc.).
What are the advantages of foundation models?
Each individual foundation model offers its own unique advantages. But broadly speaking, the advantage of foundation models is that they are adaptable to a wide range of applications. Traditional machine learning models typically perform one task well, and only one task. A foundation model like GPT-3, on the other hand, can summarize text, answer questions and generate content based on prompts. And with focused adaptation or specialization, it can improve further on specific tasks of interest to a user.
What are some examples of Foundation Models?
The field of foundation models is developing fast, but here are some of the most noteworthy entries as of this page’s most recent update.
BERT
BERT, an acronym that stands for “Bidirectional Encoder Representations from Transformers,” was one of the first foundation models and pre-dated the term by several years. The open-source model—one of the first deeply bidirectional language representations pre-trained using only a plain-text corpus—quickly became an essential tool for natural language processing researchers.
BERT models proved useful in several ways, including quantifying sentiment and predicting missing words in sentences.
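For readers who want to see BERT’s fill-in-the-blank behavior first-hand, here is a minimal sketch using the Hugging Face transformers library (one common tooling choice, not the only one). It assumes the package is installed and will download the bert-base-uncased weights on first run.

```python
from transformers import pipeline

# BERT was trained to predict masked words, so the "fill-mask" task is a natural fit.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Where is my [MASK]?"):
    print(prediction["token_str"], round(prediction["score"], 3))
```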
ChatGPT
ChatGPT elevated foundation models into the public consciousness by letting anyone interact with a large language model through a user-friendly interface. The service also maintains a state that stretches back over many requests and responses, imbuing the session with conversational continuity. The technology demonstrated the potential of foundation models as well as the effort required to bring them to a production use-case; while an LLM serves as ChatGPT’s backbone, OpenAI built several layers of additional software to enable the interface.
GPT-3, 3.5, 4 and 4o
“GPT” stands for “Generative Pre-trained Transformer.” GPT-3 is best known as the original backbone of ChatGPT. The model debuted in June 2020, but remained a tool for researchers and ML practitioners until its creator, OpenAI, released a consumer-friendly chat interface in November 2022. The company has since debuted several additional generations and variations of GPT models.
GPT—with or without its “Chat” wrapper—proved useful for generating text on demand from human-readable prompts.
DALL-E
DALL-E, produced by OpenAI, is a multi-modal model trained on text/image pairs. The resulting model allows users to describe a scene, and DALL-E will generate several digital images based on the instructions.
DALL-E can also accept images as an input and create variations on them.
Stable Diffusion
Stable Diffusion, released in 2022, offers capabilities similar to DALL-E; it can create images from descriptions, in-paint missing portions of pictures and extend an image beyond its original borders.
It differs from DALL-E by using a U-net architecture. This approach uses successive neural network layers to convert the original visual information describing colors and light levels into increasing levels of abstraction until it reaches the middle of the “U.” The second half of the U-net expands this abstraction back into an image.
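As an illustration, here is a minimal sketch of text-to-image generation with Stable Diffusion via the Hugging Face diffusers library (an assumed tooling choice). It presumes the library is installed, the checkpoint shown is accessible, and a GPU is available; the prompt is invented.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Describe a scene; the pipeline returns one or more generated images.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")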
Applying Foundation Models
Foundation Models have the potential to make business contributions through a number of applications, including the following non-exhaustive list.
Sentiment analysis
Foundation Models such as BERT can analyze customer feedback, reviews, and social media posts to determine the sentiment towards products or services. This can provide valuable insights for product development and marketing strategies. While these models can struggle with edge cases such as sarcasm or irony, they achieve high accuracy in aggregate.
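Here is a minimal sketch of what that can look like in practice, assuming the Hugging Face transformers library; the checkpoint shown is a DistilBERT model fine-tuned for sentiment, and the reviews are invented.

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The new update made the app so much faster. Love it!",
    "Support never answered my ticket. Very disappointed.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(result["label"], round(result["score"], 3), "-", review)
```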
Chatbots and virtual assistants
ChatGPT demonstrated that foundation models can serve as the seed for competent chatbots and virtual assistants that may help businesses provide customer support and answer common questions. However, building a chatbot or virtual assistant on top of something as generalized as ChatGPT could lead to embarrassing moments where the bot answers questions in ways that are out of line with business priorities. Developers can overcome these limitations through fine-tuning and retrieval-augmented generation (RAG).
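Even before fine-tuning or RAG, a system prompt can narrow a general-purpose chat model’s behavior. The sketch below assumes the openai Python package and a configured API key; the company name, policy, and model identifier are placeholders for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model identifier
    messages=[
        {
            "role": "system",
            "content": (
                "You are a support assistant for Acme Inc. "  # hypothetical company
                "Only answer questions about Acme products; otherwise, politely decline."
            ),
        },
        {"role": "user", "content": "What's your return policy?"},
    ],
)
print(response.choices[0].message.content)
```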
Content generation (written)
Foundation models can help businesses generate content, such as product descriptions or marketing copy. However, the models may struggle with generating text that could be called “creative” or that captures the unique voice and tone of the business. Additionally, some generated content may be repetitive or nonsensical.
Content generation (visual)
Multi-modal foundation models could help businesses—particularly design-focused businesses—generate rough drafts of visual ideas. While the images created by foundation models may fall short of a business’s high standards, they can serve as a rapid brainstorming tool that allows a human designer to identify the most promising design and create a final version.
Language translation
A business operating in a multilingual environment could use foundation models to translate product descriptions, marketing materials, and customer support content into different languages. However, the models may struggle with translating idiomatic expressions, cultural references, or other language-specific nuances that human translators would likely handle better.
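Here is a minimal sketch of machine translation with a pre-trained model, assuming the Hugging Face transformers (and sentencepiece) packages are installed; T5 is one of several models that support translation out of the box, and the sentence is invented.

```python
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("Our new headphones ship with a two-year warranty.")
print(result[0]["translation_text"])
```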
Information extraction & summarization
Foundation models can help businesses summarize and extract relevant information from any kind of long-form document, such as a call transcript or a corpus of customer support requests. However, foundation models may struggle with accurately identifying the relevant information, particularly if the information is presented in a non-standard format. Additionally, the models may need to be fine-tuned for specific domains or types of data to achieve optimal performance.
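A minimal sketch of summarizing a call transcript with a pre-trained model, assuming the Hugging Face transformers package; the transcript and checkpoint are illustrative choices.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = (
    "Agent: Thanks for calling, how can I help? "
    "Customer: My order arrived damaged and I'd like a replacement. "
    "Agent: I'm sorry to hear that. I've created a replacement order; "
    "it will arrive within five business days at no extra charge."
)
summary = summarizer(transcript, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```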
Exciting business applications
The field of foundation models is developing fast, but here are some of the most exciting business applications as of this page’s most recent update.
LLM-enhanced search
In 2023, Microsoft began experimenting with a closed beta version of Bing that incorporated a chat interface powered by ChatGPT. The interface allowed users to make complex requests and receive a human-readable response annotated with web links. Other companies (including OpenAI itself) have since enhanced chatbots with internet search capabilities, and Google has enhanced internet search with LLM capabilities.
AI code assistants
In June of 2021—more than a year before the debut of ChatGPT—OpenAI and GitHub partnered to release the first version of GitHub Copilot. This GPT-based code assistant generates code suggestions as developers write. Other competitors have since entered the field. While the tools have their critics, many developers report that AI code assistants make their work easier.
AI writing assistants
A new generation of FM-backed writing assistants promises to improve written communication—particularly in the business domain. Tools like Grammarly spot writing errors and weaknesses in real time and suggest possible fixes at the click of a button. In addition to correcting spelling errors and punctuation, these tools can suggest re-wording sentences and phrases to strike a more business-friendly tone.
Customer support copilots
A variation on LLM-enhanced search, customer support copilots connect an LLM to a corporate knowledge base through an architecture known as retrieval-augmented generation (RAG). This allows customer support agents to type a question into the copilot’s chat interface and get a list of relevant links as well as a direct answer to their question.
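To show the shape of a RAG pipeline, here is a minimal sketch: embed knowledge-base articles, retrieve the closest match for a question, and pass both to an LLM. It assumes the sentence-transformers and openai packages; the articles, model names, and prompt format are illustrative, not a production design.

```python
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

# Invented knowledge-base snippets standing in for a corporate corpus.
articles = [
    "Refunds are issued to the original payment method within 7 business days.",
    "Enterprise customers can enable single sign-on from the admin console.",
]

# Embed the articles once, then embed each incoming question.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
article_vectors = embedder.encode(articles, convert_to_tensor=True)

question = "How long do refunds take?"
scores = util.cos_sim(embedder.encode(question, convert_to_tensor=True), article_vectors)[0]
context = articles[int(scores.argmax())]  # retrieve the best-matching article

# Hand the retrieved context plus the question to an LLM.
client = OpenAI()
answer = client.chat.completions.create(
    model="gpt-4o-mini",  # example model identifier
    messages=[{
        "role": "user",
        "content": f"Answer using this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```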
Challenges to FM adoption in enterprises
Foundation models have generated a lot of excitement in the corporate world, and show a lot of promise, but they are not a magic bullet. They pose certain challenges, and may not always be the right tool for every job.
Cost
Foundation models are extremely complex and require significant computational resources to develop, train, and deploy. For narrowly-defined use-cases, that cost may not be justifiable; a smaller model may achieve similar (or better) results at a much lower price.
Interpretability
Foundation models are often described as “black boxes.” The humans using them may never understand how the models arrive at their predictions or recommendations. This can make it challenging for businesses to explain or justify their decisions to customers or regulators.
Privacy and security
Foundation models often require access to sensitive data, such as customer information or proprietary business data. This can raise concerns about privacy and security, particularly if the model is deployed in the cloud or accessed by third-party providers.
Legal and ethical considerations
The deployment of foundation models may raise legal and ethical considerations related to bias, discrimination, and other potential harms. The models are trained on an enormous quantity of data from the wild, and not all of that data will align with your business’s values. Businesses must ensure that their models are developed and deployed in a responsible and ethical manner, which may require additional oversight, testing, and validation.
Accuracy
Foundation models come pre-trained on massive amounts of wide-ranging data and may not be well-suited to a specific business’s needs. Out-of-the-box, FMs often fall well short of a business’s required accuracy rate for a production application. Fine-tuning the model with domain-specific training data may push the FM over the bar, but a business may struggle to justify the time and cost required to do so.
What’s necessary for enterprises to adopt FMs?
Organizations determined to adopt Foundation Models must clear several hurdles to properly and safely use them for production use-cases.
Secure deployment
For most enterprise use-cases, using a foundation model via a third-party API may not be an option. OpenAI, for example, was initially open about incorporating user data into its model training (it has since changed tack). Even if a vendor doesn’t use your data in their model, sending sensitive information to an API adds one more opportunity for malicious actors to access your data.
Fine-tuning
Out-of-the-box foundation models trained on general knowledge will struggle on domain-specific tasks. To improve the model’s performance to the point where business leaders feel comfortable using it, data scientists will have to gather and prepare data for fine-tuning.
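Here is a minimal sketch of what that fine-tuning step can look like with the Hugging Face Trainer. It assumes a labeled, domain-specific dataset with a “text” column and train/test splits; the dataset name is a placeholder, not a real resource.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("my_domain_dataset")  # placeholder dataset name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```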
Cost-effective use
Foundation models are computationally complex and expensive to run. A report from Ars Technica noted that a ChatGPT-style search interface would cost roughly 10 times as much as Google’s standard keyword search. But organizations need not use the largest foundation models for their end use cases. They may instead use them to help train smaller, more focused models that can achieve the same (or better) performance for a fraction of the price.
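This page doesn’t prescribe a method for training smaller models with the help of larger ones, but knowledge distillation is one common approach: the smaller “student” model learns to match the larger “teacher” model’s softened output distribution. The sketch below uses toy tensors in place of real models purely to show the loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions
    (the classic distillation objective)."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Toy example: a batch of 4 examples over 3 classes, standing in for real model outputs.
teacher_logits = torch.randn(4, 3)
student_logits = torch.randn(4, 3, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```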
Key research
Researchers have published hundreds of papers relevant to the advancement of foundation models and large language models, but the following papers roughly sketch the trajectory of the field.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Radford et al. (2016)
This paper introduced DCGANs, a type of generative model that uses convolutional neural networks to generate images with high fidelity.
Attention Is All You Need
Vaswani et al. (2017)
This paper introduced the Transformer architecture, which revolutionized natural language processing by enabling parallel training and inference on long sequences of text.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin et al. (2018)
This paper introduced BERT, a language model that uses bidirectional context to better understand the meaning of words in a sentence. BERT has become a widely used pretraining model in natural language processing.
Language Models are Few-Shot Learners
Brown et al. (2020)
This paper introduced GPT-3, a language model that can perform a wide range of natural language tasks with little or no task-specific training. GPT-3 is notable for its large size (175 billion parameters) and its ability to generate coherent and convincing text.
DALL-E: Creating Images from Text
Ramesh et al. (2021)
This paper introduced DALL-E, a generative model that can create images from textual descriptions. DALL-E has demonstrated impressive capabilities in generating realistic and imaginative images from natural language input.
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Percy Liang, et al. (2021)
This paper highlights progress made in the field of foundation models, while also acknowledging their risks—particularly the potential ethical and societal concerns, the impact on job displacement, and the potential for misuse by bad actors.
Training language models to follow instructions with human feedback
Ouyang et al. (2022)
This paper explored the importance of human feedback in aligning LLMs with human values and preferences.
Snorkel’s work on foundation models
Snorkel and researchers associated with Snorkel actively pursue ways to make foundation models more usable and more widely understood. Snorkel’s work on these topics is ongoing, but here’s a sample.
Data-centric Foundation Model Development: Bridging the gap between foundation models and enterprise AI
In November 2022, Snorkel announced an update to the Snorkel Flow platform that incorporated three features built upon foundation models: foundation model fine-tuning, Warm Start and Prompt Builder. Prompt Builder allows users to construct labeling functions using plain-text prompts delivered to foundation models. Warm Start uses foundation models to automatically create an initial collection of labeling functions at the push of a button, and foundation model fine-tuning lets users easily build customized versions of GPT-3, CLIP and other foundation models.
Seven research papers push foundation model boundaries
In 2022, Snorkel researchers and our academic partners published seven papers on foundation models, including a 161-page deep dive into the promise and peril presented by these incredibly potent, but poorly understood, tools. Other papers devised strategies to coax greater performance from foundation models – sometimes while simultaneously decreasing their size and cost.
How Pixability uses foundation models to accelerate NLP application development by months
In this case study, we show how Snorkel customer Pixability used foundation models to greatly speed up the development of its content categorization model. Using Snorkel Flow’s Data-centric Foundation Model Development workflow, Pixability was able to build an NLP application in less time than it took a third-party data labeling service to label a single dataset. This data-centric workflow allowed Pixability to scale up the number of classes it could classify to over 600 while also increasing model accuracy to over 90%.
Foundation Model Landscape
The foundation model landscape is vast and varied. Academic institutions, open-source projects, exciting startups and legacy tech companies all contribute to the advancement of the field. This technology has moved fast and continues to do so. Compiling a complete snapshot of current FM resources is an enormous task beyond the scope of this document, but the following non-exhaustive list sketches some important contours in the landscape.
The Stanford Center for Research on Foundation Models
Founded in 2021, the Stanford Center for Research on Foundation Models (CRFM) focuses on advancing the development and understanding of robust, secure, and ethical foundation models. CRFM aims to address the technical, social, and ethical challenges foundation models present and to develop solutions that can benefit society through research and partnerships with government agencies and corporations.
OpenAI
Founded in 2015, OpenAI conducts cutting-edge research in machine learning, natural language processing, computer vision, and robotics, and shares its findings with the scientific community through publications and open-source software. OpenAI is responsible for debuting GPT-3, DALL-E and ChatGPT. The company has partnered with Microsoft since 2019. In early 2023, Microsoft began integrating ChatGPT into the Bing search engine.
Cohere
Stylized as “co:here” and co-founded by an author of one of the papers that launched the field of foundation models, Cohere offers a suite of large language models via API. By using Cohere’s software libraries and endpoints, developers can build applications that understand written content or generate written output without having to train or maintain their own LLMs.
ArXiv.org
ArXiv.org hosts and distributes scientific research papers from many disciplines, including mathematics, physics, and computer science. Members of the scientific community—including many members of the foundation model research community—use ArXiv.org as a way to share preprints of research papers before they are published in academic journals. Cornell University operates and maintains the site with funding from several organizations, including the Simons Foundation and the National Science Foundation.
Hugging Face
Hugging Face develops and maintains open-source resources that allow programmers to easily access and build upon foundation models, including BERT, GPT, and RoBERTa. The company is best known for NLP tools, but also enables the use of computer vision, audio, and multimodal models. Hugging Face’s contributions to the NLP community have helped accelerate progress in the field and make it more accessible to developers and businesses.
Google
Google has developed several large-scale models with important impacts on the field of foundation models, including T5 and BERT—the latter of which has become a standard tool for many NLP researchers. Google has since launched a suite of foundation model APIs as well as Gemini, its alternative to ChatGPT.
Microsoft
Microsoft launched its Language Understanding Intelligent Service in 2016. The cloud-based NLP platform enables developers to create and deploy custom NLP models for use in applications. The company also launched Tay, a doomed early public experiment in conversational understanding. More recently, Microsoft has partnered closely with OpenAI and experimented with integrating ChatGPT into the Bing search engine.
Request a free, custom LLM evaluation from Snorkel AI
To ship LLMs with confidence, enterprises need custom evaluations that are purpose-built for their domains and use cases. Snorkel AI offers select organizations a complimentary LLM evaluation, powered by Snorkel’s programmatic data development technology.