Jason Fries

Self-supervised machine learning using adult inpatient data produces effective models for pediatric clinical prediction tasks

Research Paper

Nov 17, 2023 • J. Lemmon, et al.

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we...

Research Paper

Nov 17, 2023 • SC. Huang, et al.

Scalable Approach to Medical Wearable Post-Market Surveillance

Objective: We sought to develop a weak supervision-based approach to demonstrate feasibility of post-market surveillance of wearable devices that render AF pre-diagnosis. Materials and Methods: Two approaches were evaluated to reduce clinical note labeling overhead for creating a training set for a classifier: one using programmatic codes, and the other using prompts to large language models (LLMs). Probabilistically labeled notes were then used to fine-tune a classifier, which identified patients with AF pre-diagnosis mentions in a note. A retrospective cohort study was conducted, where the baseline characteristics and subsequent care patterns of patients identified by the classifier were compared against...

Research Paper

Nov 15, 2023 • RM. Yoo, et al.

Weak Supervision Enables Scalable Post-Market Surveillance on Medical Wearables

Introduction: With the advent of consumer-facing devices that can render atrial fibrillation (AF) pre-diagnosis, medical wearables now have the potential to affect diagnosis rates and medical care. Post-market surveillance is necessary to understand the impact of wearables on patient outcomes and health care utilization, but is hindered by the lack of codified terms in EHR that capture wearable use. Research Questions: Constructing a post-market surveillance system therefore requires a classifier that identifies mentions of AF pre-diagnosis in unstructured EHR data. However, fine-tuning classifiers requires large, hand-labeled training sets that can be costly to generate. It is unclear whether a scalable...

Research Paper

Nov 06, 2023 • RM. Yoo, et al.

Blog

Two approaches to distill LLMs for better enterprise value

Distillation techniques allow enterprises to access the full predictive power of large language models at a tiny fraction of their cost.

Oct 31, 2023 • Jason Fries

The Stanford Medicine data science ecosystem for clinical and translational research

Research patient data repositories are essential for health systems to learn from the experiences of their patients and for advancing the mission of academic medical centers. In this paper, we describe methods, tools, and practices at Stanford Medicine to maintain its research patient data repository and computing resources to support clinical and translational research, which together comprise the Stanford Medicine Data Science Resources (SDSR). The SDSR includes computing infrastructure and tools to create, search, retrieve, and analyze patient data. Data are made available via self-service and staff supported access, on secure computers. The Stanford Medicine Research Data Repository functions as...

Research Paper

Oct 20, 2023 • A. Callahan, et al.

MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides...

Research Paper

Oct 20, 2023 • SL. Fleming, et al.

The shaky foundations of large language models and foundation models for electronic health records

The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models’ capabilities. In this narrative review, we examine 84 foundation models trained on nonimaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are...

Research Paper

Oct 20, 2023 • M. Wornow, et al.

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients....

Research Paper

Oct 20, 2023 • M. Wornow, et al.
