There’s a lot of excitement about the potential for AI to improve healthcare. This is driven by compelling advances across a wide range of applications including drug discovery, radiology, pathology, electronic medical record (EMR) intelligence, clinical trials, and more. While this is a great opportunity, there are many practical and ethical challenges to development and deployment, some shared with AI in general and some unique to healthcare, and they must be overcome for AI to deliver practical value.

Practical challenges

  1. Problem Selection: Choosing a problem that can improve care in practice requires close collaboration with subject matter experts and integration with existing systems. In settings like healthcare, it can often be more productive to complement human experts than to try to replace them. 
  2. Manual Labeling: Manual labels are often expensive and slow to acquire since annotation can require significant expertise. For example, manual labels for a new use case in AI for radiology can take months of labeling by board-certified radiologists (Dunnmon et al. 2019).
  3. Monitoring: AI systems require continuous monitoring, both over time and across environments. Model performance can drop when the distribution a model was trained on differs from the one it is deployed in. This can happen when models are used in different hospitals, on different patient populations, or on data collected from different machines. Moreover, models can become less accurate over time due to concept drift (see the drift-monitoring sketch after this list).
  4. Subgroup Performance: Model performance needs to be carefully validated on different subsets of the data, since differences that are not obvious in the aggregate can be clinically important (see the subgroup-evaluation sketch after this list). For example, models trained on one commonly used chest x-ray dataset perform much better at classifying pneumothorax when a chest drain is present than when it isn’t (Dunnmon et al. 2019). But cases with a chest drain have already been treated and may be less clinically important.
  5. Clinical Trials: Model performance should be validated in real clinical environments, since tools that perform well in controlled experiments may not translate to actual practice. For example, computer-aided diagnosis (CAD) models for mammography were approved by the FDA on the basis of controlled experiments and came to cost hundreds of millions of dollars per year. Large-scale clinical trials later demonstrated that they did not improve clinical outcomes (Lehman et al. 2015).
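
To make the monitoring challenge (item 3) concrete, here is a minimal sketch of one common approach: comparing feature distributions between a training sample and live deployment data with a two-sample statistical test. The arrays and feature names are hypothetical placeholders, and a real deployment would also track model outputs and, where available, labels.

```python
# Minimal drift-monitoring sketch: flag features whose live distribution
# no longer matches the training distribution. X_train, X_live, and the
# feature names are hypothetical placeholders.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(X_train, X_live, feature_names, alpha=0.01):
    """Run a two-sample Kolmogorov-Smirnov test per feature.

    A small p-value suggests the deployed data for that feature has
    drifted away from the data the model was trained on.
    """
    drifted = []
    for j, name in enumerate(feature_names):
        stat, p_value = ks_2samp(X_train[:, j], X_live[:, j])
        if p_value < alpha:
            drifted.append((name, stat, p_value))
    return drifted

# Simulated example: the live data is shifted relative to training.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))
X_live = rng.normal(0.5, 1.0, size=(1000, 2))
print(detect_drift(X_train, X_live, ["age", "dose"]))
```
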
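For subgroup performance (item 4), a small sketch of per-subgroup evaluation along the lines of the chest-drain example; the column names and the choice of AUROC as the metric are assumptions for illustration.

```python
# Minimal per-subgroup evaluation sketch; assumes binary labels and
# model scores, plus a metadata column (e.g., "has_chest_drain")
# defining the subgroups of interest.
import pandas as pd
from sklearn.metrics import roc_auc_score

def auroc_by_subgroup(df, score_col, label_col, group_col):
    """Report AUROC separately for each subgroup, since aggregate
    metrics can hide clinically important gaps."""
    rows = []
    for group, sub in df.groupby(group_col):
        if sub[label_col].nunique() < 2:
            continue  # AUROC is undefined if a subgroup has one class
        rows.append({group_col: group,
                     "n": len(sub),
                     "auroc": roc_auc_score(sub[label_col], sub[score_col])})
    return pd.DataFrame(rows)

# Toy example with a hypothetical chest-drain subgroup column.
df = pd.DataFrame({
    "score": [0.9, 0.8, 0.2, 0.7, 0.4, 0.1],
    "label": [1, 1, 0, 1, 0, 0],
    "has_chest_drain": [True, True, True, False, False, False],
})
print(auroc_by_subgroup(df, "score", "label", "has_chest_drain"))
```
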

Ethical challenges

Bias: Models can encode and reinforce societal biases against certain groups of people. For example, language models trained on clinical texts can predict different likely courses of action for people of different races (Zhang et al. 2020).

Fairness: Models can perform worse for some subgroups than others, raising issues of fairness (Dunnmon et al. 2019). For example, a model may be better at identifying diseases in men than in women depending on the underlying data. Would it be acceptable to trade off performance in one group against the other? What if, despite the discrepancy, performance for all groups is better than the standard of care?

Privacy: Especially in healthcare, it’s critical to protect the privacy of patients. Sensitive data often cannot be taken off-premises or shared with large groups of annotators, making it difficult to curate the large labeled datasets needed to train AI systems.

Putting Snorkel Flow to Use for Healthcare

Snorkel Flow is an end-to-end ML platform centered on a new programmatic approach to labeling, building, and managing training data, enabling rapid, iterative development and deployment of AI applications. Snorkel’s technology has already been applied to several important problems in healthcare.

Snorkel Flow addresses the practical and ethical challenges of AI application development and deployment:

  • Easier Subject Matter Expert Collaboration: Snorkel Flow empowers subject matter experts to leverage their domain expertise to train and monitor AI systems through programmatic labeling via a no-code platform (illustrated in the sketch after this list). Ultimately, doctors treat patients, and close collaboration with subject matter experts is critical to building systems that improve care.
  • Faster Model Development: Snorkel Flow supports fast, iterative deployment of AI systems that is not blocked by the time required for hand labeling. For example, programmatic labeling reduced hand-labeling time from months to days in a radiology application (Dunnmon et al. 2019).
  • Advanced Monitoring: Programmatic labeling through Snorkel Flow enables new ways of monitoring AI systems. By tracking labeling function performance, Snorkel Flow can help identify concept drift over time and generalization issues in deployment. 
  • Auditability at Every Step: The data and labels used to train AI systems in Snorkel Flow are auditable and interpretable. Users can analyze all sources of supervision used to train the model as well as outputs from any step in the pipeline.
  • Data Privacy: With Snorkel Flow, training data can be labeled programmatically in a fully eyes-off environment, and all data can be kept on-premises.
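
To give a flavor of programmatic labeling, here is a minimal sketch using the open-source snorkel library, which popularized the approach Snorkel Flow builds on; Snorkel Flow itself is a no-code platform, and the radiology-report data and keyword heuristics below are hypothetical.

```python
# Illustrative programmatic-labeling sketch with the open-source
# snorkel library; the report text and heuristics are made up.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier, LFAnalysis
from snorkel.labeling.model import LabelModel

ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1

@labeling_function()
def lf_mentions_pneumothorax(x):
    # A domain expert's heuristic: the finding is named in the report.
    return ABNORMAL if "pneumothorax" in x.report.lower() else ABSTAIN

@labeling_function()
def lf_no_acute_findings(x):
    return NORMAL if "no acute findings" in x.report.lower() else ABSTAIN

df_train = pd.DataFrame({"report": [
    "Small right apical pneumothorax.",
    "Lungs are clear. No acute findings.",
    "Stable cardiomegaly.",
]})

# Apply the labeling functions to produce a (noisy) label matrix.
lfs = [lf_mentions_pneumothorax, lf_no_acute_findings]
L_train = PandasLFApplier(lfs=lfs).apply(df_train)

# Per-LF coverage and conflict statistics support the auditing and
# monitoring described above.
print(LFAnalysis(L=L_train, lfs=lfs).lf_summary())

# The label model denoises and combines LF votes into training labels.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=200, seed=0)
probs = label_model.predict_proba(L_train)
```

In practice, experts encode many such heuristics, and because every label traces back to a named labeling function, the same statistics double as an audit trail and a monitoring signal.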

This post recaps the conversation between Brandon Yang, AI researcher and ML engineer at Snorkel AI, and Dr. Bahijja Raimi-Abraham of King's College London, recorded for her podcast Monday Science, about the application of AI in healthcare.

We love meeting people from the data science and machine learning community and appreciate every opportunity to share our work. If you want a Snorkeler to give a talk at your event, send us an email at info@snorkel.ai. Follow us on Twitter to see upcoming events by Snorkelers. Let’s put AI to work for all.