Abstract

This paper finds that only 13% of biomedical datasets are available via programmatic access and 30% lack documentation on licensing and permitted reuse, highlighting the dataset debt in biomedical NLP.