Biases in machine-learning models of human single-cell data

Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis, ranging from societal biases affecting whose samples are collected, to clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples and biases during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.

  • Willem, T.
  • Shitov, V. A.
  • Luecken, M. D.
  • Kilbertus, N.
  • Bauer, S.
  • Piraud, M.
  • Buyx, A.
  • Theis, F. J.

Publication details

DOI: 10.1038/s41556-025-01619-8
Journal: Nat Cell Biol
Work Type: Review
Location: CPC-M
Disease Area: DPLD
Partner / Member: HMGU
Access-Number: 39972066
