Sfaira accelerates data and model reuse in single cell genomics

Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
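The abstract does not spell out the loss adaptation. As a minimal sketch of one plausible formulation, assume a coarse label (for example "T cell") is scored by the summed predicted probability of all leaf cell types it covers, so the loss reduces to ordinary cross-entropy when the label is itself a leaf. The function names, the three-class toy ontology, and the `LEAVES_OF` mapping below are hypothetical illustrations, not sfaira's API.

```python
import numpy as np

# Hypothetical toy ontology: leaf classes 0 = CD4 T cell, 1 = CD8 T cell, 2 = B cell.
# A coarse label maps to every leaf class it subsumes.
LEAVES_OF = {
    "CD4 T cell": [0],
    "CD8 T cell": [1],
    "B cell": [2],
    "T cell": [0, 1],  # coarse annotation covering both T cell subtypes
}


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def masks_for(labels, n_leaves=3):
    """Build a (cells x leaves) 0/1 mask marking the leaves each label allows."""
    masks = np.zeros((len(labels), n_leaves))
    for i, label in enumerate(labels):
        masks[i, LEAVES_OF[label]] = 1.0
    return masks


def coarse_aware_cross_entropy(logits, label_masks, eps=1e-12):
    """Mean negative log of the probability mass assigned to the allowed leaves."""
    probs = softmax(logits)
    allowed_mass = (probs * label_masks).sum(axis=-1)
    return float(-np.log(allowed_mass + eps).mean())


# Two cells with uninformative logits: one labeled at leaf level, one coarsely.
logits = np.zeros((2, 3))
loss = coarse_aware_cross_entropy(logits, masks_for(["CD4 T cell", "T cell"]))
```

Under this assumption, a cell annotated only as "T cell" is not penalized for the model preferring either T cell subtype, which is what allows datasets annotated at different granularities to share one classifier.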

  • Fischer, D. S.
  • Dony, L.
  • König, M.
  • Moeed, A.
  • Zappia, L.
  • Heumos, L.
  • Tritschler, S.
  • Holmberg, O.
  • Aliee, H.
  • Theis, F. J.

Keywords

  • Data zoo
  • Model zoo
  • Single-cell genomics

Publication details
DOI: 10.1186/s13059-021-02452-6
Journal: Genome Biol
Pages: 248 
Number: 1
Work Type: Original
Location: CPC-M
Disease Area: PLB
Partner / Member: HMGU
Access-Number: 34433466
