Single-cell RNA-seq datasets are often first analyzed independently, without harnessing model fits from previous studies, and are then contextualized with public datasets, which requires time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate the contribution of datasets by using ontologies for metadata. We propose an adaptation of the cross-entropy loss for cell type classification that is tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
Authors
- Fischer, D. S.
- Dony, L.
- König, M.
- Moeed, A.
- Zappia, L.
- Heumos, L.
- Tritschler, S.
- Holmberg, O.
- Aliee, H.
- Theis, F. J.
Keywords
- Data zoo
- Model zoo
- Single-cell genomics
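The abstract mentions a cross-entropy loss adapted to cell type labels of varying coarseness. One plausible reading, sketched here under assumptions, is that a coarse label is treated as correct if the model assigns probability to any leaf cell type beneath it in the ontology, so the loss is the negative log of the summed leaf probabilities. The toy ontology, the names `LEAVES`, `DESCENDANT_LEAVES`, and `aggregated_cross_entropy`, and the use of NumPy are all illustrative assumptions and are not taken from sfaira itself.

```python
import numpy as np

# Hypothetical toy ontology (not from sfaira): the model predicts leaf types only.
LEAVES = ["CD4 T cell", "CD8 T cell", "B cell"]

# Hypothetical mapping from every admissible label to the leaves it covers.
DESCENDANT_LEAVES = {
    "CD4 T cell": ["CD4 T cell"],
    "CD8 T cell": ["CD8 T cell"],
    "B cell": ["B cell"],
    "T cell": ["CD4 T cell", "CD8 T cell"],
    "lymphocyte": ["CD4 T cell", "CD8 T cell", "B cell"],
}


def label_mask(label):
    """Return a 0/1 vector over LEAVES marking the leaves covered by `label`."""
    allowed = set(DESCENDANT_LEAVES[label])
    return np.array([1.0 if leaf in allowed else 0.0 for leaf in LEAVES])


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def aggregated_cross_entropy(logits, labels):
    """Mean of -log(sum of leaf probabilities covered by each cell's label)."""
    probs = softmax(np.asarray(logits, dtype=float))
    masks = np.stack([label_mask(label) for label in labels])
    p_label = (probs * masks).sum(axis=-1)
    return float(-np.log(p_label).mean())


# Usage: three cells with uniform logits, annotated at different coarseness.
logits = np.zeros((3, len(LEAVES)))
loss = aggregated_cross_entropy(logits, ["B cell", "T cell", "lymphocyte"])
```

With a leaf-level label this reduces to the ordinary cross-entropy, while a label that covers every leaf contributes zero loss, which is the behavior one would want from cells annotated only at the root of the ontology.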