Online integration of atlas-level single-cell datasets


Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space



Install from PyPI

pip install scalex

Install from GitHub

git clone git://
cd scalex
python setup.py install

SCALEX is implemented in the PyTorch framework.
SCALEX runs on CPU devices, but running it on a GPU is recommended when one is available.
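
Before a long run it can help to check which device PyTorch will see. The snippet below is a minimal sketch, not part of SCALEX: `pick_device` is a hypothetical helper name, and it falls back to "cpu" when PyTorch is not installed in the current environment.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper for illustration; SCALEX does its own device handling.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed here, so only the CPU answer makes sense.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```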

Quick Start

SCALEX can be used both from the command line and as an API function in a Jupyter notebook.

1. Command line

Standard usage

--data_list data1 data2 dataN --batch_categories batch_name1 batch_name2 batch_nameN

--data_list: data path of each batch of single-cell dataset, use -d for short

--batch_categories: name of each batch, batch_categories will range from 0 to N if not specified
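
To make the relationship between the two options concrete, here is a small `argparse` sketch. It is an illustration only, not SCALEX's own parser: `parse_args` is a hypothetical function, and the default of numbering batches from 0 as strings is an assumption based on the description above.

```python
import argparse


def parse_args(argv):
    """Parse the two options described above (illustrative, not SCALEX's parser)."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    # One or more dataset paths, one per batch; -d is the short form.
    parser.add_argument("--data_list", "-d", nargs="+", required=True)
    # Optional batch names, expected to line up one-to-one with --data_list.
    parser.add_argument("--batch_categories", nargs="+", default=None)
    args = parser.parse_args(argv)

    if args.batch_categories is None:
        # Assumption: unnamed batches are numbered in input order, starting at 0.
        args.batch_categories = [str(i) for i in range(len(args.data_list))]
    elif len(args.batch_categories) != len(args.data_list):
        parser.error("--batch_categories must have one name per --data_list entry")
    return args


args = parse_args(["-d", "data1.h5ad", "data2.h5ad"])
print(args.data_list, args.batch_categories)
```

Passing `--batch_categories` explicitly, for example `["-d", "a.h5ad", "b.h5ad", "--batch_categories", "pbmc", "liver"]`, keeps the supplied names instead of the numeric defaults.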

Use one or more h5ad files storing anndata as input

--data_list <filename.h5ad>

If only one concatenated h5ad file is provided, specify the batch column in anndata.obs with --batch_name. The batch name can be, for example, conditions, samples, assays or patients; the default is batch

--data_list <filename.h5ad> --batch_name <specific_batch_name>

To integrate heterogeneous scATAC-seq datasets, add the option --profile ATAC

--data_list <filename.h5ad> --profile ATAC

To run imputation along with integration, add the option --impute; results are stored in anndata.layers['impute']

--data_list <atac_filename.h5ad> --profile ATAC --impute

To use custom features, pass --n_top_features a file that lists the features in one column

--data_list <filename.h5ad> --n_top_features features.txt
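
A one-column feature file is simply one feature name per line. The following sketch writes and re-reads such a file; the helper names and the gene symbols are placeholders chosen for illustration, and the exact parsing SCALEX applies to the file is not shown here.

```python
import tempfile
from pathlib import Path


def write_feature_file(features, path):
    """Write feature names to a text file, one per line."""
    cleaned = [name.strip() for name in features if name.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return Path(path)


def read_feature_file(path):
    """Read a one-column feature file back into a list, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Example with placeholder gene symbols, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    feature_path = write_feature_file(["CD3D", "MS4A1", "LYZ"], Path(tmp) / "features.txt")
    print(read_feature_file(feature_path))
```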



Output will be saved in the output folder including:

Useful options


For more usage information, run SCALEX with --help

2. API function

from scalex import SCALEX
adata = SCALEX(data_list, batch_categories)

The function parameters mirror the command line options. The output is an AnnData object for further analysis with scanpy.
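
As a rough sketch of what "further analysis with scanpy" might look like, the helper below builds a neighbor graph on the integrated embedding, then clusters and plots it. `cluster_integrated` is a hypothetical name, and the keys `"latent"` (in `adata.obsm`) and `"batch"` (in `adata.obs`) are assumptions; inspect the returned object to confirm which keys are actually present. `sc.tl.leiden` additionally requires the `leidenalg` package.

```python
def cluster_integrated(adata, use_rep="latent", batch_key="batch"):
    """Cluster and plot an integrated AnnData object with scanpy.

    use_rep and batch_key are assumed key names, not confirmed by the
    SCALEX documentation above; adjust them to match your object.
    """
    import scanpy as sc  # imported lazily so the sketch loads without scanpy

    if use_rep not in adata.obsm:
        raise KeyError(f"{use_rep!r} not found in adata.obsm")

    sc.pp.neighbors(adata, use_rep=use_rep)  # kNN graph on the embedding
    sc.tl.umap(adata)                        # 2-D layout for visualization
    sc.tl.leiden(adata)                      # clusters stored in adata.obs["leiden"]
    sc.pl.umap(adata, color=[batch_key, "leiden"])
    return adata
```

With the call shown above, this would be used as `cluster_integrated(SCALEX(data_list, batch_categories))`.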