from torch.onnx._internal.fx._pass import Analysis

StaVia TI for Spatio-temporal Analysis of single cell data

This tutorial focuses on how to use StaVia on single cell atlases that have spatial and temporal data. We use a Stereo-seq Zebrafish gastrulation dataset ZESTA (C. Liu et al. 2022) from 5hpf to 24hpf of 150,000 spots approaching single cell size. The preprocessed anndata object with all slices and time points merged on shared genes is available here.

We show how to incorporate both spatial coordinates of cells on different tissue slices as well as temporal labels from known experimental/developmental time points

import scanpy as sc, numpy as np, pandas as pd, pyVIA.core as via

# download and read the data. 
foldername = '/home/user/Spatial/datasets/Zesta/'
fname_outer = 'Data/steroseq_alltimes_outerjoin_26628genes.h5ad'  # outer join of genes from each time point
fname = 'Data/stavia_spatialPCA100_spatialk15_weightp3.h5ad' #contains the PCs on the spatially smoothed genes. adata.X is the original non-smoothed gene expression, but the PCA is based on the spatially-smoothed gene matrix (which we dont save as its a big dense matrix)
print('fname:', fname)
adata = sc.read(foldername + fname)
#ad_outer = sc.read(foldername + fname_outer)
print(adata)
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/umap/umap_.py:660: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit()
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/phate/__init__.py
fname: Data/stavia_spatialPCA100_spatialk15_weightp3.h5ad
AnnData object with n_obs × n_vars = 152977 × 13615
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'seurat_clusters', 'spatial_x', 'spatial_y', 'slice', 'time_point', 'bin_annotation', 'colors', 'layer_annotation', 'layer_colors', 'time_str', 'time_num', 'coarse_celltype', 'fine_celltype', 'coarse_fine_celltype', 'via_clusters', 'coarse_celltype_stavia', 'fine_celltype_stavia', 'coarse_fine_celltype_stavia'
    uns: 'pca'
    obsm: 'X_pca', 'X_spatial', 'X_spatial_adjusted', 'spatial_pca'
    varm: 'PCs'
    layers: 'counts'
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/anndata/_core/anndata.py:1830: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
true_labels_coarse = [i for i in adata.obs['coarse_celltype_stavia']]
true_labels_fine = [ i for i in adata.obs['fine_celltype_stavia']]
true_labels_hybrid = [i for i in adata.obs['coarse_fine_celltype_stavia']]

Plot the data

Let’s have a quick look at the data. We plot the cells/spots based on their tissue coordinates and color them by time point.

f,ax = via.plot_scatter(embedding=adata.obsm['X_spatial_adjusted'], labels=adata.obs['time_point'],                     title='All slices and time points', text_labels=True, show_legend=False)
f.set_size_inches(10, 5)
../_images/c4c446c64e668c1c3a04174ce3b36b0749caa397b6719749837755f66ecec1e7.png

Let’s run StaVia and see what the trajectories look like.

  • We demonstrate how to run StaVia under two slightly different settings here. 1) no precomupted cluster labels or user-defined terminal states and 2) Using pre-comupted cluster labels and passing on prior knowledge of terminal cell fates

Special Parameters for Spatial Data

  • Before running the TI, we modify the input gene-expression and resulting PCs. The genes are smoothed by their spatial neighbors prior to PCA. The PCs are then passed on to the TI computation. We recommend saving down the PCs computed on the spatially-adjusted genes to avoid having to recompute this each time (it takes a few mins on larger data)

  • do_spatial = True, spatial_knn = integer value, spatial_aux = [list of length n_cells of membership to which tissue slice], do_spatial_layout=Bool, spatial_coords= np.array(n_cells x 2dims)

  • if all the cells come from a single slice we can try to influence the layout of the graph and embedding based on the tissue coordinates by setting the do_spatial_layout=True. (As done for MERFISH data) However, for this dataset the cells come from different tissue slices with different relative orientations and distances, so we set do_spatial_layout=False

  • since the cells come from multiple distinct tissue slices with different coordinate axes, we also need to provide a unique identifier for each slice by providing an input list for spatial_aux = [list of length n_cells with unique IDs for each slice that the cell comes from]

  • in this tutorial we have left memory at its default value of memory =5, which means that it will be a gentle guiding factor in lineage pathways.

knn = 20  
root = ["deep blastomere"]  
time_series = True
do_spatial = True
n_pcs = 30
spatial_knn = 10 #(this is the spatial knn)

knn_sequential = 5  # 10, 15 also good #(this is the temporal knn)
t_diff_step = 2  
resolution_parameter = 1  # other values e.g 1.5 and 2 also work well depending on desired granularity
cluster_graph_pruning= 0.3 # decreasing this value will decrease the number of edges retained in the TI graphs
edgepruning_clustering_resolution = 0.3 # decreasing this number results in more and smaller clusters (values 0.1-1.5 can be tried)
random_seed = 2  
slices = [str(i) for i in adata.obs['slice']] #this are not unique, e.g. timepoint-1 has slices (1-10), and timepoint-2 has another set of slices (1-8), so the numbers of the slices do not uniquely identify the coordinates of tissues specific to that cell batch
time_numeric = [int(float(i[:-3])) for i in adata.obs['time_point']]
spatial_aux = [str(x) + '_' + str(y) for (x, y) in zip(time_numeric, slices)]  # unique slice_ID for each slice in the dataset as otherwise each time point has some overlapping Slice IDs

print(f'time_series ({time_series}) and spatial_info ({do_spatial})')
pre_labels = [i for i in adata.obs['via_clusters']]
use_prelabels = False  # True
if use_prelabels:
    # suppose you have a list of cell labels corresponding to a precomupted clustering, then you can use these as the clusters in the TI graph. You can also choose to pass on some predefined terminal states
    # we show how to run this later in the tutorial 
    true_label = pre_labels #set annotations to the cluster labels
    #a list of cell types/clusters that are found in true_label and are viable terminal cell fates
    user_defined_terminal_group=[26,46,10, 13, 17, 21, 24, 28, 30, 34, 36, 37, 39, 41, 43, 53, 61, 64, 65, 71,20,57,66,29,22] 
    labels = pre_labels #preset the cluster-graph groupings
else: 
    labels = None
    user_defined_terminal_group = []
    true_label=true_labels_fine
if do_spatial:
    #compute the spatially adjusted principle components. This takes a few minutes, so its suggested to save these PCs into your anndata object for subsequent runs. 
    do_spatial_gene_smoothing = False #use the precomputed PCs from smoothed-genes saved in the anndata object. Set to True if you wish to recompute these 
    if do_spatial_gene_smoothing:
        #start spatially adjusting genes and PCs
        spatial_weight = 0.3  # contribution of spatial-nn to gene-expression vs. cell's own gene-expression
        print('run spatial_input_slices')
        spatial_knn_gene_smooth = 10 #(typical values between 5-15)
        X_spatial_exp0 = via.spatial_input(X_genes=adata.X.todense(),
                                           spatial_coords=np.asarray(adata.obsm['X_spatial']),
                                           knn_spatial=spatial_knn_gene_smooth, spatial_weight=spatial_weight,
                                           spatial_slice_labels=spatial_aux)
    
        adata.obsm['X_gene_spatial_adjusted'] = X_spatial_exp0
        print('Start PCA on spatial_X')
        adata.obsm["spatial_pca"] = sc.tl.pca(adata.obsm['X_gene_spatial_adjusted'], n_comps=n_pcs)
        print('End PCA on spatial_X')
        del adata.obsm['X_gene_spatial_adjusted']  # too large (dense) to save down
        #end spatially adjusting genes and PCs
    
    input = adata.obsm['spatial_pca'][:, 0:n_pcs]
    coords = np.asarray(adata.obsm['X_spatial_adjusted'])
    print(f'spatial knn for graph construction {spatial_knn}')
    
else: 
    spatial_knn=None
    input = adata.obsm['X_pca'][:, 0:n_pcs]

v1 = via.VIA(data=input, true_label=true_label, 
                 edgepruning_clustering_resolution=edgepruning_clustering_resolution, labels=labels,
                 edgepruning_clustering_resolution_local=1, knn=knn, knn_sequential=knn_sequential,
                 knn_sequential_reverse=knn_sequential,
                 cluster_graph_pruning=cluster_graph_pruning,
                 neighboring_terminal_states_threshold=4,
                 too_big_factor=0.3, resolution_parameter=resolution_parameter,
                 root_user=root, dataset='group', random_seed=random_seed,
                 is_coarse=True, preserve_disconnected=True, pseudotime_threshold_TS=40, x_lazy=0.99,
                 t_diff_step=t_diff_step,
                 alpha_teleport=0.99, edgebundle_pruning_twice=False,
                 time_series_labels=time_numeric, time_series=time_series,
                spatial_aux=spatial_aux,
                 do_spatial_knn=do_spatial, do_spatial_layout=False, spatial_coords=coords, spatial_knn=spatial_knn)
v1.run_VIA()
time_series (True) and spatial_info (True)
spatial knn for graph construction 10
2024-05-09 08:20:47.340332	Running VIA over input data of 152977 (samples) x 30 (features)
2024-05-09 08:20:47.340378	Knngraph has 20 neighbors
2024-05-09 08:21:22.015993	Using time series information to guide knn graph construction 
2024-05-09 08:21:22.022478	Time series ordered set [3, 5, 10, 12, 18, 24]
2024-05-09 08:22:18.115511	Shape neighbors (152977, 20) and sequential neighbors (152977, 10)
2024-05-09 08:22:18.141526	Shape augmented neighbors (152977, 30)
2024-05-09 08:22:18.160457	Actual average allowable time difference between nodes is 8.4
2024-05-09 08:22:24.706253	Using spatial information to guide knn graph construction + combining with time-series data
2024-05-09 08:22:24.714047	These slices are present: ['10_1', '10_10', '10_11', '10_12', '10_13', '10_14', '10_15', '10_16', '10_17', '10_18', '10_19', '10_2', '10_20', '10_21', '10_22', '10_23', '10_24', '10_25', '10_26', '10_3', '10_4', '10_5', '10_6', '10_7', '10_8', '10_9', '12_1', '12_10', '12_11', '12_12', '12_2', '12_3', '12_4', '12_5', '12_6', '12_7', '12_8', '12_9', '18_1', '18_10', '18_11', '18_12', '18_13', '18_14', '18_2', '18_3', '18_4', '18_5', '18_6', '18_7', '18_8', '18_9', '24_1', '24_10', '24_11', '24_12', '24_13', '24_14', '24_15', '24_16', '24_17', '24_2', '24_3', '24_4', '24_5', '24_6', '24_7', '24_8', '24_9', '3_1', '3_10', '3_2', '3_3', '3_4', '3_5', '3_6', '3_7', '3_8', '3_9', '5_1', '5_10', '5_11', '5_12', '5_2', '5_3', '5_4', '5_5', '5_6', '5_7', '5_8', '5_9']
2024-05-09 08:22:58.183949	Shape neighbors (152977, 30) and spatial neighbors (152977, 10)
2024-05-09 08:22:58.211524	Shape of spatially augmented neighbors (152977, 40)
2024-05-09 08:23:18.152109	Finished global pruning of 20-knn graph used for clustering at level of 0.3. Kept 52.2 % of edges. 
2024-05-09 08:23:18.266924	Number of connected components used for clustergraph  is 1
2024-05-09 08:23:38.787784	Commencing community detection
2024-05-09 08:23:43.958023	Finished community detection. Found 2857 clusters.
2024-05-09 08:23:44.097108	Merging 2801 very small clusters (<10)
2024-05-09 08:23:44.205236	Finished detecting communities. Found 56 communities
2024-05-09 08:23:44.244772	Making cluster graph. Global cluster graph pruning level: 0.3
2024-05-09 08:23:45.084256	Graph has 1 connected components before pruning
2024-05-09 08:23:45.090358	Graph has 1 connected components after pruning
2024-05-09 08:23:45.090571	Graph has 1 connected components after reconnecting
2024-05-09 08:23:45.091645	14.3% links trimmed from local pruning relative to start
2024-05-09 08:23:45.091674	60.9% links trimmed from global pruning relative to start
initial links 1676 and final_links_n 1437
2024-05-09 08:23:45.181864	component number 0 out of  [0]
2024-05-09 08:23:45.499642	group root method
2024-05-09 08:23:45.499730	for component 0, the root is deep blastomere and ri deep blastomere
cluster 0 has majority aSegPlate, TailBud
cluster 1 has majority deep blastomere
2024-05-09 08:23:46.493492	New root is 1 and majority deep blastomere
cluster 2 has majority Anterior Neural Keel
cluster 3 has majority aSegPlate, TailBud
cluster 4 has majority EVL
cluster 5 has majority Neural Crest
cluster 6 has majority Nervous System
cluster 7 has majority dPronephros
cluster 8 has majority Neural Crest
cluster 9 has majority Yolk Syncytial Layer
cluster 10 has majority Nervous System
cluster 11 has majority Nervous System
cluster 12 has majority Nervous System
cluster 13 has majority Nervous System
cluster 14 has majority Pigment Cell
cluster 15 has majority eMuscle_Fast
cluster 16 has majority Neural Rod
cluster 17 has majority Neural Rod
cluster 18 has majority Anterior Erythroid Lineage Cell
cluster 19 has majority eMuscle
cluster 20 has majority cNotochord
cluster 21 has majority Yolk Syncytial Layer, Periderm
cluster 22 has majority Epidermis
cluster 23 has majority Musculature System, Yolk Syncytial Layer
cluster 24 has majority Nervous System
cluster 25 has majority bSomite
cluster 26 has majority Yolk Syncytial Layer
cluster 27 has majority Anterior Erythroid Lineage Cell
cluster 28 has majority eMuscle_Slow
cluster 29 has majority Spinal Cord Dorsal Region
cluster 30 has majority Nervous System
cluster 31 has majority Yolk Syncytial Layer
cluster 32 has majority Spinal Cord Ventral Region
cluster 33 has majority Yolk Syncytial Layer, Periderm
cluster 34 has majority cNotochord
cluster 35 has majority Immature Eye, Midbrain
cluster 36 has majority dPronephros
cluster 37 has majority eMuscle
cluster 38 has majority Nervous System
cluster 39 has majority Epidermis
cluster 40 has majority eMuscle_Slow
cluster 41 has majority Anterior Erythroid Lineage Cell
cluster 42 has majority Musculature System, Yolk Syncytial Layer
cluster 43 has majority Forebrain
cluster 44 has majority Yolk Syncytial Layer
cluster 45 has majority cNotochord
cluster 46 has majority Posterior Erythroid Lineage Cell
cluster 47 has majority Eye
cluster 48 has majority Anterior Erythroid Lineage Cell
cluster 49 has majority cNotochord
cluster 50 has majority eMuscle
cluster 51 has majority Nervous System
cluster 52 has majority eMuscle
cluster 53 has majority aSegPlate, TailBud
cluster 54 has majority Pigment Cell
cluster 55 has majority Periderm
2024-05-09 08:23:47.287832	Computing lazy-teleporting expected hitting times
2024-05-09 08:23:48.978651	Ended all multiprocesses, will retrieve and reshape
2024-05-09 08:23:49.095225	start computing walks with rw2 method
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pecanpy/graph.py:90: UserWarning: WARNING: Implicitly set node IDs to the canonical node ordering due to missing IDs field in the raw CSR npz file. This warning message can be suppressed by setting implicit_ids to True in the read_npz function call, or by setting the --implicit_ids flag in the CLI
  warnings.warn(
memory for rw2 hittings times  2. Using rw2 based pt
2024-05-09 08:23:56.194673	Identifying terminal clusters corresponding to unique lineages...
2024-05-09 08:23:56.194709	Closeness:[1, 2, 3, 4, 9, 14, 15, 16, 19, 20, 22, 28, 29, 31, 33, 37, 38, 39, 44, 50, 52, 55]
2024-05-09 08:23:56.194727	Betweenness:[0, 1, 2, 4, 7, 9, 10, 12, 13, 14, 15, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 50, 53]
2024-05-09 08:23:56.194742	Out Degree:[0, 1, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 52, 53, 54]
2024-05-09 08:23:56.195107	Cluster 7 had 4 or more neighboring terminal states [15, 20, 23, 28, 37, 41, 42, 45, 46] and so we removed cluster 45
2024-05-09 08:23:56.195234	Cluster 12 had 4 or more neighboring terminal states [10, 13, 14, 30, 34, 38] and so we removed cluster 30
2024-05-09 08:23:56.195292	Cluster 13 had 4 or more neighboring terminal states [10, 12, 20, 38] and so we removed cluster 12
2024-05-09 08:23:56.195408	Cluster 15 had 4 or more neighboring terminal states [7, 20, 28, 37, 46, 50, 52] and so we removed cluster 46
2024-05-09 08:23:56.195471	Cluster 19 had 4 or more neighboring terminal states [22, 26, 28, 29, 36, 37, 39, 40, 42, 50, 52] and so we removed cluster 36
2024-05-09 08:23:56.195519	Cluster 20 had 4 or more neighboring terminal states [7, 13, 15, 23] and so we removed cluster 23
2024-05-09 08:23:56.195567	Cluster 22 had 4 or more neighboring terminal states [19, 29, 37, 39, 40] and so we removed cluster 40
2024-05-09 08:23:56.195662	Cluster 26 had 4 or more neighboring terminal states [14, 19, 29, 31, 39, 41, 42, 52] and so we removed cluster 31
2024-05-09 08:23:56.195703	Cluster 28 had 4 or more neighboring terminal states [7, 15, 19, 50, 52] and so we removed cluster 7
2024-05-09 08:23:56.195768	Cluster 29 had 4 or more neighboring terminal states [19, 22, 26, 37, 52] and so we removed cluster 26
2024-05-09 08:23:56.195974	Cluster 37 had 4 or more neighboring terminal states [15, 19, 22, 29, 42, 50, 52] and so we removed cluster 42
2024-05-09 08:23:56.196245	Cluster 50 had 4 or more neighboring terminal states [15, 19, 28, 37] and so we removed cluster 19
2024-05-09 08:23:56.196290	Cluster 52 had 4 or more neighboring terminal states [15, 28, 29, 37, 39] and so we removed cluster 37
2024-05-09 08:23:56.196546	Terminal clusters corresponding to unique lineages in this component are [10, 13, 14, 15, 20, 22, 28, 29, 38, 39, 41, 50, 52] 
2024-05-09 08:23:56.196568	Calculating lineage probability at memory 5
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pecanpy/graph.py:90: UserWarning: WARNING: Implicitly set node IDs to the canonical node ordering due to missing IDs field in the raw CSR npz file. This warning message can be suppressed by setting implicit_ids to True in the read_npz function call, or by setting the --implicit_ids flag in the CLI
  warnings.warn(
2024-05-09 08:24:02.322951	Cluster or terminal cell fate 10 is reached 63.0 times
2024-05-09 08:24:02.503376	Cluster or terminal cell fate 13 is reached 147.0 times
2024-05-09 08:24:02.687063	Cluster or terminal cell fate 14 is reached 143.0 times
2024-05-09 08:24:02.770889	Cluster or terminal cell fate 15 is reached 1000.0 times
2024-05-09 08:24:02.926636	Cluster or terminal cell fate 20 is reached 612.0 times
2024-05-09 08:24:03.109523	Cluster or terminal cell fate 22 is reached 460.0 times
2024-05-09 08:24:03.182370	Cluster or terminal cell fate 28 is reached 1000.0 times
2024-05-09 08:24:03.279918	Cluster or terminal cell fate 29 is reached 563.0 times
2024-05-09 08:24:03.451501	Cluster or terminal cell fate 38 is reached 161.0 times
2024-05-09 08:24:03.587596	Cluster or terminal cell fate 39 is reached 705.0 times
2024-05-09 08:24:03.711198	Cluster or terminal cell fate 41 is reached 151.0 times
2024-05-09 08:24:03.761113	Cluster or terminal cell fate 50 is reached 1000.0 times
2024-05-09 08:24:03.864036	Cluster or terminal cell fate 52 is reached 947.0 times
2024-05-09 08:24:04.459490	There are (13) terminal clusters corresponding to unique lineages {10: 'Nervous System', 13: 'Nervous System', 14: 'Pigment Cell', 15: 'eMuscle_Fast', 20: 'cNotochord', 22: 'Epidermis', 28: 'eMuscle_Slow', 29: 'Spinal Cord Dorsal Region', 38: 'Nervous System', 39: 'Epidermis', 41: 'Anterior Erythroid Lineage Cell', 50: 'eMuscle', 52: 'eMuscle'}
2024-05-09 08:24:04.459587	Begin projection of pseudotime and lineage likelihood
2024-05-09 08:24:11.455059	Start reading data
2024-05-09 08:24:11.455136	Correlation of Via pseudotime with developmental stage 91.63 %
2024-05-09 08:24:11.492289	Cluster graph layout based on forward biasing
2024-05-09 08:24:11.499638	Starting make edgebundle viagraph...
2024-05-09 08:24:11.499669	Make via clustergraph edgebundle
2024-05-09 08:24:12.444092	Hammer dims: Nodes shape: (56, 2) Edges shape: (656, 3)
2024-05-09 08:24:12.445540	Graph has 1 connected components before pruning
2024-05-09 08:24:12.448280	Graph has 17 connected components after pruning
2024-05-09 08:24:12.459777	Graph has 1 connected components after reconnecting
2024-05-09 08:24:12.460311	83.1% links trimmed from local pruning relative to start
2024-05-09 08:24:12.460332	87.5% links trimmed from global pruning relative to start
initial links 656 and final_links_n 111
2024-05-09 08:24:12.642341	Time elapsed 176.7 seconds
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/scipy/sparse/_index.py:103: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_intXint(row, col, x.flat[0])

We can plot the resulting trajectory cluster graph to show the major trajectories and cell clusters colored by discrete/categorical values (e.g. stage, cell type) or by continuous values (pseudotime, gene expression)

print(v1.terminal_clusters)
plot_viagraph_categorical = True

if plot_viagraph_categorical:
    #plotting viagraph colored by categorical labels (e.g. stages, cell types)
    f1, ax1 =via.plot_piechart_only_viagraph(via_object=v1, cmap_piechart='plasma',
                                                  pie_size_scale=0.4, linewidth_edge=0.6, ax_text=False,
                                                  reference_labels=time_numeric, headwidth_arrow=0.2,
                                                  highlight_terminal_clusters=True, show_legend=False)
    f1.set_size_inches(4,3)
    f2, ax2 = via.plot_piechart_only_viagraph(via_object=v1, cmap_piechart='rainbow',
                                                  pie_size_scale=0.4, linewidth_edge=0.6, ax_text=True,
                                                  reference_labels=true_labels_hybrid, headwidth_arrow=0.2,
                                                  highlight_terminal_clusters=True, show_legend=False, fontsize=4)
    f2.set_size_inches(4, 3)

plot_viagraph_continuous = True

if plot_viagraph_continuous:
    # plotting the viagraph colored by pseudotime, default edges
    # via.plot_viagraph(via_object=v0, tune_edges=False)
    # plotting the viagraph colored by pseudotime, tuning visualized edges
    # via.plot_viagraph(via_object=v0, tune_edges=True, cmap='viridis', initial_bandwidth=0.05, decay=0.7, edgebundle_pruning=0.5)

    f1, ax1 = via.plot_viagraph(via_object=v1, cmap='plasma',
                                              edgeweight_scale=1.5, label_text=False)
                                              
                                             
    f1.set_size_inches(4, 3)
    f1.suptitle('dev stage' + '_doSpat' + str(do_spatial))
    genes = ['klhl41b', 'mylpfb', 'actc1b'] #muscle related genes.
    df_genes_sc = pd.DataFrame(adata[:, genes].X.todense(), columns=genes)
    f2, ax1 = via.plot_viagraph(via_object=v1, df_genes=df_genes_sc, gene_list=genes,
                                              edgeweight_scale=1, label_text=True    )
    f2.set_size_inches(30, 8)
    
[10, 13, 14, 15, 20, 22, 28, 29, 38, 39, 41, 50, 52]
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pyVIA/plotting_via.py:3513: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  sct = ax.scatter(node_pos[:, 0], node_pos[:, 1],
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pyVIA/plotting_via.py:3513: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  sct = ax.scatter(node_pos[:, 0], node_pos[:, 1],
../_images/00c8c3352774e5054a3a0b09c2e6474d34055b08259bf9d50aa26deaf370a79b.png ../_images/06f69fb3f77143911e2478635c8a10537b65220819528dacea0741349af31b0e.png ../_images/17bb093e5773979f468da7cd75ba07adfdd63484d92d64bf7fe4781feecb2c83.png ../_images/9676dbdd95d7b6ba7297a4ea4e2a427c600e5aa7073b68fa98ad9e2820b55171.png

Running StaVia with prior clusters and terminal states

Let’s say you are very happy with the clusters you have computed based on previous run using StaVia, or you would like to use a subset of the terminal states it has identified, or even modify the list of terminal states and provide some more user defined guidance. In StaVia you can pass these on as input parameters to fix the clusters and terminal states in the following manner

  • the user definted terminal states can be provided as group level fates that correspond to the labels found in the “true_label” parameter OR can be given as a list of single-cell indices as user_defined_terminal_cells = [‘list of numeric indices of cells’].

  • in the example below we use clusters that were identified in prior runs of StaVia as terminal cell fatesm as the user_defined_terminal_group. We therefore need to ensure that these labels exist in the true_label parameter

pre_labels = [int(i) for i in adata.obs['via_clusters']]
v0 = via.VIA(data=input, true_label=pre_labels,
             edgepruning_clustering_resolution=edgepruning_clustering_resolution, labels=pre_labels,
             edgepruning_clustering_resolution_local=1, knn=knn, knn_sequential=knn_sequential,
             knn_sequential_reverse=knn_sequential,
             cluster_graph_pruning=cluster_graph_pruning,resolution_parameter=resolution_parameter,
             root_user=root, random_seed=random_seed,user_defined_terminal_group=[26,46,10, 13, 17, 21, 24, 28, 30, 34, 36, 37, 39, 41, 43, 53, 61, 64, 65, 71,20,57,66,29,22] ,
             t_diff_step=t_diff_step,edgebundle_pruning=cluster_graph_pruning, edgebundle_pruning_twice=False,
             time_series_labels=time_numeric, time_series=time_series,
             spatial_aux=spatial_aux,
             do_spatial_knn=do_spatial, do_spatial_layout=False, spatial_coords=coords, spatial_knn=spatial_knn)

v0.run_VIA()
2024-05-08 10:05:00.530513	Running VIA over input data of 152977 (samples) x 30 (features)
2024-05-08 10:05:00.530596	Knngraph has 20 neighbors
2024-05-08 10:05:32.807649	Using time series information to guide knn graph construction 
2024-05-08 10:05:32.810528	Time series ordered set [3, 5, 10, 12, 18, 24]
2024-05-08 10:06:26.467705	Shape neighbors (152977, 20) and sequential neighbors (152977, 10)
2024-05-08 10:06:26.503122	Shape augmented neighbors (152977, 30)
2024-05-08 10:06:26.527353	Actual average allowable time difference between nodes is 8.4
2024-05-08 10:06:35.594145	Using spatial information to guide knn graph construction + combining with time-series data
2024-05-08 10:06:35.597896	These slices are present: ['10_1', '10_10', '10_11', '10_12', '10_13', '10_14', '10_15', '10_16', '10_17', '10_18', '10_19', '10_2', '10_20', '10_21', '10_22', '10_23', '10_24', '10_25', '10_26', '10_3', '10_4', '10_5', '10_6', '10_7', '10_8', '10_9', '12_1', '12_10', '12_11', '12_12', '12_2', '12_3', '12_4', '12_5', '12_6', '12_7', '12_8', '12_9', '18_1', '18_10', '18_11', '18_12', '18_13', '18_14', '18_2', '18_3', '18_4', '18_5', '18_6', '18_7', '18_8', '18_9', '24_1', '24_10', '24_11', '24_12', '24_13', '24_14', '24_15', '24_16', '24_17', '24_2', '24_3', '24_4', '24_5', '24_6', '24_7', '24_8', '24_9', '3_1', '3_10', '3_2', '3_3', '3_4', '3_5', '3_6', '3_7', '3_8', '3_9', '5_1', '5_10', '5_11', '5_12', '5_2', '5_3', '5_4', '5_5', '5_6', '5_7', '5_8', '5_9']
2024-05-08 10:07:10.143200	Shape neighbors (152977, 30) and spatial neighbors (152977, 10)
2024-05-08 10:07:10.184010	Shape of spatially augmented neighbors (152977, 40)
2024-05-08 10:07:29.258824	Finished global pruning of 20-knn graph used for clustering at level of 0.3. Kept 52.2 % of edges. 
2024-05-08 10:07:29.375279	Number of connected components used for clustergraph  is 1
2024-05-08 10:07:48.781778	Using predfined labels provided by user (this must be provided as an array)
2024-05-08 10:07:48.781870	Making cluster graph. Global cluster graph pruning level: 2
2024-05-08 10:07:49.716909	Graph has 1 connected components before pruning
2024-05-08 10:07:49.724869	Graph has 1 connected components after pruning
2024-05-08 10:07:49.725137	Graph has 1 connected components after reconnecting
2024-05-08 10:07:49.726784	23.5% links trimmed from local pruning relative to start
2024-05-08 10:07:49.822596	component number 0 out of  [0]
2024-05-08 10:07:50.018752	group root method
2024-05-08 10:07:50.022207	setting a dummy root
cluster 0 has majority 0
cluster 1 has majority 1
2024-05-08 10:07:50.918137	New root is 1 and majority 1
cluster 2 has majority 2
cluster 3 has majority 3
cluster 4 has majority 4
cluster 5 has majority 5
cluster 6 has majority 6
cluster 7 has majority 7
cluster 8 has majority 8
cluster 9 has majority 9
cluster 10 has majority 10
cluster 11 has majority 11
cluster 12 has majority 12
cluster 13 has majority 13
cluster 14 has majority 14
cluster 15 has majority 15
cluster 16 has majority 16
cluster 17 has majority 17
cluster 18 has majority 18
cluster 19 has majority 19
cluster 20 has majority 20
cluster 21 has majority 21
cluster 22 has majority 22
cluster 23 has majority 23
cluster 24 has majority 24
cluster 25 has majority 25
cluster 26 has majority 26
cluster 27 has majority 27
cluster 28 has majority 28
cluster 29 has majority 29
cluster 30 has majority 30
cluster 31 has majority 31
cluster 32 has majority 32
cluster 33 has majority 33
cluster 34 has majority 34
cluster 35 has majority 35
cluster 36 has majority 36
cluster 37 has majority 37
cluster 38 has majority 38
cluster 39 has majority 39
cluster 40 has majority 40
cluster 41 has majority 41
cluster 42 has majority 42
cluster 43 has majority 43
cluster 44 has majority 44
cluster 45 has majority 45
cluster 46 has majority 46
cluster 47 has majority 47
cluster 48 has majority 48
cluster 49 has majority 49
cluster 50 has majority 50
cluster 51 has majority 51
cluster 52 has majority 52
cluster 53 has majority 53
cluster 54 has majority 54
cluster 55 has majority 55
cluster 56 has majority 56
cluster 57 has majority 57
cluster 58 has majority 58
cluster 59 has majority 59
cluster 60 has majority 60
cluster 61 has majority 61
cluster 62 has majority 62
cluster 63 has majority 63
cluster 64 has majority 64
cluster 65 has majority 65
cluster 66 has majority 66
cluster 67 has majority 67
cluster 68 has majority 68
cluster 69 has majority 69
cluster 70 has majority 70
cluster 71 has majority 71
cluster 72 has majority 72
cluster 73 has majority 73
cluster 74 has majority 74
2024-05-08 10:07:51.835738	Computing lazy-teleporting expected hitting times
2024-05-08 10:07:54.105369	Ended all multiprocesses, will retrieve and reshape
2024-05-08 10:07:54.201044	start computing walks with rw2 method
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pecanpy/graph.py:90: UserWarning: WARNING: Implicitly set node IDs to the canonical node ordering due to missing IDs field in the raw CSR npz file. This warning message can be suppressed by setting implicit_ids to True in the read_npz function call, or by setting the --implicit_ids flag in the CLI
  warnings.warn(
memory for rw2 hittings times  2. Using rw2 based pt
2024-05-08 10:08:00.280761	Terminal cluster list based on user defined cells/groups: [(26, 26), (46, 46), (10, 10), (13, 13), (17, 17), (21, 21), (24, 24), (28, 28), (30, 30), (34, 34), (36, 36), (37, 37), (39, 39), (41, 41), (43, 43), (53, 53), (61, 61), (64, 64), (65, 65), (71, 71), (20, 20), (57, 57), (66, 66), (29, 29), (22, 22)]
2024-05-08 10:08:00.281035	Terminal clusters corresponding to unique lineages in this component are [26, 46, 10, 13, 17, 21, 24, 28, 30, 34, 36, 37, 39, 41, 43, 53, 61, 64, 65, 71, 20, 57, 66, 29, 22] 
2024-05-08 10:08:00.281060	Calculating lineage probability at memory 5
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pecanpy/graph.py:90: UserWarning: WARNING: Implicitly set node IDs to the canonical node ordering due to missing IDs field in the raw CSR npz file. This warning message can be suppressed by setting implicit_ids to True in the read_npz function call, or by setting the --implicit_ids flag in the CLI
  warnings.warn(
2024-05-08 10:08:06.638440	Cluster or terminal cell fate 26 is reached 576.0 times
2024-05-08 10:08:06.719356	Cluster or terminal cell fate 46 is reached 924.0 times
2024-05-08 10:08:06.862167	Cluster or terminal cell fate 10 is reached 99.0 times
2024-05-08 10:08:07.005036	Cluster or terminal cell fate 13 is reached 670.0 times
2024-05-08 10:08:07.239039	Cluster or terminal cell fate 17 is reached 239.0 times
2024-05-08 10:08:07.344238	Cluster or terminal cell fate 21 is reached 442.0 times
2024-05-08 10:08:07.438323	Cluster or terminal cell fate 24 is reached 883.0 times
2024-05-08 10:08:07.560577	Cluster or terminal cell fate 28 is reached 918.0 times
2024-05-08 10:08:07.721966	Cluster or terminal cell fate 30 is reached 770.0 times
2024-05-08 10:08:07.833380	Cluster or terminal cell fate 34 is reached 966.0 times
2024-05-08 10:08:07.924998	Cluster or terminal cell fate 36 is reached 898.0 times
2024-05-08 10:08:08.098273	Cluster or terminal cell fate 37 is reached 626.0 times
2024-05-08 10:08:08.194796	Cluster or terminal cell fate 39 is reached 996.0 times
2024-05-08 10:08:08.261348	Cluster or terminal cell fate 41 is reached 1000.0 times
2024-05-08 10:08:08.375337	Cluster or terminal cell fate 43 is reached 772.0 times
2024-05-08 10:08:08.437771	Cluster or terminal cell fate 53 is reached 910.0 times
2024-05-08 10:08:08.548070	Cluster or terminal cell fate 61 is reached 189.0 times
2024-05-08 10:08:08.662944	Cluster or terminal cell fate 64 is reached 105.0 times
2024-05-08 10:08:08.755156	Cluster or terminal cell fate 65 is reached 874.0 times
2024-05-08 10:08:08.847612	Cluster or terminal cell fate 71 is reached 994.0 times
2024-05-08 10:08:09.059856	Cluster or terminal cell fate 20 is reached 169.0 times
2024-05-08 10:08:09.296316	Cluster or terminal cell fate 57 is reached 33.0 times
2024-05-08 10:08:09.522780	Cluster or terminal cell fate 66 is reached 50.0 times
2024-05-08 10:08:09.633203	Cluster or terminal cell fate 29 is reached 14.0 times
2024-05-08 10:08:09.806414	Cluster or terminal cell fate 22 is reached 85.0 times
2024-05-08 10:08:10.129979	There are (25) terminal clusters corresponding to unique lineages {26: 26, 46: 46, 10: 10, 13: 13, 17: 17, 21: 21, 24: 24, 28: 28, 30: 30, 34: 34, 36: 36, 37: 37, 39: 39, 41: 41, 43: 43, 53: 53, 61: 61, 64: 64, 65: 65, 71: 71, 20: 20, 57: 57, 66: 66, 29: 29, 22: 22}
2024-05-08 10:08:10.130078	Begin projection of pseudotime and lineage likelihood
2024-05-08 10:08:16.836670	Start reading data
2024-05-08 10:08:16.836751	Correlation of Via pseudotime with developmental stage 94.28 %
2024-05-08 10:08:16.900080	Cluster graph layout based on forward biasing
2024-05-08 10:08:16.912144	Starting make edgebundle viagraph...
2024-05-08 10:08:16.912174	Make via clustergraph edgebundle
2024-05-08 10:08:19.932121	Hammer dims: Nodes shape: (75, 2) Edges shape: (2376, 3)
2024-05-08 10:08:19.933349	Graph has 1 connected components before pruning
2024-05-08 10:08:19.936345	Graph has 27 connected components after pruning
2024-05-08 10:08:19.960180	Graph has 1 connected components after reconnecting
2024-05-08 10:08:19.960785	92.6% links trimmed from local pruning relative to start
2024-05-08 10:08:19.960806	94.1% links trimmed from global pruning relative to start
initial links 2017 and final_links_n 150
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/scipy/sparse/_index.py:103: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_intXint(row, col, x.flat[0])
2024-05-08 10:08:20.343423	Time elapsed 172.2 seconds
plot_viagraph_categorical = True

if plot_viagraph_categorical:
    #plotting viagraph colored by categorical labels (e.g. stages, cell types)
    f1, ax1 =via.plot_piechart_only_viagraph(via_object=v0, cmap_piechart='plasma',
                                                  pie_size_scale=0.4, linewidth_edge=0.6, ax_text=True, fontsize=4,
                                                  reference_labels=time_numeric, headwidth_arrow=0.2,
                                                  highlight_terminal_clusters=True, show_legend=False)
    f1.set_size_inches(5,5)
    
    f2, ax2 = via.plot_piechart_only_viagraph(via_object=v0, cmap_piechart='rainbow',
                                                  pie_size_scale=0.4, linewidth_edge=0.6, ax_text=True,
                                                  reference_labels=true_labels_hybrid, headwidth_arrow=0.2,
                                                  highlight_terminal_clusters=True, show_legend=False, fontsize=2)
    f2.set_size_inches(5,5)
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pyVIA/plotting_via.py:3513: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  sct = ax.scatter(node_pos[:, 0], node_pos[:, 1],
/home/user/anaconda3/envs/Via2Env_py10/lib/python3.10/site-packages/pyVIA/plotting_via.py:3513: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  sct = ax.scatter(node_pos[:, 0], node_pos[:, 1],
../_images/27c5a3749dd2a6814adf0b53f383aad2048f2fca47365cf48fb7f3b9ec8f05e1.png ../_images/67413d260f44a924a3cc7a2fdca2332d3cbf151a0f7c68697ca09d49e05859b4.png

Plotting clusters on tissue

  • Option 1: You can plot all the clusters (the plots will be generated by cell type and group clusters in the same celltype into a figure with subplots)

  • Option 2: Plot selected cell types or clusters only

  • reference_labels is an optional list of single-cell labels (e.g. time, annotation). This will plot (in a light background gray) all cells in the data that belong to the majority reference type of this cluster. ALso used in the title of each subplot to note the majority cell (ref2) type for each cluster

  • reference_labels2 is optional list of single-cell labels (e.g. time, annotation). this will be used in the title of each subplot to note the majority cell (ref2) type for each cluster`

'''
# Option 1
list_of_figs = via.plot_all_spatial_clusters(spatial_coords=coords, true_label=true_labels_coarse,
                                             via_labels=pre_labels, alpha=0.5, s=1, verbose=True, reference_labels=time_numeric, reference_labels2=true_labels_fine)
'''
# Option 2
ysl_clus = [[11, 6, 55, 32],'royalblue'] #YSL
meso_clus = [[4, 63, 3, 45, 56, 57, 41, 50, 58] , 'red'] #eryt+muscle
noto_clus = [[57, 20], 'orange']
ns_clus = [[0, 18, 31, 7, 52, 14, 8] , 'turqoise'] #eye+ns
nc_clus = [[5, 12, 22, 23] ,'deepskyblue'] #neural crest+pigment

celltype_to_plot = ysl_clus
f, axs = via.plot_clusters_spatial(spatial_coords=coords, via_labels=pre_labels,  clusters=celltype_to_plot[0], color=celltype_to_plot[1],
                          reference_labels=time_numeric, reference_labels2=true_labels_fine, s=1 )
f.set_size_inches(10,3)
for ax in axs:
    ax.title.set_size(6)
df_coords_majref  shape (19102, 2)
df_coords_majref  shape (16733, 2)
df_coords_majref  shape (30848, 2)
df_coords_majref  shape (72175, 2)
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
../_images/41e4922dcb14df5812ff6cf88df276486a8025fd15b0bb87203fbe6201e9e771.png

Plot the lineage paths on the tissue slices

Let’s have a look at what the recovered single-cell lineage probabilities towards different cell fates actually look like on the tissue slices. This gives us an idea of which cells and what locations of the tissue/embryo are actually involved or active in the emergence of different fates.

v0.terminal_clusters
print('plot lineage prob')
for ts_i in [43,13,17,36]:  
    via.plot_sc_lineage_probability(via_object=v0, embedding=coords, marker_lineages=[ts_i],
                                    scatter_size=1)
plot lineage prob
2024-05-09 09:16:06.485576	Marker_lineages: [43]
2024-05-09 09:16:06.614552	The number of components in the original full graph is 1
2024-05-09 09:16:06.614597	For downstream visualization purposes we are also constructing a low knn-graph 
2024-05-09 09:24:36.018370	Check sc pb 1.0 
f getting majority comp
2024-05-09 09:24:40.901534	Cluster path on clustergraph starting from Root Cluster 1 to Terminal Cluster 43: [1, 2, 4, 0, 18, 7, 66, 59, 57, 45, 35, 40, 43]
setting vmin to 0.03582004258667322
2024-05-09 09:24:41.162092	Revised Cluster level path on sc-knnGraph from Root Cluster 1 to Terminal Cluster 43 along path: [1, 1, 1, 2, 68, 3, 6, 25, 21, 9, 43, 43, 43, 43]
2024-05-09 09:24:41.183995	Marker_lineages: [13]
2024-05-09 09:24:41.289809	The number of components in the original full graph is 1
2024-05-09 09:24:41.289853	For downstream visualization purposes we are also constructing a low knn-graph 
2024-05-09 09:33:58.441712	Check sc pb 1.0 
f getting majority comp
2024-05-09 09:34:03.443880	Cluster path on clustergraph starting from Root Cluster 1 to Terminal Cluster 13: [1, 2, 4, 0, 18, 7, 66, 22, 5, 12, 25, 21, 32, 13]
setting vmin to 0.014884236260470122
2024-05-09 09:34:03.675343	Revised Cluster level path on sc-knnGraph from Root Cluster 1 to Terminal Cluster 13 along path: [1, 1, 1, 2, 68, 4, 3, 16, 7, 57, 20, 20, 20]
2024-05-09 09:34:03.711899	Marker_lineages: [17]
2024-05-09 09:34:03.814841	The number of components in the original full graph is 1
2024-05-09 09:34:03.814885	For downstream visualization purposes we are also constructing a low knn-graph 
2024-05-09 09:42:39.003275	Check sc pb 1.0 
f getting majority comp
2024-05-09 09:42:43.696719	Cluster path on clustergraph starting from Root Cluster 1 to Terminal Cluster 17: [1, 2, 4, 0, 18, 7, 15, 48, 64, 42, 17]
setting vmin to 0.012426956085510874
2024-05-09 09:42:43.953825	Revised Cluster level path on sc-knnGraph from Root Cluster 1 to Terminal Cluster 17 along path: [1, 1, 1, 2, 68, 3, 16, 25, 50, 51, 35, 35]
2024-05-09 09:42:43.989384	Marker_lineages: [36]
2024-05-09 09:42:44.091374	The number of components in the original full graph is 1
2024-05-09 09:42:44.091418	For downstream visualization purposes we are also constructing a low knn-graph 
2024-05-09 09:51:17.109709	Check sc pb 1.0 
f getting majority comp
2024-05-09 09:51:21.724183	Cluster path on clustergraph starting from Root Cluster 1 to Terminal Cluster 36: [1, 2, 4, 0, 18, 7, 66, 22, 36]
setting vmin to 0.047228886519191014
2024-05-09 09:51:21.869493	Revised Cluster level path on sc-knnGraph from Root Cluster 1 to Terminal Cluster 36 along path: [1, 1, 1, 2, 68, 3, 16, 25, 55, 32]
../_images/44c35f47690e4a1fe644eafb92675c2a24016a0e238418bf94bee552ca8e32ce.png ../_images/fb3be07ff064a720ab87af95b8677851228ad0e6f6a651765b540e51441babfb.png ../_images/d83ad55373def3d62bc0705efd78b7cfc8de250b53e5a1307cd23e10f536a363.png ../_images/7c001e35a2bdc3486b2fa49da6f1577bcde835b44d7ecc8e6372e50141544848.png