Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  245
Comparative Analysis of Reference-Based Cell 
Type Mapping and Manual Annotation in Single 
Cell RNA Sequencing Analysis  
Key words
single-cell transcriptomics; 
peripheral blood mononuclear 
cells; 
reference mapping;
cell-type annotation; 
immune system
Larisa Goričan1,†, Boris Gole1,†, Gregor Jezernik1, Gloria Krajnc1,3, Uroš Potočnik1,2,3, 
Mario Gorenjak1* 
1Centre for Human Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Taborska 
ulica 8, SI-2000 Maribor, 2Laboratory for Biochemistry, Molecular Biology and Genomics, Faculty of Chemistry 
and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, 3Department for 
Science and Research, University Medical Centre Maribor, Ljubljanska ulica 5, SI-2000 Maribor, Slovenia, 
†Equal contribution
*Corresponding author: mario.gorenjak@um.si
Abstract: Single-cell RNA sequencing (scRNA-seq) offers unprecedented insight into 
cellular diversity in complex tissues like peripheral blood mononuclear cells (PBMC). 
Furthermore, differential gene expression at a single-cell level can provide a basis for 
understanding the specialized roles of individual cells and cell types in biological pro-
cesses and disease mechanisms. Accurate annotation of cell types in scRNA-seq data-
sets is, however, challenging due to the high complexity of the data. Here, we compare 
two cell-type annotation strategies applied to PBMCs in scRNA-seq datasets: automat-
ed reference-based tool Azimuth and unsupervised Shared Nearest Neighbor (SNN) 
clustering, followed by manual annotation. Our results highlight the strengths and limita-
tions of the two approaches. Azimuth easily processed large-scale scRNA-seq datasets 
and reliably identified even relatively rare cell populations. It, however, struggled with cell 
types outside its reference range. In contrast, unsupervised SNN clustering clearly de-
lineated all the different cell populations in a sample. This makes it well suited for iden-
tifying rare or novel cell types, but the method requires time-consuming and bias-prone 
manual annotation. To minimize the bias, we used rigorous criteria and the collaborative 
expertise of multiple independent evaluators, which resulted in the manual annotation 
that was closely related to the automated one. Finally, pseudo-temporal analysis of the 
major cell types further confirmed the validity of the Azimuth and manual annotations. 
In conclusion, each annotation method has its merits and downsides. Our research thus 
highlights the need to combine different clustering and annotation approaches to man-
age the complexity of scRNA-seq and to improve the reliability and depth of scRNA-seq 
analyses.
Received: 29 December 2023
Accepted: 23 April 2024
Slov Vet Res
DOI 10.26873/SVR-1920-2024
UDC 601.4:577.21:577.27
Pages: 245–61
Original Research Article
Introduction 
Over the past decade, RNA sequencing (RNA-seq) has be-
come an indispensable tool in molecular biology, providing 
unprecedented insights into the transcriptomic landscape 
of cells. (1) By deciphering the complexity of human, ani-
mal, and plant transcriptomes, this technique has greatly 
enhanced our understanding of biological processes, 
disease mechanisms, and therapeutic interventions. (2) 
However, conventional RNA-seq, which analyses bulk tis-
sue samples, inherently averages the gene expression 
across many cells and cell types present in the sample, re-
sulting in a loss of resolution at the level of individual cells/
cell types. (3) This obscures the understanding of cellular 
heterogeneity and the roles of rare cell populations in tissue 
function and disease. (4)
246  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
The development of single-cell RNA sequencing (scRNA-
seq) has revolutionized the field by providing a lens for 
exploring the transcriptome at single-cell resolution. (5) 
The scRNA-seq provides a high-resolution view of tissue 
cellular diversity. It enables a more detailed understanding 
of complex biological processes and disease pathogen-
esis by revealing cell heterogeneity in a given population. 
Furthermore, scRNA-seq allows for the study of differential 
gene expression at a single-cell level, which can provide in-
sights into the unique functional roles of individual cells and 
contribute to a more nuanced understanding of biological 
processes and disease mechanisms. (6)
Despite its transformative potential, scRNA-seq also intro-
duces unique analytical challenges. Among these, anno-
tation of distinct cell populations in scRNA-seq datasets 
is a significant hurdle due to the high dimensionality and 
complexity of single-cell data. (7) To address this, various 
computational strategies have been developed. Azimuth, a 
publicly available automated cell-type annotation software 
(8), employs machine learning algorithms to predict human 
and murine cell identities based on scRNA-seq data. (9) In 
parallel, Seurat, a popular R package for scRNA-seq data 
analysis, offers clustering algorithms that partition single 
cells into distinct groups based on their transcriptomic 
profiles, providing an unbiased approach to cell population 
identification. (10) Manual annotation methods, on the other 
hand, employ in-depth biological knowledge to assign cell 
identities based on known marker genes and expression 
patterns. Such methods can leverage publicly available da-
tasets, such as those available at the Human Protein Atlas 
(HPA) (11) or the multi-species Single Cell Expression Atlas 
(12), providing a robust, albeit time-consuming, strategy.
In this study, our primary goal was to perform a compre-
hensive comparative analysis of different strategies for an-
notating peripheral blood mononuclear cell (PBMC) popula-
tions in single-cell RNA sequencing datasets: Azimuth, an 
automated reference-based cell type annotation approach; 
Shared nearest neighbor (SNN) reference annotation na-
ive approach, recommended by the authors of the Seurat 
single-cell analysis package for R as best practice (10); and 
manual annotation using two datasets publicly available at 
the HPA. We evaluated the performance of these methods 
in terms of accuracy, efficiency, and ability to handle the 
high dimensionality and complexity of scRNA-seq data. By 
exploring the strengths and limitations of each method, we 
aimed to provide critical insights that will help researchers 
choose the most effective strategy for annotating scRNA-
seq datasets.
Material and Methods 
A schematic representation of the steps involved in data 
acquisition and analysis is shown in Figure 1.
Datasets
Datasets- raw sequencing reads were obtained from the 
publicly available 10X Genomics database portal. (13) To 
validate PBMC populations, we used single-cell datasets 
obtained from healthy human donors, containing 10.000 
(pbmc10k) and 5.000 (pbmc5k) cells. The datasets used 
were 5k Peripheral Blood Mononuclear Cells (PBMCs) from 
a Healthy Donor (v3 chemistry) 
(https://www.10xgenomics.com/datasets/5-k-peripher-
al-blood-mononuclear-cells-pbm-cs-from-a-healthy-do-
nor-v-3-chemistry-3-1-standard-3-0-2) and 10k PBMCs 
from a Healthy Donor - Gene Expression with a Panel of 
TotalSeq™-B Antibodies (https://www.10xgenomics.com/
datasets/10-k-pbm-cs-from-a-healthy-donor-gene-expres-
sion-and-cell-surface-protein-3-standard-3-0-0). Both da-
tasets were downloaded on 10.05.2023.
scRNA-seq data analysis
Raw fastq files were first aligned to reference genome 
GRCh38 using CellRanger 7.1.0 software (10x Genomics). 
Generated matrices were further analyzed using Seurat 
package v5 (8) in R environment (14). Matrices were import-
ed using Seurat and converted to Seurat objects containing 
at least 200 features in 3 cells.
A comprehensive quality control was performed to remove 
objects indicating multiplets. For the pbmc10k sample, the 
multiplets rate was estimated at 7.8%, and for pbmc5k, at 
3.9%. These rates were also confirmed with DoubletFinder.
(15) Thus, for pbmc10k and pbmc5k, all objects with fea-
tures above 4000 and below 500 (empty droplets) or ob-
jects flagged as high-confidence doublets were discarded. 
Additionally, all objects expressing more than 10% of mito-
chondrial genes, which is a commonly chosen threshold. 
(16) Additionally, this threshold was selected based on num-
bers presented in 10x technical note CG000130. Objects 
with less than 5% of ribosomal genes were also filtered out 
to ensure healthy cells are retained as immune cells should 
have a high fraction of ribosomal proteins (Figure 2a). (16) 
Subsequently, X- and Y- chromosome genes were removed 
from the datasets to avoid sex-specific statistical bias due 
to the unknown genders of the samples. The genes with the 
highest expression were examined. MALAT1 (metastasis-
associated lung adenocarcinoma transcript 1) was identi-
fied as an extensive outlier, most probably representing a 
common technical issue, and was therefore also removed.
Next, both sample datasets were pooled, and cell cycle 
genes were flagged to calculate cell cycle scoring. The 
RNA assay data was first normalized using SCTransform.
(17) Then cell cycle scores were calculated on the new 
SCT assay and used to calculate the S cycle score minus 
G2M cycle score difference. SCTransform normalization 
was again performed using the RNA assay and regressed 
on the difference in cell-cycle scores and the percentage 
of mitochondrial genes. The new SCT assay was used for 
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  247
downstream analysis and integration. We used at least 5000 
features for the final anchor selection out of merged 18913 
features across 10608 cells. Using integration, we identified 
the so-called anchors in the cross-dataset cell pairs that 
are in a matched biological state. These were used to cor-
rect for technical differences between datasets and align 
the cells between samples for comparative analyses. After 
integration, we performed principal component analysis 
for dimensionality reduction with 50 principal components 
(Figure 2b). Additionally, we performed uniform manifold 
approximation and projection (UMAP, Figure 2c) and t-dis-
tributed stochastic neighbor embedding (tSNE) analyses to 
visualize the high dimensional data obtained.
Automatic annotation, clustering, and conserved 
markers
First, automatic annotation was performed using Azimuth 
reference-based annotation of cells on three levels. (8) The 
human PBMC reference dataset was generated with 10x 
Genomics v3 as previously described. (8) Subsequently, the 
best cluster resolution was determined using the R pack-
age clustree. (18) Additionally, shared nearest-neighbor 
(SNN) modularity optimization clustering was deployed to 
cluster the cells (19).
The following resolutions were used for cluster granulation: 
0.2, 0.4, 0.6, 0.8, 1.0, 1.4, 1.6, 1.8 and 2.0. After identifying 
Figure 1: A schematic representation of the steps involved in data acquisition and analysis
248  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
Figure 2: Quality control and dataset integration. (a) Dispersion of cells in datasets after quality control- Nfeature_RNA: number of genes per cell; nCount_
RNA: number of transcripts per cell; percent.mt: percent of mitochondrial genes per cell; percent.ribo: percent of ribosomal genes per cell; percent.hb: 
percent of hemoglobin genes per cell; percent.plat: percent of platelet genes per cell. (b) PCA graph of the two datasets. (c) UMAP plot of aligned and 
integrated dataset cell landscape- pbmc10k dataset in background and pbmc5k dataset in foreground
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  249
the best annotation and cluster resolution, conserved 
cell-type markers with the same perturbation direction in 
both datasets were identified by differential gene expres-
sion testing. The MetaDE R package embedded in Seurat's 
FindConservedMarkers function was used for this purpose. 
(20) Conserved markers (see Figure 3 for an example) were 
only identified in cell populations where at least three cells 
were present in an independent sample.
Cell type-specific marker selection and manual clus-
ter annotation
For manual annotation, we used two publicly available hu-
man datasets- the RNA HPA immune cell gene data (the 
HPA dataset) and the RNA Monaco immune cell gene data 
(the Monaco dataset), which we downloaded from the HPA 
website (https://www.proteinatlas.org/about/download) 
on 23.6.2023. The HPA dataset contains transcription data 
on 18 immune cell types from blood generated within the 
HPA project (21), while the Monaco dataset is based on the 
RNA-seq data generated on 29 FACS-sorted immune cell 
types from the PBMC of healthy donors. (22) The pipeline 
used to generate both datasets from the raw RNA-seq data, 
including quality control and normalization, is described on 
the HPA website. The downloaded datasets are based on 
The HPA version 23.0 and Ensembl version 109.
For both annotation datasets, we separately determined 
cell type-specific markers based on the normalized gene 
expression values, with a cutoff value of 4 as described 
on the HPA website (https://www.proteinatlas.org/about/
assays+annotation#hpa_rna). Genes whose normalized 
expression levels in a specific immune cell type were at 
least 4× higher than in any other immune cell type were 
considered cell type-specific markers for that specific im-
mune cell type. Similarly, genes whose normalized expres-
sion in a group of two or three immune cell types was at 
least 4× higher than in any other immune cell type were 
considered twin or group markers, respectively. In addi-
tion to the markers for the immune cell types defined in 
the two datasets, we also determined specific markers for 
several broader groups of immune cell types, for example, 
total CD8+ T-cells (comprised of Naïve CD8+ T-cell, Central 
memory CD8+ T-cells, Effector memory CD8+ T-cells and 
Terminal effector memory CD8+ T-cells in the Monaco data-
set). For these marker genes, it was defined that the lowest 
Figure 3: Visualization of selected conserved markers for classical (CD14+) monocytes. UMAP plots of CD14 (CD14 molecule), LYZ (lysozyme), S100A8 
(S100 calcium-binding protein A8) and S100A9 (S100 calcium-binding protein A9) expression, and of CD14/LYZ and S100A8/S100A9 co-expression 
across the PBMC populations
250  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
normalized expression level within the broader group of im-
mune cell types had to be at least 4× higher than in any 
other immune cell type not included in the specific group. 
Finally, scores were assigned to all the markers based on 
marker type. For the twin and group markers, relative differ-
ences in normalized gene expression within the pair/group 
were also considered (Table 1).
Next, the top 10 best-conserved markers for each cluster 
were determined. To this end, we first selected markers 
(genes) with a log2FC (fold change) >1.0 and an adjusted 
p-value <0.05 to ensure that only genes with both high and 
significant differences in expression levels between clus-
ters were considered. The markers meeting the criteria 
were then ranked based on the highest log2FC values.
Then, each cluster's top 10 conserved markers were cross-
referenced with the cell type-specific markers from each 
annotation dataset separately. In this way, clusters were as-
signed to possible cell types, and each possible cell type 
was assigned a score based on the scores of the markers 
identified by the cross-reference. For additional clarifica-
tion, those of the top 10 conserved markers not identified 
as cell type specific markers were also considered. If their 
expression in a particular immune cell type was at least 
4× or 2× higher than the average expression of all immune 
cell types in the annotation dataset, they were assigned 
a score of 2 or 1, respectively. These scores were added 
to the above scores of the possible cell types. The final 
scores obtained for each cluster from the two annotation 
datasets were then used by two independent evaluators to 
determine preliminary annotations for each cluster. Finally, 
the preliminary annotations were compared by the two 
evaluators and an additional referee to reach a consen-
sus annotation. In ambiguous cases, a broader annotation 
took precedence over a narrower one (I.e., B-cells vs naïve 
B-cells) unless multiple clusters shared the same annota-
tion: in such situations, we aimed for consensus with the 
narrower annotations.
Trajectory and pseudo-time analysis
The trajectory of the cell transitions and the pseudo-tem-
poral arrangement of cells during differentiation was ana-
lyzed using the R package monocle3 and the Python im-
plementation. (23–25) The previously constructed Seurat 
object was pre-processed and partitioned into the main 
cell types (monocytes, B-cells, T-cells). An explicit principal 
graph was learned using advanced machine learning called 
Reverse Graph Embedding to accurately resolve biological 
processes in individual cells’ Pseudo-time. This abstract 
measure of an individual cell's progress in cell differentia-
tion was calculated as the distance between a cell and the 
beginning of the trajectory measured along its shortest 
path. The total length of a trajectory was defined as the to-
tal amount of transcriptional changes a cell undergoes on 
its way from start to end state. The cells with the highest 
expression of the CD14 gene for monocytes and the cal-
culated start nodes for T-cells and B-cells were chosen as 
the roots or so-called beginnings of a biological process. 
To calculate the start node, the resident cells (double nega-
tive T-cells and intermediate B-cells) were first grouped 
Table 1: Immune cell type-specific markers
Marker Scoring Nr. of markers
Type Definition Cell type 1 Cell type 2 Cell type 3 HPA dataset Monaco dataset
Single nEL in CT1 > 4× nEL in any other CT 8 / / 1821 1581
Twin 2
nEL in CT1, CT2 > 4× nEL in any other CT
nEL in CT1 ≈ nEL in CT2
4 4 / 594 458
Twin 1+1
nEL in CT1, CT2 > 4× nEL in any other CT
nEL in CT1 > 4× nEL in CT2
8 4 / 224 149
Group 3
nEL in CT1, CT2, CT3 > 4× nEL in any other CT
nEL in CT1 ≈ nEL in CT2 ≈ nEL in CT3
2 2 2 436 273
Group 2+1
nEL in CT1, CT2, CT3 > 4× nEL in any other CT
nEL in CT1 ≈ nEL in CT2
nEL in CT1, CT2 >4× nEL in CT3
4 4 2 36 25
Group 1+2
nEL in CT1, CT2, CT3 > 4× nEL in any other CT
nEL in CT1 > 4× nEL in CT2, CT3
nEL in CT2 ≈ nEL in CT3
8 2 2 125 71
Group 1+1+1
nEL in CT1, CT2, CT3 > 4× nEL in any other CT
nEL in CT1 > 4× nEL in CT2, CT3
nEL in CT2 > 4× nEL in CT3
8 4 2 15 11
(nEL- normalized expression level, CT – cell type)
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  251
according to the nearest node of the trajectory graph, and 
then the proportion of the cells at each node originating 
from the earliest time point was calculated. The node most 
heavily occupied by early cells was then selected as the 
root. Finally, the UMAP visualization was used to identify 
the pseudo-temporal cell state transition compared to the 
Azimuth annotation.
Results
Automatic annotation with Azimuth
After quality control and data integration, we used Azimuth's 
reference-based annotation of cells to automatically deter-
mine clusters and immune cell types. First, we evaluated 
three levels of cluster granulation to determine the best 
resolution of clusters using a clustering tree diagram. The 
first and second levels of annotation provide a clear sepa-
ration between all annotated clusters, while the third level 
exhibits some over-clustering (Figure 4a). Similarly, UMAP 
plots of the first two Azimuth annotated clustering levels 
show clear separations between clusters. At the same time, 
some over-clustering is evident in level three, for example, 
populations NK_2, NK_3, and NK_4 (Figure 4b). Overall, the 
resolution at level one provides information on eight, level 
two on 28 and level three on 51 distinct PBMC subpopula-
tions. Based on the cluster-tree analysis (I.e., presence of 
over-clustering and number of distinct subpopulations), 
level 2 was chosen as the best solution, providing sufficient 
resolution and the most information.
Unsupervised clustering according to SNN
Additionally, we performed SNN modularity optimization 
clustering. Again, the best resolution was chosen based 
on the clustering-tree diagram. Here, the best resolution of 
granulation was achieved at a resolution of 0.8, with higher 
and lower resolutions showing at least some over-cluster-
ing (Figure 5a). UMAP plots of the smallest (0.2), largest 
(2.0), and best (0.8) resolution of clustering were also in-
spected. Clustering at a resolution of 0.2 provided informa-
tion on 12 unannotated PBMC subpopulations, although 
more distinct clusters can be observed (for examples, see 
clusters 2, 3, and 4, Figure 5b). On the other hand, a reso-
lution of 2.0 resulted in 28 distinctive unannotated PBMC 
subpopulations, with clear signs of poor cluster separation 
in several instances (for examples, see clusters 4, 5, and 6 
or 2, 3, 7, and 19, Figure 5b). Only the best resolution (0.8) 
shows 18 well-separated PBMC subpopulations (Figure 
5b) and was thus chosen as the best resolution for further 
inspection.
Manual annotation of the Azimuth and SNN clusters
As described above, manual annotation was based on two 
publicly available datasets and two independent evaluators. 
Both evaluators cross-referenced the cell-type specific 
markers defined from the datasets with the top 10 con-
served markers from each cluster, thus creating four inde-
pendent preliminary annotations for all the clusters. The 
four preliminary annotations were then used to define each 
cluster's final, consensus annotation. Of note, we could not 
annotate all the clusters in this way- for 10 Azimuth anno-
tated clusters (for example, classical dendritic cells type 1, 
plasmablasts or hematopoietic stem/progenitor cells) and 
2 of the SNN clusters (clusters 16 and 17) no conserved 
markers could be defined (see Tables 2 and 3, respectively).
The consensus manual annotation was identical to the 
Azimuth annotation for 11/19 clusters for which conserved 
markers could be defined (Table 2). In 5 cases (myeloid 
dendritic cells instead of type 2 classical dendritic cells; 
Memory CD4+ T-cells vs Central memory CD4+ T-cells; 
T cells vs Effector memory and Cytotoxic CD4+ T-cells; 
natural killer cells vs CD56 bright natural killer cells), the 
manual annotation identified a super-set and in one case 
(Exhausted memory B-cells instead of Memory B-cells) 
a sub-set of the immune cell subtype identified by the 
Azimuth annotation. In the last cluster, manual annota-
tion identified a different sub-set (Non-switched memory 
B-cells) of the same super-set (B-cells) than the Azimuth 
annotation (Intermediate B-cells). Manual annotation of the 
16 SNN clusters, for which conserved markers could be de-
fined, identified 15 relatively specific immune cell subtypes, 
while in one cluster, only a very broad annotation (T-cells) 
could be determined (Table 3).
Comparison of the Azimuth and SNN clusters with 
manual annotation
Comparison of the 28 Azimuth annotated clusters, the 18 
unsupervised SNN clusters, and the manual annotation of 
the latter showed good matching for all the Monocytes, 
Dendritic cells, and B-cells populations/clusters as well 
as 2/3 natural killer cells populations (Figure 6a-b, Table 
4). Also matching are the Naïve CD4+ and CD8+ T-cells, 
Memory CD8+ T-cells, Mucosal-associated invariant 
T-cells, and γδ T-cells clusters. The rest of the T-cell popula-
tions do not match directly; however, in general, it is evident 
whether the clusters fall within CD4+ or CD8+ T-cell popu-
lations. The hematopoietic stem/progenitor cells, Innate 
lymphoid cells and platelet populations were not manually 
annotated due to the lack of appropriate conserved mark-
ers (Table 2). In the unsupervised SNN clustering, these 
populations do not represent separate clusters but are in-
stead distributed among (CD4+) T-cells associated clusters 
(Figure 6a-b, Table 4).
Pseudo-temporal trajectory analysis
Pseudo-temporal trajectory analysis was used as a final 
validation method. With this analysis we followed the cell 
state progress through the differentiation of three distinct 
Azimuth superclusters. In the partition of the Monocytes 
supercluster, it's visible that cells start to differentiate in 
252  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
Figure 4: Automated annotation with Azimuth. (a) Evaluation of Azimuth annotation levels using clustering tree. The best resolution is encircled in red. 
(b) UMAP plots of annotation using all three levels from the Azimuth database. Upper left: level one annotation clusters; Upper right: level two annotation 
clusters; Lower: level three annotation clusters
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  253
Figure 5: Shared nearest neighbor clustering optimization. (a) Evaluation of clustering levels using clustering tree. The best resolution is encircled in red. 
(b) UMAP plots of the smallest (0.2; upper left), the largest (2.0; upper right), and the best (0.8; lower) resolution of clustering
254  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
Table 2: Comparison between the Azimuth and manual annotation of the best resolution clusters
Azimuth annotation
1st Evaluator’s provisional annotation 2nd Evaluator’s provisional annotation
Consensus annotation
HPA dataset Monaco dataset HPA dataset Monaco dataset
CD14+ Monocytes Classical Monocytes Classical Monocytes Classical Monocytes Classical Monocytes Classical Monocytes
CD16+ Monocytes Non-classical Monocytes Non-classical / Intermediate Monocytes Non-classical Monocytes
Non-classical / 
Intermediate Monocytes Non-classical Monocytes
cDC, type 1 / / / / /
cDC, type 2 mDC mDC mDC mDC mDC
pDC pDC pDC pDC pDC pDC
Naïve B-cells Naïve / Memory B-cells Naïve B-cells Naïve B-cells Naïve B-cells Naïve B-cells
Intermediate B-cells Memory / Naïve B-cells Non-switched memory B-cells Memory B-cells
Non-switched memory 
B-cells
Non-switched memory 
B-cells
Memory B-cells Memory / Naïve B-cells Exhausted / Switched memory B-cells Memory B-cells
Exhausted memory 
B-cells
Exhausted memory 
B-cells
Plasmablasts / / / / /
Double-negative T-cells / / / / /
Naïve CD4+ T-cells Naïve CD4+ T-cells Naïve CD4+ T-cells Naïve CD4+ T-cells Naïve CD4+ T-cells Naïve CD4+ T-cells
Proliferating CD4+ T-cells / / / / /
TCM CD4+ Naïve / Memory CD4+ T-cells
Naïve / TFH Memory 
CD4+ T-cells
Naïve / Memory CD4+ 
T-cells Naïve CD4+ T-cells Memory CD+ T-cells
TEM CD4+ MAIT / γδ T-cells MAIT / Vδ2+ γδ T-cells MAIT MAIT T-cells
CTL CD4+ / / / / /
Treg Treg Treg Treg Treg Treg
Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells
Proliferating CD8+ T-cells / / / / /
TCM CD8+ Memory CD8+ T-cells TCM / TEM CD8+ Memory CD8+ T-cells TCM CD8+ TCM CD8+
TEM CD8+ Memory CD8+ T-cells TEM CD8+ Memory CD8+ T-cells TEM CD8+ TEM CD8+
MAIT MAIT MAIT MAIT MAIT MAIT
γδ T-cells γδ T-cells Vδ2+ γδ T-cells γδ T-cells Vδ2+ γδ T-cells γδ T-cells
NK NK / γδ T-cells NK NK / γδ T-cells NK NK
CD56 bright NK NK NK / Vδ2+ γδ T-cells NK NK NK
Proliferating NK / / / / /
HSPC / / / / /
ILC / / / / /
Platelets / / / / /
cDC (classical Dendritic Cells); CTL (Cytotoxic T-cells); HSPC (Hematopoietic stem/progenitor cells); ILC (Innate lymphoid cells); MAIT (Mucosal-associated 
invariant T-cells); mDC (myeloid Dendritic Cells); NK (Natural Killer Cells); pDC (plasmacytoid Dendritic Cells); TCM (Central Memory T-cells); TEM (Effector 
Memory T-cells); Treg (Regulatory T-cells)
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  255
the middle of the CD14+ Monocytes cluster, progressing 
outwards (Figure 7a). The trajectory distinctively shows 
progression into Non-classical CD16+ monocytes, where-
as type 2 cDC cells are not connected with any trajectory. 
Within the B-cells supercluster, the starting node resides in 
the Intermediate B-cells with trajectory soon forking into 
two arms, both pointing towards Memory and Naïve B-cells 
(Figure 7b). The starting node within the T-cells superclu-
ster resides in the middle of the cluster (Figure 7c), a posi-
tion corresponding to the double negative T cells accord-
ing to the Azimuth annotation (Figure 4b). One trajectory 
clearly shows differentiation into CD4+ sub-populations 
and other-T cells, while the other branches early into CD8+ 
sub-populations and the natural killer cells.
Discussion
ScRNA-seq enables the simultaneous analysis of expres-
sion profiles and their interdependencies in multiple cell 
types present in a tissue of interest. This represents a 
qualitative leap forward in studying complex biological pro-
cesses and the role of individual cell subtypes in these pro-
cesses. Previously, several separate studies were required 
to achieve the same result. However, reliable and accurate 
identification of the cell subtypes present in a selected bio-
logical sample cannot be taken for granted. (26) In the work 
presented here, we compared the cell-type annotation tech-
niques/tools used in scRNA-seq to highlight their strengths 
and potential pitfalls. Specifically, we used an automated 
reference-based tool, Azimuth, and an SNN clustered ref-
erence-naïve approach followed by manual annotation to 
Table 3: Manual annotation of the best clusters according to SNN
SNN clusters
1st Evaluator’s provisional annotation 2nd Evaluator’s provisional annotation
Consensus annotation
HPA dataset Monaco dataset HPA dataset Monaco dataset
0 Naïve CD4+T-cells Naïve CD4+T-cells Naïve CD4+T-cells Naïve CD4+T-cells Naïve CD4+T-cells
1 Classical Monocytes / Neutrophils
Classical Monocytes / 
Neutrophils Classical Monocytes Classical Monocytes Classical Monocytes
2 Memory CD4+ T-cells / Treg
Th17 Memory CD4+ 
T-cells
Memory CD4+ T-cells 
/ Treg
Th17 Memory CD4+ 
T-cells Memory CD4+ T-cells
3 NK / γδ T-cells Non-Vδ2+ γδ T-cells / TEM CD8+ NK / γδ T-cells Non-Vδ2+ γδ T-cells T-cells
4 γδ T-cells / MAIT MAIT γδ T-cells MAIT MAIT
5 γδ T-cells / Treg Vδ2+ γδ T-cells / Non-Vδ2+ γδ T-cells / MAIT γδ T-cells / Treg Vδ2+ γδ T-cells γδ T-cells
6 Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells Naïve CD8+ T-cells
7 Intermediate Monocytes Intermediate Monocytes Intermediate Monocytes Intermediate Monocytes Intermediate Monocytes
8 Memory CD8+ T-cells TEM CD8+ Memory CD8+ T-cells TEM CD8+ Memory CD8+ T-cells
9 Memory B-cells Exhausted memory B-cells Memory B-cells
Exhausted memory 
B-cells
Exhausted memory 
B-cells
10 Naïve B-cells Naïve B-cells Naïve B-cells Naïve B-cells Naïve B-cells
11 NK NK NK NK NK
12 Naïve / Memory B-cells Naïve / Non-switched memory B-cells Naïve B-cells Naïve B-cells
Non-switched memory 
B-cells
13 Non-classical Monocytes Non-classical / Intermediate Monocytes Non-classical Monocytes
Non-classical / 
Intermediate Monocytes Non-classical Monocytes
14 mDC mDC mDC mDC mDC
15 pDC pDC pDC pDC pDC
16 / / / / /
17 / / / / /
MAIT (Mucosal-associated invariant T-cells); mDC (myeloid Dendritic Cells); NK (Natural Killer Cells); pDC (plasmacytoid Dendritic Cells); TEM (Effector 
Memory T-cells); Treg (Regulatory T-cells)
256  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
Table 4: Comparison of the best Azimuth annotation, the best SNN clustering resolution, and manual annotation.
Azimuth annotation SNN cluster with cells present in the Azimuth annotation
Manual annotation consensus
The best-resolution Azimuth 
clusters
The best clusters, according to 
SNN
CD14+ Monocytes 0, 1* , 2, 3, 7*, 9, 14, 16 Classical Monocytes Classical Monocytes; Intermediate Monocytes
CD16+ Monocytes 1, 2, 7, 13* Non-classical Monocytes Non-classical Monocytes
cDC, type 1 14 / mDC
cDC, type 2 7, 9, 14* mDC mDC
pDC 15 pDC pDC
Naïve B-cells 1, 10*, 12*, 17 Naïve B-cells Naïve B-cells; Non-switched memory B-cells
Intermediate B-cells 9*, 10, 12, 17 Non-switched memory B-cells Exhausted memory B-cells
Memory B-cells 9*, 17 Exhausted memory B-cells Exhausted memory B-cells
Plasmablasts 16 / /
Double-negative T-cells 0, 5 / Naïve CD4+T-cells; γδ T-cells
Naïve CD4+ T-cells 0*, 1, 6 Naïve CD4+ T-cells Naïve CD4+T-cells
Proliferating CD4+ T-cells 1, 2, 4 / Classical Monocytes; Memory CD4+ T-cells; MAIT
TCM CD4+ 0*, 1, 2*, 3, 4, 5, 6, 8, 10 Memory CD+ T-cells Naïve CD4+T-cells; Memory CD4+ T-cells
TEM CD4+ 0, 2, 4, 5*, 8 T-cells γδ T-cells
CTL CD4+ 3, 8 / T-cells; Memory CD8+ T-cells
Treg 0, 2*, 6 Treg Memory CD4+ T-cells
Naïve CD8+ T-cells 0, 2, 4, 6*, 8 Naïve CD8+ T-cells Naïve CD8+ T-cells
Proliferating CD8+ T-cells 1, 4, 8 / Classical Monocytes; MAIT; Memory CD8+ T-cells
TCM CD8+ 0, 2, 5*, 6*, 8 TCM CD8+ γδ T-cells; Naïve CD8+ T-cells
TEM CD8+ 0, 1, 2, 3, 4, 5, 6, 8* TEM CD8+ Memory CD8+ T-cells
MAIT 2, 3, 4*, 5 MAIT MAIT
γδ T-cells 0, 2, 3, 4*, 5*, 6, 8, 11 γδ T-cells MAIT; γδ T-cells
NK 3*, 4, 8, 11*, 17 NK T-cells; NK
NK CD56 bright 11 NK NK
NK proliferating 1, 3 / Classical Monocytes; T-cells
HSPC 0, 1, 2 / Naïve CD4+T-cells; Classical Monocytes; Memory CD4+ T-cells
ILC 2*, 5 / Memory CD4+ T-cells
Platelets 3 / T-cells
* The most abundant SNN cluster; cDC (classical Dendritic Cells); CTL (Cytotoxic T-cells); HSPC (Hematopoietic stem/progenitor cells); ILC (Innate lymphoid 
cells); MAIT (Mucosal-associated invariant T-cells); mDC (myeloid Dendritic Cells); NK (Natural Killer Cells); pDC (plasmacytoid Dendritic Cells); TCM (Central 
Memory T-cells); TEM (Effector Memory T-cells); Treg (Regulatory T-cells)
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  257
gain insights into their utility and effectiveness in decipher-
ing complex cellular compositions in scRNA-seq datasets.
As a starting point for the analyses, we chose PBMC, a 
relatively complex but easily accessible biological sample 
commonly used in medical and veterinary research. We 
first used the automatic annotation tool Azimuth. Its an-
notations are based on a reference PBMC dataset gener-
ated from 24 samples processed with a CITE-seq (Cellular 
Indexing of Transcriptomes and Epitopes by Sequencing) 
panel, which performs RNA sequencing along with obtain-
ing quantitative and qualitative information about proteins 
(i.e., cell type-specific antigens) on the cell surface. (8) 
Azimuth automatic annotation has demonstrated the abil-
ity to process large scRNA-seq datasets quickly and ac-
curately. The performance of this machine learning-based 
tool reflects ongoing advances in computational biology, 
particularly in the automated processing of biological data. 
(27–29) Performance of the automated annotation tools 
may decline when confronted with rare cell types, as the 
classifier may be unable to learn their information during 
the training phase (30). 
In that regard, Azimuth also proved relatively well, as it de-
fined several PBMC populations with low abundance (i.e., 
classical dendritic cells, plasmacytoid dendritic cells, hema-
topoietic stem/progenitor cells, Innate lymphoid cells) (31–
33) of which the first two we could independently confirm 
with the manual annotation. Like any other reference-based 
tool, however, it cannot recognize / annotate populations 
that lie outside its frame of reference. (34) For example, 
CD14+ and CD16+ monocyte populations were annotated 
that roughly correspond to the classical (CD14+CD16neg) 
and non-classical (CD14dimCD16+) monocytes in the HPA 
and Monaco datasets, respectively, but the intermediate 
(CD14+CD16+) monocytes could not be distinguished.
Figure 6: Comparison of Azimuth and SNN clustered cell landscapes (a) UMAP plots of the best resolution Azimuth clusters (left) and the best resolution 
SNN clusters (right). (b) tSNE plots of the best-resolution Azimuth clusters (left) and the best-resolution SNN clusters (right)
258  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
Unsupervised SNN clustering, on the other hand, easily de-
fined three distinct populations at the position correspond-
ing to the monocytes in the Azimuth analysis. Subsequent 
manual annotation identified them as the three monocyte 
types mentioned above. The ability of this method to ef-
fectively delineate cell populations is well documented (27), 
and our results confirm its robustness in unsupervised 
clustering. The main advantage of unsupervised cluster-
ing over the reference-based one is that it does not attempt 
to fit cells into the pre-existing reference frame. Instead, 
the cells are clustered merely according to their similarity. 
Unsupervised clustering thus provides more opportunities 
to recognize rare or even new populations. (34) Conversely, 
the same fact is also a major disadvantage of the unsu-
pervised method, as one cannot avoid the time-consuming 
manual annotation of the individual clusters. (34)
In our manual annotation we prioritized biological relevance 
and statistical rigor. We followed strict criteria, similar to 
the HPA protocols, for cell type-specific markers selection. 
Figure 7: Pseudo-temporal analysis of selected immune cell types. (a) UMAP visualization of pseudo-temporal trajectories (left) and Azimuth annotations 
(right) of the monocytes partition on levels one and two. (b) UMAP visualization of pseudo-temporal trajectories (left) and Azimuth annotations (right) 
of the B-cells partition on levels one and two. (c) UMAP visualization of pseudo-temporal trajectories (left) and Azimuth annotations (right) of the T-cells 
partition on levels one and two
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  259
Similarly, we used rigorous criteria to define the top 10 best-
conserved markers per cluster, which were then used for 
comparison with the cell type-specific markers and, thus, 
cell type annotation of the clusters. We also tried annota-
tion with all the conserved markers (15 – 855 conserved 
markers per cluster, not shown). A similar approach was 
used for the scSorter tool, where they combined the ex-
pression of marker and non-marker genes for clustering. 
(35) We found no significant differences in annotation with 
all versus only the top 10 markers, so we chose the latter, 
a somewhat less time-consuming approach, for further 
analysis. Of note is that the stringency of the above criteria 
for conserved markers resulted in no conserved markers 
being defined for some clusters. This further meant that 
these clusters could not be manually annotated; we howev-
er decided against loosening the criteria. Besides objective 
data, manual interpretation also benefits from the evalua-
tor’s understanding of the biological processes, but at the 
same time, it inherently creates bias. (34) To minimize bias, 
we used two independent annotation datasets, both based 
on the FACS sorted cell populations (21, 22), and employed 
two independent evaluators, plus an additional arbiter, to 
reach consensus annotation for each cluster. This careful 
approach ensured high accuracy in identifying different cell 
types, as asserted by the high similarity of the manual and 
the automated (Azimuth) annotations.
Many discrepancies between the two annotations can be 
explained by the differences in how specific cell subtypes 
are defined in the respective reference datasets. Particularly 
challenging are the phenotypically and functionally highly 
heterogeneous subsets of the T-cells (36), where the HPA 
dataset recognizes 7 and the Monaco dataset 15 separate 
entities (21, 22), which were in turn used to validate the 13 
T-cell clusters identified by the Azimuth manually. At the 
exact coordinates, the unsupervised SNN clustering identi-
fied only 6 distinct populations, all manually annotated as 
various T-cell subsets. Directly comparing the SNN clus-
ters with the Azimuth annotations further emphasizes the 
invaluability of using multiple approaches when tackling 
complex populations/clusters. Namely, it clearly shows 
that a population, coherent at a given level, may consist of 
several distinct subpopulations. These are not necessarily 
closely related, and vice-versa, the well-defined cell types 
may be dispersed over several distinct clusters. In clinical 
samples related to a specific pathology, such instances can 
provide opportunities for the identification of important rare 
and potentially even novel subpopulations. They should 
thus be more thoroughly investigated at a higher resolution.
The selected resolution of the clustering directly influ-
ences the granularity of the identified cell types and, thus, 
the depth of the biological insights that can be gained. (18) 
A high resolution can reveal subtle differences between 
cell populations and possibly visualize rare or transitional 
states of cells, but at a risk of decreased reliability of clus-
tering- for example, see CD4+ T-cells sub-clusters at reso-
lution level 3 (Figure 4a). Conversely, lower resolution may 
be highly reliable, but risks conflating cell types with dif-
ferent functions (for example looking at combined CD4+ 
T-cells instead of the subsets with very distinct roles- regu-
latory, helper, effector, etc.) and can thus miss important 
biological differences. (36) Hence, the optimal resolutions 
we chose for both reference-based and unsupervised clus-
tering are compromises, balancing between distinguishing 
meaningful cellular subtypes and avoiding fragmentation 
of homogeneous populations into overly granular clusters. 
Alas, as with any compromise, the optimal resolution does 
not satisfy completely, which is most evident when cluster-
ing the B-cells. Here, at optimal resolutions, Azimuth and 
SNN clustering identify 3 and 4 distinct subpopulations, 
respectively. Visually, though, one can easily distinguish 
5-6 entities, suggesting that a higher resolution would be 
needed here.
The meaningfulness of a granularity higher than the one 
defined by the optimal resolution for the B-cell subsets 
was also confirmed by the pseudo-temporal analysis. 
The pseudo-temporal dimension introduces a framework 
for mapping progression states and inferring transitional 
states and lineage relationships. It highlights not only the 
end states cells reach but also the paths they take to get 
there. (24) Using this method, we further validated the T-cell 
and monocyte subsets annotations. The pseudo-temporal 
trajectories also clearly show that the automatic Azimuth 
annotations cohere with known cell differentiation stages. 
The method has previously been instrumental in charting 
developmental trajectories, and our application further un-
derscores its value in modeling cellular dynamics, as has 
been explored in other studies focusing on differentiation 
and immune cells. (37, 38)
Regardless of the sample type, its origin, underlying pathol-
ogy, and the scientific question, single-cell RNA sequenc-
ing has little value if one cannot properly identify the single 
cells. Novel and ever more powerful tools for accurate and 
reliable annotation of the cells/clusters are therefore being 
developed. (39–41) However, as demonstrated here, each 
method has its merits and downsides. The methods and 
results of our study have significant implications for the 
further development of scRNA-seq applications, not only in 
the field of human medicine but even more in the field of 
veterinary medicine. Unlike in human medicine, namely, in 
veterinary medicine, there is often a lack of comprehensive 
databases for reference-based annotation. (12, 42) This 
makes using automated annotation tools such as Azimuth 
difficult and emphasizes the importance of integrating dif-
ferent annotation approaches. In the future, integrated tools 
may be developed that will combine the efficiency of the au-
tomated annotation and expert insight of the manual one, 
the accuracy of the reference-based annotation with the 
flexibility of the unsupervised clustering. Until then, a skill-
ful combination of automated and manual annotation tech-
niques is needed to manage the complexity of scRNA-seq 
data when reference databases are limited or non-existent. 
This approach is particularly crucial in veterinary science, 
260  |    Slovenian Veterinary Research 2024  |  Vol 61 No 4
where the study of different species requires a custom-
ized approach to cell type annotation, given the variability 
in genetic and cellular profiles of different species. With it, 
scRNA-seq research can open new avenues for discovery 
in cell biology and its applications in health and disease.
Acknowledgments
This work was not supported by any specific funding.
References 
1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for tran-
scriptomics. Nat Rev Genet 2009; 10(1): 57–63. doi: 10.1038/nrg2484
2. Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. 
Nat Rev Genet 2019; 20(11): 631–56. doi: 10.1038/s41576-019-0150-2
3. Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to 
single-cell RNA-sequencing for biomedical research and clinical appli-
cations. Genome Med 2017; 9(1): 75. doi: 10.1186/s13073-017-0467-4
4. Wagner A, Regev A, Yosef NC. Revealing the vectors of cellular identity 
with single-cell genomics. Nat Biotechnol 2016; 34(11): 1145–60. doi: 
10.1038/nbt.3711
5. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. mRNA-
Seq whole-transcriptome analysis of a single cell. Nat Methods 2009; 
6(5):3 77–82. doi: 10.1038/nmeth.1315
6. Wang T, Li B, Nelson CE, Nabavi S. Comparative analysis of differential 
gene expression analysis tools for single-cell RNA sequencing data. 
BMC Bioinformatics 2019; 20(1): 40. doi: 10.1186/s12859-019-2599-6
7. Poirion OB, Zhu X, Ching T, Garmire L. Single-cell transcriptomics bioin-
formatics and computational challenges. Front Genet 2016; 7: 163. doi: 
10.3389/fgene.2016.00163
8. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multi-
modal single-cell data. Cell 2021; 184(13): 3573–7, e29. doi: 10.1016/j.
cell.2021.04.048
9. Stuart T, Butler A, Hoffman P, et al. Comprehensive integration of 
single-cell data. Cell 2019; 177(7): 1888–902, e21. doi: 10.1016/j.
cell.2019.05.031
10. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-
cell transcriptomic data across different conditions, technologies, and 
species. Nat Biotechnol 2018; 36(5): 411–20. doi: 10.1038/nbt.4096
11. The Human Protein Atlas. Stockholm: Affinity proteomics, 2023. 
https://www.proteinatlas.org/ (18. 11. 2023)
12. EMBL-EBI. Single cell expression atlas. Hinxton: European Molecular 
Biology Laboratory, 2023. https://www.ebi.ac.uk/gxa/sc/home (5. 12. 
2023) 
13. 10x Genomics Datasets. Pleasanton: 10x Genomics, 2023. https://
www.10xgenomics.com/datasets?query=&page=1&configure%5Bhits
PerPage%5D=50&configure%5BmaxValuesPerFacet%5D=1000 (3. 7. 
2023)
14. R Foundation. The R project for statistical computing. Wien: The R 
Foundation, 2023. https://www.r-project.org/ (3. 7. 2023)
15. McGinnis CS, Murrow LM, Gartner ZJ. Doubletfinder: doublet detec-
tion in single-cell RNA sequencing data using artificial nearest neigh-
bors. Cell Syst 2019; 8(4): 329–37, e4. doi: 10.1016/j.cels.2019.03.003
16. Subramanian A, Alperovich M, Yang Y, Li B. Biology-inspired data-driv-
en quality control for scientific discovery in single-cell transcriptomics. 
Genome Biol 2022; 23(1): 267. doi: 10.1186/s13059-022-02820-w
17. Hafemeister C, Satija R. Normalization and variance stabilization of 
single-cell RNA-seq data using regularized negative binomial regres-
sion. Genome Biol 2019; 20(1): 296. doi: 10.1186/s13059-019-1874-1
18. Zappia L, Oshlack A. Clustering trees: a visualization for evaluating 
clusterings at multiple resolutions. Gigascience 2018; 7(7): giy083. doi: 
10.1093/gigascience/giy083
19. Waltman L, van Eck NJ. A smart local moving algorithm for large-scale 
modularity-based community detection. Eur Phys J B. 2013; 86(11): 
471. doi: 10.1140/epjb/e2013-40829-0
20. Lu S, Li J, Song C, Shen K, Tseng GC. Biomarker detection in the inte-
gration of multiple multi-class genomic studies. Bioinformatics 2010; 
26(3): 333–40. doi: 10.1093/bioinformatics/btp669
21. Uhlen M, Karlsson MJ, Zhong W, et al. A genome-wide transcriptomic 
analysis of protein-coding genes in human blood cells. Science 2019 ; 
20; 366(6472): eaax9198. doi: 10.1126/science.aax9198
22. Monaco G, Lee B, Xu W, et al. RNA-seq signatures normalized by mrna 
abundance allow absolute deconvolution of human immune cell types. 
Cell Rep 2019; 26(6): 1627–40, e7. doi: 10.1016/j.celrep.2019.01.041
23. Qiu X, Mao Q, Tang Y, et al. Reversed graph embedding resolves com-
plex single-cell trajectories. Nat Methods 2017; 14(10): 979–82. doi: 
10.1038/nmeth.4402
24. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regula-
tors of cell fate decisions are revealed by pseudotemporal ordering of 
single cells. Nat Biotechnol 2014; 32(4): 381–6. doi: 10.1038/nbt.2859
25. Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional land-
scape of mammalian organogenesis. Nature 2019; 566(7745): 496–
502. doi: 10.1038/s41586-019-0969-x
26. Li X, Wang CY. From bulk, single-cell to spatial RNA sequencing. Int J 
Oral Sci 2021; 13(1): 36. doi: 10.1038/s41368-021-00146-0
27. Lähnemann D, Köster J, Szczurek E, et al. Eleven grand challenges in 
single-cell data science. Genome Biol 2020; 21(1): 31. doi: 10.1186/
s13059-020-1926-6
28. Pasquini G, Rojo Arias JE, Schäfer P, Busskamp V. Automated methods 
for cell type annotation on scRNA-seq data. Comput Struct Biotechnol 
J 2021; 19: 961–9. doi: 10.1016/j.csbj.2021.01.015
29. Abdelaal T, Michielsen L, Cats D, et al. A comparison of automatic cell 
identification methods for single-cell RNA sequencing data. Genome 
Biol 2019; 20: 194. doi: 10.1186/s13059-019-1795-z
30. Cheng Y, Fan X, Zhang J, Li Y. A scalable sparse neural network frame-
work for rare cell type annotation of single-cell transcriptome data. 
Commun Biol 2023; 6: 545. doi: 10.1038/s42003-023-04928-6
31. Flórez-Grau G, Escalona JC, Lacasta-Mambo H, et al. Human den-
dritic cell subset isolation by magnetic bead sorting: a protocol to ef-
ficiently obtain pure populations. Bio Protoc 2023; 13(20): e4851. doi: 
10.21769/BioProtoc.4851
32. Nishide M, Nishimura K, Matsushita H, et al. Single-cell multi-omics 
analysis identifies two distinct phenotypes of newly-onset micro-
scopic polyangiitis. Nat Commun 2023; 14(1): 5789. doi: 10.1038/
s41467-023-41328-0
33. Bonne-Année S, Bush MC, Nutman TB. Differential Modulation of 
Human Innate Lymphoid Cell (ILC) Subsets by IL-10 and TGF-β. Sci 
Rep. 20191004th ed. 2019 Oct;9(1):14305. 
  Slovenian Veterinary Research 2024  |  Vol 61 No 4  |  261
Primerjalna analiza referenčno osnovanega mapiranja celičnih tipov in 
ročne anotacije pri analizi sekvenciranja RNA posamezne celice
L. Goričan, B. Gole, G. Jezernik, G. Krajnc, U. Potočnik, M. Gorenjak
Izvleček: Sekvenciranje RNA v posamezni celici (scRNA-seq) omogoča edinstven vpogled v celično raznolikost komple-
ksnih tkiv, kot so mononuklearne celice periferne krvi (PBMC). Dodatno je diferencialno izražanje genov na ravni posa-
meznih celic lahko osnova za razumevanje specializiranih vlog posameznih celic in celičnih tipov v bioloških procesih in 
bolezenskih mehanizmih. Zaradi velike kompleksnosti pa je točna določitev celičnih tipov v zbirkah podatkov scRNA-seq 
zahtevna. V članku primerjamo dve strategiji določanja celičnih tipov, ki se uporabljata za PBMC v zbirkah podatkov 
scRNA-seq: avtomatizirano, na referenčnih bazah podatkov temelječe orodje »Azimuth« in nenadzorovano razvrščanje v 
grozde »Shared Nearest Neighbour« (SNN), ki mu sledi ročno določanje celičnih tipov. Naši rezultati poudarijo prednosti 
in omejitve obeh pristopov. »Azimuth« je zlahka obdelal obsežne podatkovne nize scRNAseq in zanesljivo prepoznal tudi 
razmeroma redke populacije celic. Imel pa je težave s celičnimi tipi izven svojega referenčnega območja. Nasprotno 
je nenadzorovano razvrščanje SNN jasno razmejilo vse različne celične populacije v vzorcu. Metoda SNN je zato zelo 
primerna za prepoznavanje redkih ali novih tipov celic, vendar zahteva dolgotrajno ročno določanje celičnih tipov, ki je 
nagnjeno k pristranskosti. S strogimi merili in skupnim strokovnim znanjem več neodvisnih ocenjevalcev smo to pris-
transkost minimalizirali. Naše ročno določanje celičnih tipov je tako le malo odstopalo od avtomatiziranega. Nazadnje 
je veljavnost določitve celičnih tipov z orodjem »Azimuth« in ročno metodo potrdila še psevdočasovna analiza glavnih 
celičnih tipov. Naša raziskava tako poudarja nujo po kombiniranju različnih pristopov razvrščanja in določanja celičnih 
populacij za izboljšanje zanesljivosti in globine analiz scRNA-seq.
Ključne besede: transkriptomika posamezne celice; mononuklearne celice periferne krvi; referenčno mapiranje; an-
otacija celičnih tipov; imunski sistem
34. Bej S, Galow AM, David R, Wolfien M, Wolkenhauer O. Automated anno-
tation of rare-cell types from single-cell RNA-sequencing data through 
synthetic oversampling. BMC Bioinformatics 2021; 22(1): 557. doi: 
10.1186/s12859-021-04469-x
35. Guo H, Li J. scSorter: assigning cells to known cell types accord-
ing to marker genes. Genome Biol 2021; 22(1): 69. doi: 10.1186/
s13059-021-02281-7
36. Andreatta M, Corria-Osorio J, Müller S, Cubas R, Coukos G, Carmona 
SJ. Interpretation of T cell states from single-cell transcriptomics data 
using reference atlases. Nat Commun 2021; 12(1): 2965. doi: 10.1038/
s41467-021-23324-4
37. Bendall SC, Davis KL, Amir el-AD, et al. Single-cell trajectory detection 
uncovers progression and regulatory coordination in human B cell de-
velopment. Cell. 2014; 157(3): 714–25. doi: 10.1016/j.cell.2014.04.005
38. Yao C, Sun HW, Lacey NE, et al. Single-cell RNA-seq reveals TOX as 
a key regulator of CD8+ T cell persistence in chronic infection. Nat
Immunol 2019; 20(7): 890–901. doi: 10.1038/s41590-019-0403-4
39. Wan H, Chen L, Deng M. scEMAIL: universal and source-free an-
notation method for scRNA-seq data with novel cell-type percep-
tion. Genomics Proteomics Bioinformatics 2022; 20(5): 939–58. doi: 
10.1016/j.gpb.2022.12.008
40. Ji X, Tsao D, Bai K, Tsao M, Xing L, Zhang X. scAnnotate: an auto-
mated cell-type annotation tool for single-cell RNA-sequencing data.
Bioinform Adv 2023; 3(1): vbad030. doi: 10.1093/bioadv/vbad030
41. Nguyen V, Griss J. scAnnotatR: framework to accurately classify cell 
types in single-cell RNA-sequencing data. BMC Bioinformatics 2022; 
23(1): 44. doi: 10.1186/s12859-022-04574-5
42. Yao Z, Liu H, Xie F, et al. A transcriptomic and epigenomic cell atlas of 
the mouse primary motor cortex. Nature 2021; 598(7879): 103–10. doi:
10.1038/s41586-021-03500-8