The `iSEEfier` User's Guide
Najla Abassi
Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
najla.abassi@uni-mainz.de
Federico Marini
Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
Research Center for Immunotherapy (FZI), Mainz
marinif@uni-mainz.de
4 October 2024
Source: vignettes/iSEEfier_userguide.Rmd
Introduction
This vignette describes how to use the iSEEfier package to configure various initial states of iSEE instances, in order to simplify the task of visualizing single-cell RNA-seq, bulk RNA-seq, or even proteomics data in iSEE. In the remainder of this vignette, we will illustrate the main features of iSEEfier on a publicly available dataset from Baron et al., “A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure”, published in Cell Systems in 2016 (doi:10.1016/j.cels.2016.08.011).
The data is made available via the scRNAseq Bioconductor package. We’ll simply use the mouse dataset, consisting of islets isolated from five C57BL/6 and ICR mice.
Getting started
To install the iSEEfier package, we start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("iSEEfier")
Once installed, the package can be loaded and attached to the current workspace as follows:
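The loading call itself does not appear in this rendering of the guide; it is presumably the standard one, shown here as a minimal sketch that assumes the installation step above succeeded:

```r
# Attach iSEEfier so that iSEEinit(), iSEEnrich(), iSEEmarker() and the
# view_initial_*() helpers used in this guide are available
library("iSEEfier")
```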
Create an initial state for gene expression visualization using iSEEinit()
When we have all input elements ready, we can create an iSEE initial state by running:
iSEEinit(sce = sce_obj,
         features = feature_list,
         reddim_type = reduced_dim,
         clusters = cluster,
         groups = group,
         add_markdown_panel = FALSE)
To configure the initial state of our iSEE instance using iSEEinit(), we need five parameters:
- sce: A SingleCellExperiment object. This object stores information of different quantifications (counts, log-expression…), dimensionality reduction coordinates (t-SNE, UMAP…), as well as some metadata related to the samples and features. We’ll start by loading the sce object:
library("scRNAseq")
sce <- BaronPancreasData('mouse')
sce
#> class: SingleCellExperiment
#> dim: 14878 1886
#> metadata(0):
#> assays(1): counts
#> rownames(14878): X0610007P14Rik X0610009B22Rik ... Zzz3 l7Rn6
#> rowData names(0):
#> colnames(1886): mouse1_lib1.final_cell_0001 mouse1_lib1.final_cell_0002
#> ... mouse2_lib3.final_cell_0394 mouse2_lib3.final_cell_0395
#> colData names(2): strain label
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
Let’s add the log-normalized counts:
library("scuttle")
sce <- logNormCounts(sce)
Now we can add different dimensionality reduction coordinates.
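The chunk that computed these coordinates is not shown in this rendering. A minimal sketch of that step, assuming the scater package (listed in the session info below, along with Rtsne and uwot) and the `sce` object with log-normalized counts from the previous chunk; the seed value is an arbitrary choice:

```r
library("scater")    # provides runPCA(), runTSNE() and runUMAP()

set.seed(42)         # t-SNE and UMAP are stochastic; the seed is arbitrary
sce <- runPCA(sce)   # stored under the default name "PCA"
sce <- runTSNE(sce)  # stored under the default name "TSNE"
sce <- runUMAP(sce)  # stored under the default name "UMAP"

# List the reduced dimensions now available in the object
reducedDimNames(sce)
```

The names returned by `reducedDimNames()` are the values that can later be passed as the dimensionality reduction type.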
Now that our sce is ready, we can move on to the next argument.
- features: a vector or a data.frame containing the genes/features of interest. Let’s say we would like to visualize the expression of some genes that were identified as marker genes for different cell populations.
gene_list <- c("Gcg",  # alpha
               "Ins1") # beta
- reddim_type: In this example we decided to plot our data as a t-SNE plot.
reddim_type <- "TSNE"
- clusters: Now we specify what clusters/cell-types/states/samples we would like to color/split our data with:
# cell populations
cluster <- "label" #the name should match what's in the colData names
- groups: Here we can add the groups/conditions/cell-types:
# ICR vs C57BL/6
group <- "strain" #the name should match what's in the colData names
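As the comments above stress, the names have to match what is stored in the object. A small sanity check along these lines can catch mismatches before calling iSEEinit(). This is only a sketch: `check_init_inputs()` is a hypothetical helper (not part of iSEEfier), and the toy object merely stands in for the real `sce`.

```r
library("SingleCellExperiment")

# Hypothetical helper: report which inputs can be found in the object
check_init_inputs <- function(sce, features, reddim_type, clusters, groups) {
  c(
    features_found = all(features %in% rownames(sce)),
    reddim_found   = reddim_type %in% reducedDimNames(sce),
    clusters_found = clusters %in% colnames(colData(sce)),
    groups_found   = groups %in% colnames(colData(sce))
  )
}

# Toy stand-in for the real data: 2 genes, 6 cells, one fake embedding
set.seed(1)
cells <- paste0("cell", 1:6)
toy_counts <- matrix(rpois(12, lambda = 5), nrow = 2,
                     dimnames = list(c("Gcg", "Ins1"), cells))
toy <- SingleCellExperiment(
  assays = list(counts = toy_counts),
  colData = DataFrame(label = rep(c("alpha", "beta"), times = 3),
                      strain = rep(c("ICR", "C57BL/6"), each = 3)),
  reducedDims = list(TSNE = matrix(rnorm(12), ncol = 2,
                                   dimnames = list(cells, NULL)))
)

# All four checks pass for matching names
ok <- check_init_inputs(toy, c("Gcg", "Ins1"), "TSNE", "label", "strain")
ok

# The toy object has no "UMAP" embedding, so reddim_found is FALSE here
bad <- check_init_inputs(toy, c("Gcg", "Ins1"), "UMAP", "label", "strain")
bad
```

With the objects from this guide, the equivalent call would be `check_init_inputs(sce, gene_list, reddim_type, cluster, group)`.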
We can choose to include a MarkdownBoard in this initial step by setting the argument add_markdown_panel to TRUE. At this point, all the elements are ready to be transferred into iSEEinit():
initial1 <- iSEEinit(sce = sce,
                     features = gene_list,
                     reddim_type = reddim_type,
                     clusters = cluster,
                     groups = group,
                     add_markdown_panel = TRUE)
If our features argument were a data.frame, we would pass the name of the column containing the features to the gene_id parameter.
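A minimal sketch of the data.frame variant, assuming the `sce`, `reddim_type`, `cluster` and `group` objects defined earlier in this guide. The table `marker_df` and its column names are made up for illustration; only the `gene_id` argument itself comes from the description above.

```r
library("iSEEfier")

# Hypothetical marker table; the column names are illustrative only
marker_df <- data.frame(
  symbol    = c("Gcg", "Ins1"),
  cell_type = c("alpha", "beta")
)

# `gene_id` names the column of `marker_df` that holds the features
initial_df <- iSEEinit(sce = sce,
                       features = marker_df,
                       gene_id = "symbol",
                       reddim_type = reddim_type,
                       clusters = cluster,
                       groups = group)
```

Keeping extra columns such as `cell_type` in the table is harmless in this sketch, since only the column named by `gene_id` is handed over as features.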
Now we are one step away from visualizing our list of genes of interest. All that’s left to do is to run iSEE with the initial state created with iSEEinit().
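The launch call is not shown in this rendering; by analogy with the `iSEE(sce, initial = ...)` calls used later in this guide, it would look roughly as follows. This is a sketch assuming the `sce` and `initial1` objects created above; the `interactive()` guard is an addition so that the snippet can also be sourced in a non-interactive session.

```r
library("iSEE")

# Build the app from the data and the initial state created with iSEEinit()
app <- iSEE(sce, initial = initial1)

# iSEE() returns a Shiny app object; start it only in an interactive session
if (interactive()) {
  shiny::runApp(app)
}
```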
This instance, generated with iSEEinit(), returns a combination of panels, linked to each other, with the goal of visualizing the expression of certain marker genes in each cell population/group:
- A ReducedDimensionPlot, FeatureAssayPlot and RowDataTable for each single gene in features.
- A ComplexHeatmapPlot with all genes in features
- A ColumnDataPlot panel
- A MarkdownBoard panel
Create an initial state for feature sets exploration using iSEEnrich()
Sometimes it is interesting to look at some specific feature sets and the associated genes. That’s when the utility of iSEEnrich() becomes apparent. We will need the following elements to explore feature sets of interest:
- sce: A SingleCellExperiment object
- collection: A character vector specifying the gene set collections of interest (it is possible to use GO or KEGG terms)
- gene_identifier: A character string specifying the identifier to use to extract gene IDs for the organism package. This can be “ENS” for ENSEMBL ids, “SYMBOL” for gene names…
- organism: A character string of the org.*.eg.db package to use to extract mappings of gene sets to gene IDs.
- reddim_type: A character string containing the dimensionality reduction type
- clusters: A character string containing the name of the clusters/cell-type/state… (as listed in the colData of the sce)
- groups: A character string of the groups/conditions… (as it appears in the colData of the sce)
GO_collection <- "GO"
Mm_organism <- "org.Mm.eg.db"
gene_id <- "SYMBOL"
cluster <- "label"
group <- "strain"
reddim_type <- "PCA"
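Before running the enrichment setup, it can help to confirm that the chosen annotation package is installed and that the identifier type fits the gene names in use. The sketch below relies only on org.Mm.eg.db and the AnnotationDbi accessors `keytypes()` and `keys()`; how iSEEnrich() maps `gene_identifier` onto these key types internally is not described in this guide, so treat this as an independent check rather than a reproduction of its logic.

```r
library("org.Mm.eg.db")  # also attaches AnnotationDbi

# Identifier types offered by the mouse annotation package
available_keytypes <- keytypes(org.Mm.eg.db)
available_keytypes

# Are the marker genes used earlier known as mouse gene symbols?
markers <- c("Gcg", "Ins1")
markers_known <- markers %in% keys(org.Mm.eg.db, keytype = "SYMBOL")
markers_known
```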
Now let’s create this initial setup for iSEE using iSEEnrich():
results <- iSEEnrich(
  sce = sce,
  collection = GO_collection,
  gene_identifier = gene_id,
  organism = Mm_organism,
  clusters = cluster,
  reddim_type = reddim_type,
  groups = group
)
iSEEnrich() will specifically return a list with the updated sce object and its associated initial configuration. To start the iSEE instance we run:
iSEE(results$sce, initial = results$initial)
Create an initial state for marker gene exploration using iSEEmarker()
In many cases, we are interested in determining the identity of our clusters, or in further subsetting our cell types. That’s where iSEEmarker() comes in handy. Similar to iSEEinit(), we need the following parameters:
- sce: a SingleCellExperiment object
- clusters: the name of the clusters/cell-type/state
- groups: the groups/conditions
- selection_plot_format: the class of the panel that we will be using to select the clusters of interest.
initial3 <- iSEEmarker(
  sce = sce,
  clusters = cluster,
  groups = group,
  selection_plot_format = "ColumnDataPlot")
This function returns a list of panels, with the goal of visualizing the expression of marker genes selected from the DynamicMarkerTable in each cell type. Unlike iSEEinit(), which requires us to specify a list of genes, iSEEmarker() utilizes the DynamicMarkerTable, which performs statistical testing through the findMarkers() function from the scran package. To start exploring the marker genes of each cell type with iSEE, we run:
iSEE(sce, initial = initial3)
Visualize a preview of the initial configurations with view_initial_tiles()
Previously, we successfully generated three distinct initial configurations for iSEE. However, understanding the expected content of our iSEE instances is not always straightforward. That’s when we can use view_initial_tiles(). We only need the initial configuration as input to obtain a graphical preview of the corresponding iSEE instance:
library(ggplot2)
view_initial_tiles(initial = initial1)
view_initial_tiles(initial = results$initial)
Visualize network connections between panels with view_initial_network()
As some of these panels are linked to each other, we can visualize these networks with view_initial_network(). Similar to view_initial_tiles(), this function takes the initial setup as input. It always returns the igraph object underlying the visualizations, which can be displayed as a side effect.
library("igraph")
library("visNetwork")
g1 <- view_initial_network(initial1, plot_format = "igraph")
g1
#> IGRAPH a9dabdf DN-- 11 3 --
#> + attr: name (v/c), color (v/c)
#> + edges from a9dabdf (vertex names):
#> [1] ReducedDimensionPlot1->ColumnDataPlot1
#> [2] ReducedDimensionPlot2->ColumnDataPlot1
#> [3] ReducedDimensionPlot3->FeatureAssayPlot3
initial2 <- results$initial
g2 <- view_initial_network(initial2, plot_format = "visNetwork")
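Since the return value is a regular igraph object, the usual igraph accessors can be applied to it. The sketch below does not call iSEEfier at all: it rebuilds only the three edges printed above as a stand-in graph `g_demo` (so it has 5 vertices rather than the 11 panels of the real object), and then shows how the links could be tabulated and how to find panels that receive selections from others.

```r
library("igraph")

# Stand-in for g1: the three panel links shown in the printed output above
edges <- data.frame(
  from = c("ReducedDimensionPlot1", "ReducedDimensionPlot2",
           "ReducedDimensionPlot3"),
  to   = c("ColumnDataPlot1", "ColumnDataPlot1", "FeatureAssayPlot3")
)
g_demo <- graph_from_data_frame(edges, directed = TRUE)

# Tabulate the links as a plain data.frame
link_table <- as_data_frame(g_demo, what = "edges")
link_table

# Number of incoming links per panel
incoming <- degree(g_demo, mode = "in")
incoming[incoming > 0]
```

The same calls should work unchanged on `g1`, where panels without any link would simply show an in-degree of zero.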
Merge different initial configurations with glue_initials()
Sometimes, it would be interesting to merge different iSEE initial configurations to visualize all the different panels in the same iSEE instance.
merged_config <- glue_initials(initial1, initial2)
We can then preview the content of this initial configuration:
view_initial_tiles(merged_config)
Related work
The idea of launching iSEE() with some specific configuration is not entirely new, and it was covered in some use cases by the mode*() functions available in the iSEEu package. There, the user has access to the following:
- iSEEu::modeEmpty() - this will launch iSEE without any panels, and let you build up the configuration from scratch. Easy to start, easy to build.
- iSEEu::modeGating() - this will open iSEE with multiple chain-linked FeatureExpressionPlot panels, just like when doing some in silico gating. This could be a very good fit if working with mass cytometry data.
- iSEEu::modeReducedDim() - iSEE will be ready to compare multiple ReducedDimensionPlot panels, which is a suitable option to compare the views resulting from different embeddings (and/or embeddings generated with slightly different parameter configurations).
The modes directly launch an instance of iSEE, whereas the functionality in iSEEfier is rather oriented to obtain more tailored-to-the-data-at-hand initial objects, which can subsequently be passed as an argument to the iSEE() call.
We encourage users to submit suggestions about their “classical ways” of using iSEE on their data - be that by opening an issue or already proposing a Pull Request on GitHub.
Session info
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Ventura 13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Berlin
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] visNetwork_2.1.2 igraph_2.0.3
#> [3] scater_1.33.4 ggplot2_3.5.1
#> [5] scuttle_1.15.4 scRNAseq_2.19.1
#> [7] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.1
#> [9] Biobase_2.65.1 GenomicRanges_1.57.1
#> [11] GenomeInfoDb_1.41.1 IRanges_2.39.2
#> [13] S4Vectors_0.43.2 BiocGenerics_0.51.1
#> [15] MatrixGenerics_1.17.0 matrixStats_1.4.1
#> [17] iSEEfier_1.1.2 BiocStyle_2.33.1
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.1 later_1.3.2 BiocIO_1.15.2
#> [4] bitops_1.0-8 filelock_1.0.3 tibble_3.2.1
#> [7] XML_3.99-0.17 lifecycle_1.0.4 httr2_1.0.4
#> [10] doParallel_1.0.17 lattice_0.22-6 ensembldb_2.29.1
#> [13] alabaster.base_1.5.8 magrittr_2.0.3 sass_0.4.9
#> [16] rmarkdown_2.28 jquerylib_0.1.4 yaml_2.3.10
#> [19] httpuv_1.6.15 DBI_1.2.3 RColorBrewer_1.1-3
#> [22] abind_1.4-8 zlibbioc_1.51.1 Rtsne_0.17
#> [25] AnnotationFilter_1.29.0 RCurl_1.98-1.16 rappdirs_0.3.3
#> [28] circlize_0.4.16 GenomeInfoDbData_1.2.12 ggrepel_0.9.6
#> [31] irlba_2.3.5.1 alabaster.sce_1.5.1 pkgdown_2.1.1
#> [34] iSEEhex_1.7.0 codetools_0.2-20 DelayedArray_0.31.11
#> [37] DT_0.33 tidyselect_1.2.1 shape_1.4.6.1
#> [40] farver_2.1.2 UCSC.utils_1.1.0 viridis_0.6.5
#> [43] ScaledMatrix_1.13.0 shinyWidgets_0.8.7 BiocFileCache_2.13.0
#> [46] GenomicAlignments_1.41.0 jsonlite_1.8.9 BiocNeighbors_1.99.0
#> [49] GetoptLong_1.0.5 iterators_1.0.14 systemfonts_1.1.0
#> [52] foreach_1.5.2 tools_4.4.1 ragg_1.3.3
#> [55] Rcpp_1.0.13 glue_1.7.0 gridExtra_2.3
#> [58] SparseArray_1.5.36 BiocBaseUtils_1.7.3 xfun_0.47
#> [61] mgcv_1.9-1 dplyr_1.1.4 HDF5Array_1.33.6
#> [64] gypsum_1.1.6 shinydashboard_0.7.2 withr_3.0.1
#> [67] BiocManager_1.30.25 fastmap_1.2.0 rhdf5filters_1.17.0
#> [70] fansi_1.0.6 shinyjs_2.1.0 rsvd_1.0.5
#> [73] digest_0.6.37 R6_2.5.1 mime_0.12
#> [76] textshaping_0.4.0 colorspace_2.1-1 listviewer_4.0.0
#> [79] RSQLite_2.3.7 utf8_1.2.4 generics_0.1.3
#> [82] hexbin_1.28.4 FNN_1.1.4.1 rtracklayer_1.65.0
#> [85] httr_1.4.7 htmlwidgets_1.6.4 S4Arrays_1.5.7
#> [88] org.Mm.eg.db_3.19.1 uwot_0.2.2 iSEE_2.17.4
#> [91] pkgconfig_2.0.3 gtable_0.3.5 blob_1.2.4
#> [94] ComplexHeatmap_2.21.0 XVector_0.45.0 htmltools_0.5.8.1
#> [97] bookdown_0.40 ProtGenerics_1.37.1 rintrojs_0.3.4
#> [100] clue_0.3-65 scales_1.3.0 alabaster.matrix_1.5.9
#> [103] png_0.1-8 knitr_1.48 rstudioapi_0.16.0
#> [106] rjson_0.2.23 nlme_3.1-166 curl_5.2.3
#> [109] shinyAce_0.4.2 cachem_1.1.0 rhdf5_2.49.0
#> [112] GlobalOptions_0.1.2 BiocVersion_3.20.0 parallel_4.4.1
#> [115] miniUI_0.1.1.1 vipor_0.4.7 AnnotationDbi_1.67.0
#> [118] restfulr_0.0.15 desc_1.4.3 pillar_1.9.0
#> [121] grid_4.4.1 alabaster.schemas_1.5.0 vctrs_0.6.5
#> [124] promises_1.3.0 BiocSingular_1.21.3 dbplyr_2.5.0
#> [127] iSEEu_1.17.0 beachmat_2.21.6 xtable_1.8-4
#> [130] cluster_2.1.6 beeswarm_0.4.0 evaluate_1.0.0
#> [133] GenomicFeatures_1.57.0 cli_3.6.3 compiler_4.4.1
#> [136] Rsamtools_2.21.1 rlang_1.1.4 crayon_1.5.3
#> [139] ggbeeswarm_0.7.2 fs_1.6.4 viridisLite_0.4.2
#> [142] alabaster.se_1.5.3 BiocParallel_1.39.0 munsell_0.5.1
#> [145] Biostrings_2.73.1 lazyeval_0.2.2 colourpicker_1.3.0
#> [148] Matrix_1.7-0 ExperimentHub_2.13.1 bit64_4.5.2
#> [151] Rhdf5lib_1.27.0 KEGGREST_1.45.1 shiny_1.9.1
#> [154] highr_0.11 alabaster.ranges_1.5.2 AnnotationHub_3.13.3
#> [157] memoise_2.0.1 bslib_0.8.0 bit_4.5.0