During this tutorial, we will integrate the snRNA-seq and snATAC-seq data generated from the human heart samples after myocardial infarction. The integrated data will be used as input for inferring gene regulatory network.

We first download the required data. In this case, we need two Seurat objects with each one corresponding to snRNA-seq and snATAC-seq respectively. The snRNA-seq object includes gene expression data of all fibroblasts and the snATAC-seq includes all chromatin accessibility profiles. Additionally, we also need a gene activity matrix for data integration. This matrix was estimated from the snATAC-seq data by using the ArchR package. The script of cleaning the data and preparing these objects is found here.

Run the following commands to download the data:

mkdir -p Fibroblast
wget -q -P Fibroblast https://costalab.ukaachen.de/open_data/scMEGA/Fibroblast/snRNA.rds
wget -q -P Fibroblast https://costalab.ukaachen.de/open_data/scMEGA/Fibroblast/snATAC.rds
wget -q -P Fibroblast https://costalab.ukaachen.de/open_data/scMEGA/Fibroblast/gene.activity.rds

Next, we load all necessary packages:

Data integration

Let’s load the data into memory and see how they look like

obj.rna <- readRDS("./Fibroblast/snRNA.rds")
obj.atac <- readRDS("./Fibroblast/snATAC.rds")
gene.activity <- readRDS("./Fibroblast/gene.activity.rds")

obj.rna
## An object of class Seurat 
## 28933 features across 45515 samples within 1 assay 
## Active assay: RNA (28933 features, 2000 variable features)
##  4 dimensional reductions calculated: pca, harmony, umap, umap_harmony
obj.atac
## An object of class Seurat 
## 173843 features across 6481 samples within 1 assay 
## Active assay: ATAC (173843 features, 0 variable features)
##  3 dimensional reductions calculated: harmony, umap, umap_harmony

We can observe that there are 45,515 and 6,481 cells in our snRNA-seq and snATAC-seq datasets. We now visualize the data as colored by patients. Note that here we used the UMAP embedding generated from batch-corrected low-dimensional space so that no batch effects are observed from the 2D visualization.

p1 <- DimPlot(obj.rna, pt.size = 1, reduction = "umap_harmony") +
    ggtitle("snRNA-seq")
    
p2 <- DimPlot(obj.atac, pt.size = 1, reduction = "umap_harmony") +
    ggtitle("snATAC-seq")

p1 + p2

Co-embedding

First, we need to project the data into a common low-dimensional space. This is done by using the CCA method from Seurat. To this end, we have wrapped several functions from Seurat into a single function CoembedData.

obj.coembed <- CoembedData(
  obj.rna,
  obj.atac, 
  gene.activity, 
  weight.reduction = "harmony", 
  verbose = FALSE
)
## Performing data integration using Seurat...
## Warning in RunCCA.Seurat(object1 = reference, object2 = query, features =
## features, : Running CCA on different assays
## Finding integration vectors
## Finding integration vector weights
## Transfering 19026 features onto reference data
## Coemebdding the data...
## Centering data matrix
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

We next visualize the snRNA-seq and snATAC-seq in this shared UMAP space. The cells are colored by patients or modalities.

p1 <- DimPlot(obj.coembed, group.by = "patient", shuffle = TRUE, label = TRUE)
p2 <- DimPlot(obj.coembed, group.by = "tech", shuffle = TRUE, label = TRUE)

p1 + p2

The batch effects between patients, regions and modalities are quite clear. So next we use Harmony to perform batch correction and generate a new UMAP embedding.

obj.coembed <- RunHarmony(
  obj.coembed,
  group.by.vars = c("patient", "region", "tech"),
  reduction = "pca",
  max.iter.harmony = 30,
  dims.use = 1:30,
  project.dim = FALSE,
  plot_convergence = FALSE
)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 2599800)
## Harmony 1/30
## Harmony 2/30
## Harmony 3/30
## Harmony 4/30
## Harmony 5/30
## Harmony 6/30
## Harmony 7/30
## Harmony 8/30
## Harmony 9/30
## Harmony 10/30
## Harmony 11/30
## Harmony 12/30
## Harmony 13/30
## Harmony 14/30
## Harmony 15/30
## Harmony 16/30
## Harmony 17/30
## Harmony 18/30
## Harmony converged after 18 iterations
obj.coembed <- RunUMAP(
  obj.coembed,
  dims = 1:30,
  reduction = 'harmony',
  reduction.name = "umap_harmony",
  reduction.ke = 'umapharmony_',
  verbose = FALSE,
  min.dist = 0.4
)

We can plot the data again

p1 <-
  DimPlot(obj.coembed, group.by = "patient", reduction = "umap_harmony")
p2 <-
  DimPlot(obj.coembed, group.by = "tech", reduction = "umap_harmony")

p1 + p2

From the new UMAP embedding, we can observe that after batch-correction, cells from different patients, regions, and modalities are well mixed.

Based on our previous works of myofibroblast differentiation in human and mouse kidney, we already known some relevant genes for this biological process. For example, SCARA5 is a marker for myofibroblast progenitor, and COL1A1, POSTN, and FN1 are highly expressed in myofibroblast. Therefore we can visualize the expression of these genes to check if we can also identify similar process in human heart. Note that to make the visualization clear, here we used the package Nebulosa to plot the data.

p1 <-
  plot_density(obj.coembed,
               features = "SCARA5",
               reduction = "umap_harmony",
               pal = "magma")
p2 <-
  plot_density(obj.coembed,
               features = "COL1A1",
               reduction = "umap_harmony",
               pal = "magma")
p3 <-
  plot_density(obj.coembed,
               features = "POSTN",
               reduction = "umap_harmony",
               pal = "magma")
p4 <-
  plot_density(obj.coembed,
               features = "FN1",
               reduction = "umap_harmony",
               pal = "magma")


(p1 + p2) / (p3 + p4)

From the visualization, we can observe that some cells highly express SCARA5 which could be the progenitors of myofibroblasts. On the other hand, some cells highly express COL1A1, POSTN, and FN1 and they could be terminally differentiated myofibroblasts.

Sub-clustering

We next perform sub-clustering to identify different populations in our multi-omic fibroblast data. To further control the data quality, here we will use a two-round approach to remove low-quality cells. We first use a high-resolution to get a large amount of clusters.

obj.coembed <- FindNeighbors(obj.coembed, reduction = "harmony", dims = 1:30)
## Computing nearest neighbor graph
## Computing SNN
obj.coembed <- FindClusters(obj.coembed, resolution = 0.9, verbose = FALSE)

cols <- ArchR::paletteDiscrete(obj.coembed@meta.data[, "RNA_snn_res.0.9"])
    
p <- DimPlot(obj.coembed, group.by = "RNA_snn_res.0.9", label = TRUE,
             reduction = "umap_harmony", shuffle = TRUE) +
    scale_color_manual(values = cols) +
    xlab("UMAP1") + ylab("UMAP2")
    
p

We can use the function CellPropPlot to visualize the cell propotion across all patients.

p <- CellPropPlot(obj.coembed,
                  group.by = "RNA_snn_res.0.9",
                  prop.in = "patient_region_id",
                  cols = cols)

p

Next, we identify the markers for each cluster and visualize the top 3.

all.markers <- FindAllMarkers(obj.coembed, 
                              only.pos = TRUE, 
                              min.pct = 0.25, logfc.threshold = 0.5)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
## Calculating cluster 12
## Calculating cluster 13
## Calculating cluster 14
## Calculating cluster 15
df <- all.markers %>%
    group_by(cluster) %>%
    slice_max(n = 3, order_by = avg_log2FC)

p <- DotPlot(obj.coembed, features = unique(df$gene)) + RotatedAxis()

print(p)

saveRDS(obj.coembed, "./Fibroblast/coembed.rds")

The above dot plot demonstrates the top 3 markers per cluster and we can easily classify cluster 3 and 7 as myofibroblasts. In addition, two clusters (i.e., cluster 2 and 5) are marked by RYR2, a well known marker gene for cardiomyocyte, likely due to the background noise. It is worth pointing out that in another study Cells of the adult human heart a similar fibroblast sub-population (marked by another cardiomyocyte-specific gene TNNT2) was also identified. Based on these, we decide to remove these two clusters.

Idents(obj.coembed) <- "RNA_snn_res.0.9"
coembed.sub <- subset(obj.coembed, idents = c(2, 5), invert = TRUE)
coembed.sub
## An object of class Seurat 
## 221802 features across 43276 samples within 3 assays 
## Active assay: RNA (28933 features, 2000 variable features)
##  2 other assays present: ATAC, GeneActivity
##  4 dimensional reductions calculated: pca, umap, harmony, umap_harmony
cols.clusters <- ArchR::paletteDiscrete(coembed.sub@meta.data[, "RNA_snn_res.0.9"])

p <- DimPlot(coembed.sub, group.by = "RNA_snn_res.0.9", label = TRUE,
             reduction = "umap_harmony", shuffle = TRUE, cols = cols) +
    xlab("UMAP1") + ylab("UMAP2")

p

We then re-do the UMAP embedding and clustering with a lower resolution to reduce complexity.

coembed.sub <- RunUMAP(coembed.sub, 
                       dims = 1:30, 
                       reduction = 'harmony',
                       reduction.name = "umap_harmony",
                       reduction.key = 'umap_harmony_',
                       verbose = FALSE,
                       min.dist = 0.4)
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from umap_harmony_ to umapharmony_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to umapharmony_
## re-clustering
coembed.sub <- FindNeighbors(coembed.sub, reduction = "harmony", dims = 1:30)
## Computing nearest neighbor graph
## Computing SNN
coembed.sub <- FindClusters(coembed.sub, resolution = 0.2, verbose = FALSE)

cols <- ArchR::paletteDiscrete(coembed.sub@meta.data[, "RNA_snn_res.0.2"])
    
p <- DimPlot(coembed.sub, group.by = "RNA_snn_res.0.2", label = TRUE,
             reduction = "umap_harmony", shuffle = TRUE) +
    scale_color_manual(values = cols) +
    xlab("UMAP1") + ylab("UMAP2")
    
p                

Marker genes are identified based on new clustering results and we can plot the top 10 markers.

all.markers <- FindAllMarkers(coembed.sub, 
                              only.pos = TRUE, 
                              min.pct = 0.25, logfc.threshold = 0.5)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
df <- all.markers %>%
    group_by(cluster) %>%
    slice_max(n = 10, order_by = avg_log2FC)

p <- DotPlot(coembed.sub, features = unique(df$gene)) + RotatedAxis()
## Warning: Scaling data with a low number of groups may produce misleading results

Now the clustering results are clearer, and cluster 1 is myofibroblast, and cluster 2 is SCARA5+ fibroblast.

We can plot the snRNA-seq and snATAC-seq data separately

p <- DimPlot(coembed.sub, group.by = "RNA_snn_res.0.2", label = TRUE,
             reduction = "umap_harmony", shuffle = TRUE, split.by = "tech", 
             cols = cols) +
    xlab("UMAP1") + ylab("UMAP2")

p

Visualize the cell proportion of each sub-population across all patients

p <- CellPropPlot(coembed.sub, 
                   group.by = "RNA_snn_res.0.2", 
                   prop.in = "patient_region_id", 
                   cols = cols)

p

Since we have annotated all patients into three major groups, i.e., myogenic, ischmeic, and fibrotic. we can also perform statistical test to check if any sub-population are enriched in any of the above group. This can be done by the function CompareCellProp.

coembed.sub$patient_group <- factor(coembed.sub$patient_group, 
                                    levels = c("myogenic", "ischemic", "fibrotic"))

p <- CompareCellProp(object = coembed.sub, 
                     group.by = "RNA_snn_res.0.2", 
                     prop.in = "patient_region_id", 
                      sample.annotation = "patient_group",
                    comparisons = list(c("myogenic", "ischemic"),
                                       c("ischemic", "fibrotic"),
                                       c("myogenic", "fibrotic")))

p
## Warning in wilcox.test.default(c(0.1147124245313, 0.121564482029598,
## 0.120168067226891, : cannot compute exact p-value with ties

saveRDS(coembed.sub, "./Fibroblast/coembed.cluster.rds")

This analysis reveals significant changes of cell proportion between different conditions for cluster 1, 2, and 3. We therefore only use these three clusters in our trajectory analysis.

Idents(coembed.sub) <- "RNA_snn_res.0.2"
coembed.sub <- subset(coembed.sub, idents = c(1, 2, 3))

We can generate a new UMAP

cols <- ArchR::paletteDiscrete(unique(coembed.sub@meta.data[, "RNA_snn_res.0.2"]))

coembed.sub <- RunUMAP(coembed.sub, 
               dims = 1:30, 
               reduction = 'harmony',
               reduction.name = "umap_harmony",
               reduction.ke = 'umap_harmony_',
              verbose = FALSE)
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from umap_harmony_ to umapharmony_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to umapharmony_
p <- DimPlot(coembed.sub, group.by = "RNA_snn_res.0.2", label = TRUE,
             reduction = "umap_harmony", shuffle = TRUE, cols = cols) +
    xlab("UMAP1") + ylab("UMAP2")

p

Save data

saveRDS(coembed.sub, "./Fibroblast/coembed.sub.rds")

Trajectory analysis

We next identify the trajectory for myofibroblast differentiation.

Dimensionality reduction

To infer trajectory, we will perform dimension reduction using diffusion map via the function RunDiffusionMap. This is based on the R package destiny.

coembed.sub <- RunDiffusionMap(coembed.sub, reduction = "harmony")
## finding knns......done. Time: 268.62s
## Calculating transition probabilities......done. Time: 0.27s
## 
## performing eigen decomposition......done. Time: 1.63s
cols <- ArchR::paletteDiscrete(coembed.sub@meta.data[, "RNA_snn_res.0.2"])

p <- DimPlot(coembed.sub, group.by = "RNA_snn_res.0.2", label = TRUE,
             reduction = "dm", shuffle = TRUE, cols = cols) +
    xlab("DC 1") + ylab("DC 2")

p

We can also plot snATAC-seq and snRNA-seq individually

DimPlot(coembed.sub, reduction = "dm", 
        group.by = "RNA_snn_res.0.2", split.by = "tech", cols = cols)

Cell pairing

Next, we match the cells between these two modalities. In other words, for each cell in, for example, snATAC-seq, we will find a cell from snRNA-seq data so that these two cells have the similar profiles. This is only necessary when each modality was performed independently. If snRNA-seq and snATAC-seq data was generated by multi-modal protocol, e.g., 10X multiome or SHARE-seq, this step can be skipped.

We here use the method proposed by Kartha, Vinay K., et al. to match the cells.

df.pair <- PairCells(object = coembed.sub, reduction = "harmony",
                    pair.by = "tech", ident1 = "ATAC", ident2 = "RNA")
## Getting dimensional reduction data for pairing cells...
## Pairing cells using geodesic mode...
## Constructing KNN graph for computing geodesic distance ..
## Computing graph-based geodesic distance ..
## # KNN subgraphs detected: 1
## Skipping subgraphs with either ATAC/RNA cells fewer than: 50
## Pairing cells for subgraph No.1
## Total ATAC cells in subgraph: 3397
## Total RNA cells in subgraph: 22840
## Subgraph size: 3397
## Search threshold being used: 1359
## Constructing KNN based on geodesic distance to reduce search pairing search space
## Number of cells being paired: 3397 ATAC and 3397  RNA cells
## Determing pairs through optimized bipartite matching ..
## Assembling pair list ..
## Finished!

We can visualize the paired cells

sel_cells <- c(df.pair$ATAC, df.pair$RNA)
coembed.sub2 <- coembed.sub[, sel_cells]

options(repr.plot.height = 5, repr.plot.width = 10)
DimPlot(coembed.sub2, reduction = "dm", 
        group.by = "RNA_snn_res.0.2", split.by = "tech", cols = cols)

We next create a new Seurat object for there paired cells as if they are generated by single-cell multimodal protocol.

obj.pair <- CreatePairedObject(df.pair = df.pair, 
                               object = coembed.sub2,
                               use.assay1 = "RNA", 
                               use.assay2 = "ATAC")
## Merging objects...
obj.pair
## An object of class Seurat 
## 202776 features across 3397 samples within 2 assays 
## Active assay: RNA (28933 features, 0 variable features)
##  1 other assay present: ATAC
##  5 dimensional reductions calculated: pca, umap, harmony, umap_harmony, dm

Finally, we infer a pseudo-time trajectory from SCARA5+ fibroblasts to myofibroblast using the approach from ArchR. Here we modified the function to allow to take a Seurat object as input

obj.pair <- AddTrajectory(object = obj.pair, 
                          trajectory = c(2, 1),
                          group.by = "RNA_snn_res.0.2", 
                          reduction = "dm",
                          dims = 1:3, 
                          use.all = FALSE)
                          
# we only plot the cells that are in this trajectory
obj <- obj.pair[, !is.na(obj.pair$Trajectory)]

p <- TrajectoryPlot(object = obj, 
                    reduction = "dm",
                    continuousSet = "blueYellow",
                    size = 1,
                   addArrow = FALSE)

p

TF and gene selection

We next select candidate TFs and genes for building a meaningful gene regulatory network.

Select TFs

To identify potential regulator (i.e., TFs), we first estimate an acitivty score for each TF in each cell. This is done by first performing motif matching and then computing deviation scores using chromVAR.

# Get a list of motif position frequency matrices from the JASPAR database
pfm <- getMatrixSet(
  x = JASPAR2020,
  opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE)
)

# add motif information
obj <- AddMotifs(
  object = obj,
  genome = BSgenome.Hsapiens.UCSC.hg38,
  pfm = pfm,
    assay = "ATAC"
)
## Building motif matrix
## Finding motif positions
## Creating Motif object
obj <- RunChromVAR(
  object = obj,
  genome = BSgenome.Hsapiens.UCSC.hg38,
    assay = "ATAC"
)
## Computing GC bias per region
## Selecting background regions
## Computing deviations from background
## Constructing chromVAR assay
res <- SelectTFs(object = obj, return.heatmap = TRUE)
## Find 456 shared features!
df.cor <- res$tfs
ht <- res$heatmap

We can visualize the TF activity dynamic along the trajectory

draw(ht)

Select genes

We will select relevant genes by first linking genes to peaks based on the corrleation between gene expression from the snRNA-seq data and peak accessibility from the snATAC-seq data along the inferred trajectory. This means that we only consider a gene to be a potential target if it can be assocaited to at least one peak.

res <- SelectGenes(object = obj,
                  labelTop1 = 0,
                  labelTop2 = 0)
## Creating Trajectory Group Matrix..
## Smoothing...
## Creating Trajectory Group Matrix..
## Smoothing...
## Linking cis-regulatory elements to genes...
df.p2g <- res$p2g
ht <- res$heatmap

draw(ht)

Gene regulatory network inference

We here will try to predict a gene regulatory network based on the correlation of the TF binding activity as estimated from snATAC-seq and gene expression as measured by snRNA-seq along the trajectory.

tf.gene.cor <- GetTFGeneCorrelation(object = obj, 
                                    tf.use = df.cor$tfs, 
                                    gene.use = unique(df.p2g$gene),
                                    tf.assay = "chromvar", 
                                    gene.assay = "RNA",
                                    trajectory.name = "Trajectory")
## Creating Trajectory Group Matrix..
## Some values are below 0, this could be the Motif activity matrix in which scaleTo should be set = NULL.
## Continuing without depth normalization!
## Smoothing...
## Creating Trajectory Group Matrix..
## Smoothing...

We can then visualize this correlation matrix by heatmap. Also, we can cluster the genes and TFs to identify different regulatory modules for the predefined sub-populations.

ht <- GRNHeatmap(tf.gene.cor, 
                 tf.timepoint = df.cor$time_point)
## `use_raster` is automatically set to TRUE for a matrix with more than
## 2000 rows. You can control `use_raster` argument by explicitly setting
## TRUE/FALSE to it.
## 
## Set `ht_opt$message = FALSE` to turn off this message.
ht

To associate genes to TFs, we will use the peak-to-gene links and TF binding sites information. Specifically, if a gene is regulated by a peak and this peak is bound by a TF, then we consider this gene to be a target of this TF.

motif.matching <- obj@assays$ATAC@motifs@data
colnames(motif.matching) <- obj@assays$ATAC@motifs@motif.names
motif.matching <-
    motif.matching[unique(df.p2g$peak), unique(tf.gene.cor$tf)]


df.grn <- GetGRN(motif.matching = motif.matching, 
                 df.cor = tf.gene.cor, 
                 df.p2g = df.p2g)
## Filtering network by peak-to-gene links...
## Filtering network by TF binding site prediction...

Next, we can visualize our network as the last step of this analysis

# define colors for nodes representing TFs (i.e., regulators)
df.cor <- df.cor[order(df.cor$time_point), ]
tfs.timepoint <- df.cor$time_point
names(tfs.timepoint) <- df.cor$tfs

# plot the graph, here we can highlight some genes
df.grn2 <- df.grn %>%
    subset(correlation > 0.4) %>%
    select(c(tf, gene, correlation)) %>%
    rename(weights = correlation)

saveRDS(df.grn2,"./Fibroblast/final_grn.Rds")    

p <- GRNPlot(df.grn2, 
             tfs.timepoint = tfs.timepoint,
             show.tf.labels = TRUE,
             seed = 42, 
             plot.importance = FALSE,
            min.importance = 2,
            remove.isolated = FALSE)

options(repr.plot.height = 20, repr.plot.width = 20)

print(p)
## Warning: Using alpha for a discrete variable is not advised.

GRN visualization

GRN along the trajectory

Once we generated the gene regulatory network, we can visualize individual TFs in terms of binding activity, expression, and target expression along the pseudotime trajectory.

Here we select two TFs for visualization.

obj <- AddTargetAssay(object = obj, df.grn = df.grn2)
## Warning in if (is.na(df.grn)) {: the condition has length > 1 and only the first
## element will be used
p1 <- PseudotimePlot(object = obj, tf.use = "NR3C2")
p2 <- PseudotimePlot(object = obj, tf.use = "RUNX1")

p1 + p2
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

The x-axis in above plots present pseudotime point along the trajectory, and the y-axis represent TF binding acitivty, TF expression, and TF target expression after z-score transformation.

GRN in space

As an additional step, if the spatial transcriptome (ST) data (e.g., generated by using 10X Visium assay) are available, we can then visualize the target gene expression for some interesting TFs to understand the regulon activity in spatial space. First let’s download the ST data

mkdir -p VisiumSpatial
wget -q -P VisiumSpatial https://costalab.ukaachen.de/open_data/scMEGA/VisiumSpatial/AKK004_157772.rds

Of note, each spot in the 10X Visium assay contains multiple cells, meaning that we usually cannot identify cell types by clustering the data. To address this issue, people developed various computation tools to estimate cell-type-specific abundance in each spot. Here we have perform such analysis using cell2location based on our snRNA-seq data. Let’s first visualize the fibroblast proportion.

obj.spatial <- readRDS("./VisiumSpatial/AKK004_157772.rds")

DefaultAssay(obj.spatial) <- "c2l_props"

p <- SpatialFeaturePlot(obj.spatial, features = "Fib", min.cutoff = "q5",
                       max.cutoff = "q95") +
    scale_fill_viridis(option = "D") +
    ggtitle("") +
    labs(fill = "") +
    theme(legend.position = "bottom")
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.

We next can check the gene expression in space. Here, we use the above TFs.

DefaultAssay(obj.spatial) <- "SCT"

p1 <- SpatialFeaturePlot(object = obj.spatial, features = "NR3C2") +
    ggtitle("") +
    theme(legend.position = "bottom") +
scale_fill_viridis(option = "C")
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p2 <- SpatialFeaturePlot(object = obj.spatial, features = "RUNX1") +
    ggtitle("") +
    theme(legend.position = "bottom") +
scale_fill_viridis(option = "C")
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p1 + p2

Then, we visualize the regulon activity of NR3C2 and RUNX1 by using the GRN that we predicted. The major idea is that if the target genes of a TF are up-regulated, then this TF should have high activity. This principle has been used to develop computation methods to estimate TF activity from gene expression data, such as DoRothEA.

DefaultAssay(obj.spatial) <- "SCT"

p1 <- GRNSpatialPlot(object = obj.spatial, assay = "SCT", df.grn = df.grn2,
                   tf.use = "NR3C2", vis.option = "B") +
    ggtitle("") +
    theme(legend.position = "bottom") 
## Warning: The following features are not present in the object: CATSPERB, GALR1,
## LMX1A, TEX26-AS1, UGDH-AS1, ZBTB20-AS5, not searching for symbol synonyms
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p2 <- GRNSpatialPlot(object = obj.spatial, assay = "SCT", df.grn = df.grn2,
                   tf.use = "RUNX1", vis.option = "B") +
    ggtitle("") +
    theme(legend.position = "bottom")
## Warning: The following features are not present in the object: CNTNAP2, DEC1,
## HMGA2, MIR181A1HG, RFX8, TPRG1, VEPH1, not searching for symbol synonyms
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
p1 + p2

The above plots demonstrate a clear spatial-distributed pattern for the regulon activity of NR3C2 and RUNX1.

Save data

saveRDS(obj, "./Fibroblast/coembed.trajectory.rds")

Network analysis

As an addition step, we next will explore the gene regulatory networks generated by scMEGA in fibroblasts during myocardial infarction by using network analysis methods.

This will allow us to

  • explore regulators regarding their network topological properties
  • to vizualise regulatory networks centred in selected transcription factors.

We first load the network produced by scMEGA:

dfgrn <- readRDS("./Fibroblast/final_grn.Rds")
netobj <- graph_from_data_frame(dfgrn,directed = TRUE)
V(netobj)$type <- ifelse(V(netobj)$name %in% dfgrn$tf,"TF/Gene","Gene")

Next, we will estimate distint network topological measures as node degree, node centrality and node page rank score and plot these in a principal component based embedding with a focus on transcription factors. We plot bellow the 3 first PCs.

Network topological measures embedding: PC1 and PC2

p <- TopEmbGRN(df.grn=netobj)
## Warning: ggrepel: 1783 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Network topological measures embedding: PC2 and PC3

p <- TopEmbGRN(df.grn=netobj,axis=c(2,3))
## Warning: ggrepel: 1800 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

This reveals that RUNX1 is a central TF in this network, as it has high in/out-degree values and high mediator score.

RUNX1 centric network.

To further inspect genes and TFs associated with RUNX1, we create a centric layout embedding.

NetCentPlot(netobj,"RUNX1")

This network reveals direct targets of RUNX1 such as TEAD4 and HIF1A and indirect targets as TEAD1 and TEAD2. If you want, you can explore a highlight parameter to personalize the plot by only showing the desired genes/TFs. By default, the TFs with a betweenness score higher than an average is depicted in the plot.

# Check session information
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Rocky Linux 8.6 (Green Obsidian)
## 
## Matrix products: default
## BLAS/LAPACK: /home/rs619065/miniconda3/envs/r-4.1/lib/libopenblasp-r0.3.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] circlize_0.4.15                   ComplexHeatmap_2.10.0            
##  [3] ggraph_2.0.6                      igraph_1.3.4                     
##  [5] TFBSTools_1.32.0                  JASPAR2020_0.99.10               
##  [7] dplyr_1.0.9                       BSgenome.Hsapiens.UCSC.hg38_1.4.4
##  [9] BSgenome_1.62.0                   rtracklayer_1.54.0               
## [11] Biostrings_2.62.0                 XVector_0.34.0                   
## [13] Nebulosa_1.4.0                    patchwork_1.1.1                  
## [15] harmony_0.1.0                     scMEGA_0.2.0                     
## [17] Signac_1.6.0                      sp_1.5-0                         
## [19] SeuratObject_4.1.0                Seurat_4.1.1                     
## [21] rhdf5_2.38.1                      SummarizedExperiment_1.24.0      
## [23] Biobase_2.54.0                    MatrixGenerics_1.6.0             
## [25] Rcpp_1.0.9                        Matrix_1.4-1                     
## [27] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
## [29] IRanges_2.28.0                    S4Vectors_0.32.4                 
## [31] BiocGenerics_0.40.0               matrixStats_0.62.0               
## [33] data.table_1.14.2                 stringr_1.4.0                    
## [35] plyr_1.8.7                        magrittr_2.0.3                   
## [37] ggplot2_3.3.6                     gtable_0.3.0                     
## [39] gtools_3.9.3                      gridExtra_2.3                    
## [41] ArchR_1.0.2                      
## 
## loaded via a namespace (and not attached):
##   [1] vcd_1.4-9                   ica_1.0-3                  
##   [3] RcppRoll_0.3.0              class_7.3-20               
##   [5] Rsamtools_2.10.0            foreach_1.5.2              
##   [7] lmtest_0.9-40               rprojroot_2.0.3            
##   [9] crayon_1.5.1                laeken_0.5.2               
##  [11] spatstat.core_2.4-4         MASS_7.3-57                
##  [13] rhdf5filters_1.6.0          nlme_3.1-157               
##  [15] backports_1.4.1             qlcMatrix_0.9.7            
##  [17] rlang_1.0.4                 ROCR_1.0-11                
##  [19] irlba_2.3.5                 limma_3.50.3               
##  [21] smoother_1.1                BiocParallel_1.28.3        
##  [23] rjson_0.2.21                CNEr_1.30.0                
##  [25] bit64_4.0.5                 glue_1.6.2                 
##  [27] poweRlaw_0.70.6             sctransform_0.3.3          
##  [29] parallel_4.1.1              chromVAR_1.16.0            
##  [31] spatstat.sparse_2.1-1       AnnotationDbi_1.56.2       
##  [33] spatstat.geom_2.4-0         tidyselect_1.1.2           
##  [35] textshape_1.7.3             fitdistrplus_1.1-8         
##  [37] XML_3.99-0.9                tidyr_1.2.0                
##  [39] zoo_1.8-10                  ggpubr_0.4.0               
##  [41] GenomicAlignments_1.30.0    xtable_1.8-4               
##  [43] RcppHNSW_0.3.0              evaluate_0.16              
##  [45] cli_3.3.0                   zlibbioc_1.40.0            
##  [47] miniUI_0.1.1.1              bslib_0.4.0                
##  [49] rpart_4.1.16                fastmatch_1.1-3            
##  [51] RcppEigen_0.3.3.9.2         shiny_1.7.2                
##  [53] xfun_0.32                   clue_0.3-60                
##  [55] cluster_2.1.3               caTools_1.18.2             
##  [57] tidygraph_1.2.1             pcaMethods_1.86.0          
##  [59] KEGGREST_1.34.0             tibble_3.1.8               
##  [61] ggrepel_0.9.1               listenv_0.8.0              
##  [63] TFMPvalue_0.0.8             png_0.1-7                  
##  [65] future_1.27.0               withr_2.5.0                
##  [67] lsa_0.73.3                  bitops_1.0-7               
##  [69] slam_0.1-50                 ggforce_0.3.3              
##  [71] ranger_0.13.1               sparsesvd_0.2              
##  [73] e1071_1.7-9                 pracma_2.3.8               
##  [75] pillar_1.8.0                GlobalOptions_0.1.2        
##  [77] cachem_1.0.6                fs_1.5.2                   
##  [79] scatterplot3d_0.3-41        TTR_0.24.3                 
##  [81] GetoptLong_1.0.5            xts_0.12.1                 
##  [83] vctrs_0.4.1                 ellipsis_0.3.2             
##  [85] generics_0.1.3              tools_4.1.1                
##  [87] munsell_0.5.0               tweenr_1.0.2               
##  [89] proxy_0.4-26                DelayedArray_0.20.0        
##  [91] fastmap_1.1.0               compiler_4.1.1             
##  [93] abind_1.4-5                 httpuv_1.6.5               
##  [95] plotly_4.10.0               rgeos_0.5-9                
##  [97] GenomeInfoDbData_1.2.7      lattice_0.20-45            
##  [99] deldir_1.0-6                utf8_1.2.2                 
## [101] later_1.3.0                 jsonlite_1.8.0             
## [103] ggplot.multistats_1.0.0     scales_1.2.0               
## [105] docopt_0.7.1                pbapply_1.5-0              
## [107] carData_3.0-5               lazyeval_0.2.2             
## [109] nabor_0.5.0                 promises_1.2.0.1           
## [111] doParallel_1.0.17           car_3.1-0                  
## [113] R.utils_2.11.0              goftest_1.2-3              
## [115] spatstat.utils_2.3-1        reticulate_1.25            
## [117] rmarkdown_2.14              pkgdown_2.0.3              
## [119] rlemon_0.2.0                cowplot_1.1.1              
## [121] textshaping_0.3.6           Rtsne_0.16                 
## [123] uwot_0.1.11                 survival_3.3-1             
## [125] yaml_2.3.5                  systemfonts_1.0.4          
## [127] htmltools_0.5.3             memoise_2.0.1              
## [129] BiocIO_1.4.0                graphlayouts_0.8.1         
## [131] destiny_3.9.1               viridisLite_0.4.0          
## [133] digest_0.6.29               assertthat_0.2.1           
## [135] mime_0.12                   RSQLite_2.2.14             
## [137] future.apply_1.9.0          blob_1.2.3                 
## [139] R.oo_1.24.0                 ragg_1.2.2                 
## [141] splines_4.1.1               labeling_0.4.2             
## [143] Rhdf5lib_1.16.0             RCurl_1.98-1.6             
## [145] broom_1.0.0                 ks_1.13.5                  
## [147] hms_1.1.1                   colorspace_2.0-3           
## [149] shape_1.4.6                 nnet_7.3-17                
## [151] sass_0.4.2                  mclust_5.4.10              
## [153] RANN_2.6.1                  mvtnorm_1.1-3              
## [155] ggseqlogo_0.1               fansi_1.0.3                
## [157] tzdb_0.3.0                  VIM_6.1.1                  
## [159] parallelly_1.32.1           SnowballC_0.7.0            
## [161] R6_2.5.1                    factoextra_1.0.7           
## [163] ggridges_0.5.3              lifecycle_1.0.1            
## [165] curl_4.3.2                  ggsignif_0.6.3             
## [167] leiden_0.4.2                motifmatchr_1.16.0         
## [169] jquerylib_0.1.4             robustbase_0.95-0          
## [171] desc_1.4.1                  RcppAnnoy_0.0.19           
## [173] RColorBrewer_1.1-3          iterators_1.0.14           
## [175] htmlwidgets_1.5.4           polyclip_1.10-0            
## [177] purrr_0.3.4                 seqLogo_1.60.0             
## [179] mgcv_1.8-40                 globals_0.16.0             
## [181] spatstat.random_2.2-0       progressr_0.10.1           
## [183] codetools_0.2-18            GO.db_3.14.0               
## [185] SingleCellExperiment_1.16.0 RSpectra_0.16-1            
## [187] R.methodsS3_1.8.1           DBI_1.1.2                  
## [189] tensor_1.5                  httr_1.4.3                 
## [191] highr_0.9                   KernSmooth_2.23-20         
## [193] stringi_1.7.8               reshape2_1.4.4             
## [195] farver_2.1.1                annotate_1.72.0            
## [197] viridis_0.6.2               ggthemes_4.2.4             
## [199] hexbin_1.28.2               optmatch_0.10.4            
## [201] magick_2.7.3                DT_0.24                    
## [203] boot_1.3-28                 restfulr_0.0.13            
## [205] readr_2.1.2                 scattermore_0.8            
## [207] DEoptimR_1.0-11             bit_4.0.4                  
## [209] spatstat.data_2.2-0         pkgconfig_2.0.3            
## [211] DirichletMultinomial_1.36.0 knn.covertree_1.0          
## [213] rstatix_0.7.0               knitr_1.39