TIPS Tutorial

Introduction

1. Purpose of TIPS (Fig 1)

We developed TIPS (Transcriptional Instability Prediction of Subnetworks) to better understand birth defects by (i) testing the “controlled chaoticity” hypothesis at the transition (ordered connectivity + targeted fragility), and (ii) prioritizing mechanistic bridge/hub genes and downstream “dual‑pull” architecture underlying the developmental bifurcation.

  1. test the “controlled chaoticity” hypothesis at the transition (ordered connectivity + targeted fragility), and
  2. prioritize mechanistic bridge/hub genes and downstream “dual-pull” structure for the CP → (CM vs CF-like) bifurcation.
Controlled Chaoticity

Figure 1: A cell begins in a single stable state. As new fate programs emerge, their competing influences pull the system into a highly dynamic transition region where multiple states are explored. The balls and hills represent changes in the underlying potential landscape that shape how easily the cell can move between states. The cell then settles into one of several stable phenotypic outcomes.

2. TIPS workflow: continuing from BioTIP (Fig 2)

TIPS Pipeline

Figure 2: How BioTIP and TIPS analysis fit into overall cell-state discovery and differentiation trajectory pipeline.

3. TIPS two-step analytic strategy and key concepts (Fig 3)

There are two steps:

In Step A, curated, knowledge based interactome is restricted to genes in a cluster specific signature to form a signature derived subnetwork. STRING interaction support is then combined with cluster specific coexpression information to assign state specific edge weights, yielding a weighted protein–protein interaction network (PPIN) for each cell state, where genes are nodes and weighted interactions represent a combination of protein–protein associations and coexpression specificity.

In Step B, the weighted PPINs are analyzed for topological properties and robustness via strategic node removal (red arrows). Robustness is quantified and compared among three categories of PPINs: stable (HiG), transitional (CTS), and their overlap (CTS&HiG) networks.

TIPS edge weighting highlights putative regulatory interactions by downweighting spurious correlations:
PPI Network Two-Step Analysis

Figure 3: Overview of TIPS analysis including how PPINS are constructed, how edges are updated, and how robustness is tested through targeted node attacks.

3. Why walkthrough Pijuan-Sala 2019 mouse gastrulation dataset (the mE8.25 subset)

Specifically, raw counts and metadata were obtained from the MouseGastrulationData R package. We extracted E8.25 cells and removed cells annotated as putative doublets or stripped nuclei as described in our prior reanalysis (Yang et al., 2022), yielding 7,240 E8.25 mesoderm cells.

Counts were normalized with scran::logNormCounts using sample-specific size factors. Gene selection and clustering followed Yang et al. (2022): PCA was performed on the selected expressed-gene set, and clusters were identified using an SNN graph (k = 10) and graph-based clustering, resulting in 19 subpopulations.

Cluster markers were identified using SingleCellExperiment::findMarkers (t-test; logFC > 0.6; min.prop = 0.25; FDR < 0.01).

Following, we detail the main delivery of TIPS analysis presented in the following Figure by walking through the R set up, and each panel – the purpose, R code, interpretation, inputs and outputs.
 
 

TIPS Walkthrough

Whole Figure

Main Figure: Polished figure generated by TIPS workflow.

Setup

Load libraries & initialize global variables

library(gplots)
require(dplyr)
library(data.table)
library(ggplot2)
library(ggpubr)
library(ggrepel)
library(igraph)
library(rstatix)
library(scales)
library(pracma)
library(MLmetrics)
library(sm)

########## BEGINNING OF USER INPUT ##########

wd = "/Users/felixyu/Documents/GSE87038_weighted/" # working directory
celltype_specific_weight_version <- '10'
source(paste0('https://raw.githubusercontent.com/xyang2uchicago/TIPS/refs/heads/main/R/celltype_specific_weight_v', celltype_specific_weight_version, '.R')) # source from GitHub or use local file

db <- "GSE87038" # dataset name for file naming

PPI_color_palette = c("CTS" = "#7570B3", "HiGCTS" = "#E7298A", "HiG" = "#E6AB02") # Coloring for HiG, HiGCTS, and HiG clusters
PPI_size_palette = c("CTS" = 1, "HiGCTS" = 0.75, "HiG" = 0.25) # Line width for HiG, HiGCTS, and HiG clusters

CT_id = c("7", "8", "11", "13", "15", "16", "16.1") # critical transition cluster ids
CT_id_formatted <- paste0("_(", paste(CT_id, collapse = "|"), ")")


setwd(paste0(wd, 'results/PPI_weight/'))
inputdir = "../../data/" # directory for input data files (e.g. CHD_Cilia_Genelist.rds)

# For Figure E
# Choose what to plot: "vertex", "edge", or "both"
plot_mode <- "vertex"  # change this to "edge" or "both" as needed

# For Figure K
CP_CTS <- "HiGCTS_8" # cardiac progenitor critical transition cluster

s = "combined" # specificity method

########## END OF USER INPUT ##########

file = paste0(db, '_STRING_graph_perState_simplified_',s,'weighted.rds')
graph_list <- readRDS(file)  
    
(names(graph_list))
#  [1] "HiG_1"       "HiG_2"       "HiG_3"       "HiG_4"       "HiG_5"       "HiG_6"       "HiG_9"       "HiG_10"      "HiG_12"      "HiG_14"      "HiG_17"      "HiG_18"      "HiG_19"      "HiG_7"      
# [15] "HiG_11"      "HiG_15"      "HiG_16"      "HiG_13"      "HiG_8"       "HiGCTS_7"    "HiGCTS_11"   "HiGCTS_15"   "HiGCTS_16"   "HiGCTS_16.1" "HiGCTS_8"    "CTS_7"       "CTS_11"      "CTS_15"     
# [29] "CTS_16"      "CTS_16.1"    "CTS_13"      "CTS_8"       "HiGCTS_13"

signature_levels = c(names(graph_list))

Panel A & Panel B — Strength Architecture and Distribution

Purpose

To evaluate the weighted connectivity structure of each PPIN derived from GSE87038 signatures. Normalized node strength reflects the sum of all weighted interactions per gene, adjusted for network size. These panels describe both the typical strength (medians) and the full distribution across categories (CTS, CTS&HiG, HiG).

What this Panel Shows

Interpretation

Despite being a much smaller overlap set, CTS&HiG PPINS contain higher normalized connection strength between nodes.

Inputs

Outputs

Panel A and Panel B

Note: figures shown are polished versions of the raw outputs

Code

Top
###################################################
# Fig A ) normalized node strength analysis
# original code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: normalized.node.strength_GSE87038_v2.pdf
################################################################
{
df = readRDS(file='df_PAGERANK_strength_ANND.rewring.P.rds')  #!!!!!!!!!!!!!!!!!!!!!!!
df = rbind(subset(df, PPI_cat=='CTS'),
                    subset(df, PPI_cat=='HiGCTS'),
                    subset(df, PPI_cat=='HiG')
                    )
df$label=df$gene

CHD = readRDS( file=paste0(inputdir, 'CHD_Cilia_Genelist.rds'))

df_median = df %>% group_by(signature) %>%
                    summarise(median_normalized_strength = median(normalized.strength, na.rm = TRUE))
df_median$PPI_cat = lapply(df_median$signature, function(x) unlist(strsplit(x, split='_'))[1]) %>% unlist 
df_median$PPI_cat = factor(df_median$PPI_cat,levels=c('CTS', 'HiGCTS', 'HiG')) 				
    
violin_median_normalized.strength_wilcox = ggplot(df_median, aes(x = PPI_cat, y = median_normalized_strength, color = PPI_cat, fill = PPI_cat)) +
        geom_violin(alpha = 0.3) +  # Violin plot with transparency
        scale_color_manual(values = PPI_color_palette) +
        scale_fill_manual(values = PPI_color_palette) +
        theme_minimal() +
        theme(legend.position = "none") + #, axis.text.y = element_blank(), axis.title.y = element_blank()) +
        labs(x = "PPI category", y = "median of normalized node strength per PPI") +  # Label the axes
        # Add statistical comparisons using stat_compare_means
        stat_compare_means(
        aes(group = PPI_cat),  # Grouping by the 'PPI_cat' column
        comparisons = list(c("HiG", "CTS"), c("HiG", "HiGCTS"), c("HiGCTS", "CTS")),  # Specify comparisons
        method = "wilcox.test",  # Non-parametric test (Wilcoxon)
        label = "p.signif",  # Show significance labels (e.g., **, *, ns)
        label.x = 1.5,  # Adjust x-position of the p-value text
        size = 4  # Adjust size of the p-value text
        ,tip.length =0
        ) + 
ggtitle('wilcox, median nor_strength')
vertex(violin_median_normalized.strength_wilcox)
}
Bottom
###################################################
# Fig B) boxplot of normalized strength for all clusters per category
# original code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: normalized.node.strength_GSE87038_v2.pdf
################################################################
{
CHD = readRDS(file = paste0(inputdir, "CHD_Cilia_Genelist.rds"))

df = readRDS(file = "df_PAGERANK_strength_ANND.rewring.P.rds")

# Keep only desired PPI categories in the correct order
df = rbind(
    subset(df, PPI_cat == "CTS"),
    subset(df, PPI_cat == "HiGCTS"),
    subset(df, PPI_cat == "HiG")
)

# Enforce the factor order
df$PPI_cat = factor(df$PPI_cat, levels = c("CTS", "HiGCTS", "HiG"))
df$signature = factor(df$signature, levels = unique(df$signature))

# Add CHD gene annotations
df$PCGC_AllCurated = toupper(df$gene) %in% toupper(unlist(CHD["Griffin2023_PCGC_AllCurated"]))

# Identify top 5 genes per signature by normalized strength
top_genes = df %>%
    group_by(signature) %>%
    arrange(desc(normalized.strength)) %>%
    slice_head(n = 5) %>%
    ungroup()

# Subset top CHD genes
top_genes_CHD = subset(top_genes, PCGC_AllCurated == TRUE)
(dim(top_genes_CHD))  # Optional: check how many CHD genes were top 5

# Optional: write out table of top 5 genes
tb = top_genes[, c("signature", "gene", "PPI_cat", "normalized.strength", "PCGC_AllCurated")]
write.table(tb, file = "table_top5_strength_perPPI.tsv", sep = "\t", row.names = FALSE)

# Plot
boxplot_strength = ggplot(df, aes(x = signature, y = log10(normalized.strength), colour = PPI_cat)) +
    geom_boxplot(show.legend = TRUE, position = position_dodge2(preserve = "single")) +
    scale_color_manual(values = PPI_color_palette) +
    geom_text_repel(
        data = top_genes_CHD,
        aes(label = gene),
        size = 2,
        box.padding = 0.5,
        point.padding = 0.5,
        segment.color = "grey50",
        max.overlaps = 20,
        show.legend = FALSE
    ) +
    theme(
        legend.position = "none",
        legend.justification = c(1, 1),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    ) +
    scale_x_discrete(limits = unique(df$signature)) +
    labs(color = "PPI cat")

vertex(boxplot_strength)
}

Panel C — Connectivity and PageRank Influence of Transition-Cluster PPINs

Purpose

Node-strength: To determine whether PPINs derived from transcriptionally unstable critical transition signatures (CTSs) (clusters 7, 8, 11, 13, 15, 16, 16.1) show systematically altered connectivity. PageRank: To quantify how global influence is distributed within each CTS PPIN. PageRank identifies genes positioned on influential, central information routes through the network.

What this Panel Shows

Interpretation

CTS&HiG networks exhibit higher normalized node strength and higher median PageRank than HiG, particularly when limited to just the six transition clusters. Because strength is normalized for network size and PageRank reflects relative centrality, these results indicate that CTS&HiG nodes remain densely connected and centrally positioned.

Inputs

Outputs

Panel C

Note: figures shown are polished versions of the raw outputs

Code

Left
###################################################
# Fig C) boxplot of normalized strength for only transition clusters per category
# original code: 11.2.1_CTS_cardiac_network_strengthDistribution.R
# original pdf: boxplot_normalized_strength_GSE87038.pdf
################################################################
{
V_deg_dis = lapply(graph_list, function(x) strength_distribution(x, cumulative=TRUE )) %>% 
            lapply(., function(x) x %>% 
                    as.data.frame(strength_distribution=x)  %>% 
                    mutate(k=seq_along(x))) %>% 
rbindlist(.,idcol=names(.))

colnames(V_deg_dis)[1:2]=c('signature','strength_distribution')
V_deg_dis$PPI_cat = lapply(V_deg_dis$signature, function(x) unlist(strsplit(x , '_'))[1]) %>% unlist %>%
        factor(.,levels=c('CTS', 'HiGCTS', 'HiG')) 

(table(V_deg_dis$PPI_cat))
#    CTS HiGCTS    HiG 
#    578    176   1900
V_deg_dis$cluster = lapply(V_deg_dis$signature, function(x) unlist(strsplit(x , '_'))[2]) %>% unlist 

all(V_deg_dis$signature %in% names(graph_list)) #T
V_deg_dis$n_nodes <- 0
for(i in seq_along(graph_list)){
    j = which(V_deg_dis$signature == names(graph_list)[i])
    V_deg_dis$n_nodes[j] <- vcount(graph_list[[i]])
    }
V_deg_dis$normalized_strength_distribution =  V_deg_dis$strength_distribution / (V_deg_dis$n_nodes-1)
    
boxplot_transition_strength = ggplot(subset(V_deg_dis, grepl(CT_id_formatted, signature)), 
                aes(x = factor(PPI_cat), y = normalized_strength_distribution, fill = PPI_cat)) + 
            geom_boxplot() +
            labs(x = "PPIN Category", y = "Normalized Strength Distribution", title = "PPINs for transition clusters") +
            theme_minimal() +
            scale_fill_manual(values = PPI_color_palette) +
            stat_compare_means(method = "wilcox", 
                        comparisons = list(c("CTS", "HiGCTS"), c("CTS", "HiG"), c("HiGCTS", "HiG")), 
                        p.adjust.method = "BH",  # Adjust p-values using Benjamini-Hochberg (BH) method
                        label = "p.signif")
print(boxplot_transition_strength)
}
Right
###################################################
# Fig D) violin plot of median PageRank per category
# original code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: PageRank_IbarraSoria20183_v2.pdf
################################################################
{
df = readRDS(file='df_PAGERANK_strength_ANND.rewring.P.rds')  #!!!!!!!!!!!!!!!!!!!!!!!
df = rbind(subset(df, PPI_cat=='CTS'),
                    subset(df, PPI_cat=='HiGCTS'),
                    subset(df, PPI_cat=='HiG')
                    )
df$label=df$gene

df_median = df %>% group_by(signature) %>%
                    summarise(pg.median = median(PageRank, na.rm = TRUE))
df_median$PPI_cat = lapply(df_median$signature, function(x) unlist(strsplit(x, split='_'))[1]) %>% unlist 
df_median$PPI_cat = factor(df_median$PPI_cat,levels=c('CTS', 'HiGCTS', 'HiG')) 				
        
violin_median_pagerank = ggplot(df_median, aes(x = PPI_cat, y = pg.median, color = PPI_cat, fill = PPI_cat)) +
        geom_violin(alpha = 0.3) +  # Violin plot with transparency
        scale_color_manual(values = PPI_color_palette) +
        scale_fill_manual(values = PPI_color_palette) +
        theme_minimal() +
        theme(legend.position = "none") + #, axis.text.y = element_blank(), axis.title.y = element_blank()) +
        labs(x = "PPI category", y = "median of PageRanks per PPI") +  # Label the axes
        # Add statistical comparisons using stat_compare_means
        stat_compare_means(
        aes(group = PPI_cat),  # Grouping by the 'PPI_cat' column
        comparisons = list(c("HiG", "CTS"), c("HiG", "HiGCTS"), c("HiGCTS", "CTS")),  # Specify comparisons
        method = "wilcox.test",  # Non-parametric test (Wilcoxon)
        label = "p.signif",  # Show significance labels (e.g., **, *, ns)
        label.x = 1.5,  # Adjust x-position of the p-value text
        size = 4  # Adjust size of the p-value text
        ,tip.length =0
        ) +
        ggtitle('wilcox test, median PA')

vertex(violin_median_pagerank)
}

Panel D — Resistance to Targeted and Random Attacks

Purpose

To test PPIN robustness under random failure vs targeted removal of high-betweenness nodes.

What this Panel Shows

Interpretation

When nodes were removed by descending betweenness centrality, CTS and CTS&HiG PPINs fragmented faster than HiG PPINs, whereas random removal had a much weaker effect in the transition PPINS.

Inputs

Outputs

Panel D

Note: figures shown are polished versions of the raw outputs

Code

###################################################
# Fig E) boxplot of %remained_fraction of targeted attack vs random attack
# original code: 11.2_CTS_cardiac_network_robustness.R
# original pdf: box_wilcox-test_attack_GSE87038.pdf
################################################################
{

attack.edge.btwn = readRDS(file='attack.edge.btwn.rds')
attack.vertex.btwn = readRDS(file='attack.vertex.btwn.rds')
failure.vertex = readRDS(paste0('failure.vertex_100_simplified_', s, 'weighted.rds'))
failure.edge = readRDS(paste0('failure.edge_100_simplified_', s, 'weighted.rds'))
failure.dt <- rbind(failure.edge, failure.vertex)   

colnames(failure.dt)[1] ='signature'
colnames(attack.vertex.btwn)[1] = 'signature'

robustness.dt <- rbind(failure.dt, attack.vertex.btwn, attack.edge.btwn)
robustness.dt$PPI_cat = lapply(robustness.dt$signature, function(x) unlist(strsplit(x, '_'))[1]) %>%
    unlist() %>%
    factor(levels = c('CTS', 'HiGCTS', 'HiG'))

robustness.dt$experiment = ifelse(grepl('edge', robustness.dt$type), 'edge', 'vertex')
robustness.dt$measure = factor(robustness.dt$measure, levels = c("random", "btwn.cent"))
robustness.dt$type = factor(robustness.dt$type,
                            levels = c("Random edge removal", "Targeted edge attack",
                                        "Random vertex removal", "Targeted vertex attack"))
robustness.dt$cluster = lapply(robustness.dt$signature, function(x) unlist(strsplit(x, '_'))[2]) %>% unlist()

# Filter by plot_mode
if (plot_mode == "vertex") {
    robustness.sub <- subset(robustness.dt, experiment == "vertex")
} else if (plot_mode == "edge") {
    robustness.sub <- subset(robustness.dt, experiment == "edge")
} else {
    robustness.sub <- robustness.dt  # "both"
}

# Dynamically select comparisons based on what’s in the filtered data
comparisons_list <- list()
if ("Random vertex removal" %in% robustness.sub$type &&
    "Targeted vertex attack" %in% robustness.sub$type) {
    comparisons_list <- append(comparisons_list, list(c("Random vertex removal", "Targeted vertex attack")))
}
if ("Random edge removal" %in% robustness.sub$type &&
    "Targeted edge attack" %in% robustness.sub$type) {
    comparisons_list <- append(comparisons_list, list(c("Random edge removal", "Targeted edge attack")))
}

# Make sure PPI_cat is a factor with correct levels
robustness.sub$PPI_cat <- factor(robustness.sub$PPI_cat, levels = c("CTS", "HiGCTS", "HiG"))

# Define comparisons explicitly
ppi_comparisons <- list(
    c("HiG", "HiGCTS"),
    c("HiG", "CTS"),
    c("CTS", "HiGCTS")
)

# Helper to convert p-value to significance stars
p_to_stars <- function(p) {
    if (is.na(p)) return(NA)
    if (p <= 1e-4) "****"
    else if (p <= 1e-3) "***"
    else if (p <= 1e-2) "**"
    else if (p <= 0.05) "*"
    else "ns"
}

# Boxplot
boxplot_remained_fraction = ggplot(
    robustness.sub,
    aes(
    x = type,
    y = comp.pct,
    fill = measure,
    color = PPI_cat,
    size = experiment
    )
) +
    geom_boxplot(alpha = 0.5, position = position_dodge(width = 0.8)) +
    facet_wrap(~ PPI_cat, ncol = 3) +
    scale_color_manual(values = PPI_color_palette, guide = "none") +
    scale_fill_manual(values = c("random" = "grey", "btwn.cent" = "white")) +
    scale_size_manual(values = c("edge" = 1.2, "vertex" = 0.4), name = "Experiment") +
    geom_signif(
    comparisons = comparisons_list,
    map_signif_level = TRUE,
    step_increase = 0.1,
    aes(group = type),
    test = "wilcox.test"
    ) +
    theme_minimal() +
    labs(
    title = paste0("Comparison of Robustness Measures (", plot_mode, " only)"),
    x = "PPI Category",
    y = "Component Percentage Remaining"
    ) +
    theme(
    legend.position = "top",
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    strip.text = element_text(face = "bold")
    )

print(boxplot_remained_fraction)

## finally, manually add the threshold of fold changes  ###############
robustness.dt <- robustness.dt %>%
    group_by(PPI_cat, type) %>%
    mutate(
    mean_comp_pct = mean(comp.pct, na.rm = TRUE)  # Calculate mean comp.pct for each group
    ) %>%
    ungroup() 

# Calculate fold change for each PPI_cat
fold_change <- robustness.dt %>%
    group_by(PPI_cat) %>%
    summarise(
    fold_change_edge = mean(comp.pct[type == "Random edge removal"], na.rm = TRUE) /
                        mean(comp.pct[type == "Targeted edge attack"], na.rm = TRUE),
    fold_change_vertex = mean(comp.pct[type == "Random vertex removal"], na.rm = TRUE) /
                            mean(comp.pct[type == "Targeted vertex attack"], na.rm = TRUE)
    )
(fold_change)
#   PPI_cat fold_change_edge fold_change_vertex
#                               
# 1 CTS                1.04                1.66
# 2 HiGCTS             1.00                1.68
# 3 HiG                0.913               1.09

# HiG vertex too low, nonsignificant

### additionally, wilcox-test the between-group changes among observed PPINs
tmp = subset(robustness.dt,measure=='btwn.cent')
dim(tmp) #[1] 8245    9
wilcox.test(subset(tmp,PPI_cat=='HiG')$comp.pct, subset(tmp,PPI_cat=='CTS')$comp.pct)
# W = 1961908, p-value < 2.2e-16

wilcox.test(subset(tmp,PPI_cat=='HiG')$comp.pct, subset(tmp,PPI_cat=='HiGCTS')$comp.pct)
# W = 623754, p-value = 1.008e-09

wilcox.test(subset(tmp,PPI_cat=='HiGCTS')$comp.pct, subset(tmp,PPI_cat=='CTS')$comp.pct)
# W = 25486, p-value = 0.1847

# Run Wilcoxon tests between PPI categories
sig_results_ppi <- lapply(ppi_comparisons, function(cmp) {
    g1 <- cmp[1]; g2 <- cmp[2]
    sub1 <- robustness.sub %>% filter(PPI_cat == g1)
    sub2 <- robustness.sub %>% filter(PPI_cat == g2)
    
    # Check there’s something to compare
    if (nrow(sub1) > 1 & nrow(sub2) > 1) {
    wt <- wilcox.test(sub1$comp.pct, sub2$comp.pct, exact = FALSE)
    data.frame(
        group1 = g1,
        group2 = g2,
        p = wt$p.value,
        p_stars = p_to_stars(wt$p.value)
    )
    } else {
    data.frame(group1 = g1, group2 = g2, p = NA, p_stars = NA)
    }
}) %>% bind_rows()

(sig_results_ppi)
#   group1 group2            p p_stars
# 1    HiG HiGCTS 5.746002e-11    ****
# 2    HiG    CTS 8.242192e-28    ****
# 3    CTS HiGCTS 4.910623e-01      ns

}

Panel E — Fragmentation Trajectories and Robustness AUC

Purpose

To quantify PPIN robustness under targeted vertex removal by observing how networks fragment as key nodes are deleted. Robustness is quantified by tracking network fragmentation (e.g., loss of the largest connected component) under strategic versus random removal, enabling comparison of PPIN vulnerability across categories.

What this Panel Shows

Interpretation

Low AUC → fragile networks that collapse early under targeted attacks.
High AUC → robust networks that maintain connectivity longer.

Transition PPINs show significantly lower AUC, indicating heightened vulnerability to the removal of influential nodes and fewer redundant paths relative to HiG networks.

Inputs

Outputs

Panel E

Note: figures shown are polished versions of the raw outputs

Left
###################################################
# Fig F) ARC plot of vertex attack
# original code: 11.2_CTS_cardiac_network_robustness.R
# original pdf: attack_GSE87038.pdf
################################################################
{  
failure.dt = readRDS(file=paste0('failure.vertex_100_simplified_',s,'weighted.rds'))  
colnames(failure.dt)[1] ='signature'

# attack.edge.btwn = readRDS( file=paste0('attack.edge.btwn.rds'))
attack.vertex.btwn = readRDS( file=paste0('attack.vertex.btwn.rds'))
colnames(attack.vertex.btwn)[1] = 'signature'	 

robustness.dt <- rbind(failure.dt, attack.vertex.btwn[,1:6])  #, attack.edge.btwn[,1:6])  
(dim(robustness.dt)) #  16490     6
robustness.dt$PPI_cat = lapply(robustness.dt$signature, function(x) unlist(strsplit(x , '_'))[1]) %>% unlist %>%
        factor(.,levels=c('CTS', 'HiGCTS', 'HiG')) 
head(robustness.dt, 3)


robustness.dt$measure %>% unique
#[1] "random"     "btwn.cent"
robustness.dt = subset(robustness.dt ,measure != 'degree')
robustness.dt$experiment = ifelse(grepl('edge', robustness.dt$type), 'edge', 'vertex')
robustness.dt$measure= factor(robustness.dt$measure, levels = c("random" ,  "btwn.cent"))

robustness.dt$type = factor(robustness.dt$type,
        levels = c("Random edge removal" , "Targeted edge attack"  ,  "Random vertex removal" , "Targeted vertex attack") )
robustness.dt$cluster = lapply(robustness.dt$signature, function(x) unlist(strsplit(x , '_'))[2]) %>% unlist                                
    
p_attack4 <- ggplot(robustness.dt,
                    aes(x=removed.pct, y=comp.pct, col=PPI_cat, linetype=PPI_cat)) +  
                geom_line(aes(group=signature,  size = PPI_cat, shape=PPI_cat), show.legend = FALSE) +  # colored by PPI_cat
                scale_color_manual(values = PPI_color_palette) +
                scale_size_manual(values = PPI_size_palette) +  # Set line width
                #scale_shape_manual(values = c("HiG" = 16, "CTS" = 17, "HiGCTS" = 18)) +  
                facet_wrap(~ type) + 
                geom_abline(slope=-1, intercept=1, col='gray', lty=2) +
                    theme(legend.position=c(0, 0), legend.justification=c(0, 0)) +
                    labs(x='% edges/vertex removed', y='% of max. component remaining')
vertex(p_attack4)
}
Right
###################################################
# Fig H) boxplot of AUC for vertex attack
# original code: 11.2_CTS_cardiac_network_robustness.R
# original pdf: box_wilcox-test_attack_AUC_GSE87038.pdf
################################################################
{
failure.dt = readRDS(file=paste0('failure.vertex_100_simplified_', s, 'weighted.rds'))
colnames(failure.dt)[1] = 'signature'

attack.vertex.btwn = readRDS('attack.vertex.btwn.rds')
colnames(attack.vertex.btwn)[1] = 'signature'

robustness.dt <- rbind(failure.dt, attack.vertex.btwn[,1:6])
robustness.dt$PPI_cat = lapply(robustness.dt$signature, function(x) unlist(strsplit(x , '_'))[1]) %>% unlist %>%
    factor(.,levels=c('CTS', 'HiGCTS', 'HiG'))

robustness.dt = subset(robustness.dt ,measure != 'degree')
robustness.dt$type = factor(robustness.dt$type,
    levels = c("Random edge removal","Targeted edge attack","Random vertex removal","Targeted vertex attack"))

observed_auc_list = list()
for(j in names(graph_list)){
	observed_auc_list[[j]] = Area_Under_Curve(subset(robustness.dt, signature==j & type=='Targeted vertex attack')$removed.pct, 
				 subset(robustness.dt, signature==j & type=='Targeted vertex attack')$comp.pct ) 
} 
df_AUC = data.frame(auc=observed_auc_list %>% unlist, 
			signature = names(observed_auc_list), 
			PPI_cat = lapply(names(observed_auc_list), function(x) unlist(strsplit(x, split='_'))[1]) %>% unlist)
df_AUC$PPI_cat  = factor(df_AUC$PPI_cat , levels = c('CTS',"HiGCTS" ,"HiG"))

boxplot_AUC_vertex_attack = ggplot(df_AUC, aes(x = PPI_cat, y = auc, fill = PPI_cat)) + 
  geom_boxplot(alpha = 0.5, position = position_dodge(width = 0.75)) +  # Dodge the boxes for each type per PPI_cat
  #facet_wrap(experiment ~ PPI_cat, ncol = 3) +  # Facet by PPI_cat, each PPI_cat gets a row
  scale_fill_manual(values = PPI_color_palette) +
  #scale_fill_manual(values = c("random" = "grey", "btwn.cent" = "orange")) +
  geom_signif(
    comparisons = list(
      c("CTS", "HiGCTS"), c("HiGCTS", "HiG"),c("CTS", "HiG")
    ),
    map_signif_level = TRUE,
    step_increase = 0.1,  # Adjusts spacing between the lines
    #aes(group = type),
    test = "wilcox.test"  # Perform a t-test to calculate significance
	#, p.adjust.method = "holm"   # default is 
  ) + 
  theme_minimal() +
  labs(
    title = "Comparison of Robustness Measures by cent.btw",
    x = "PPI Category", 
    y = "AUC of targeted vertex attack"
  ) +
  theme(legend.position = "top")
vertex(boxplot_AUC_vertex_attack)
}

Panel F — Simulation Network Fragility

Purpose

To compare empirical PPIN fragility against synthetic network models generated from standard topological frameworks. Monte-Carlo simulations test whether the vulnerability of real CTS/HiG networks arises from their degree structure or reflects deeper organizational constraints.

What this Panel Shows

Interpretation

The simulation graph shows that degree-preserving rewired CTS&HiG networks lose global connectivity earlier than the real PPIN, indicating that resilience is not explained by the degree sequence alone but depends on biologically structured edge placement. At the same time, the real network fragments earlier than toy generative graphs, placing CTS&HiG PPINs in an intermediate regime that balances robustness to random disruption with selective vulnerability to targeted bridge removal.

Inputs

Outputs

Panel F

Note: figures shown are polished versions of the raw outputs

###################################################
# Fig K) Monte-Carlo robustness simulation for CP_CTS
# original code: 11.7_synthetic_simulation.R
# original pdf: simulation.pdf
###################################################
{
# Select real network
g_real <- graph_list[[CP_CTS]]

# Run simulation
res <- synthetic_simulation(g_real, main = CP_CTS)

# Combine into a single plot object
plot_simulation <- grid.arrange(
    res$p_weights,
    res$p_line,
    res$p_AUC,
    ncol = 3
)

print(plot_simulation)
}

Panel G — Distribution of PPI Edge Weights

Purpose

To compare how strongly connected the protein–protein interactions (PPINs) are within each PPIN category (CTS, CTS&HiG, HiG). Edge weight is a combination of string PPI confidence and coexpression in that cluster.

What this Panel Shows

Interpretation

Modules are tightly connected groups of nodes; since all network categories have similar numbers of modules, their different fragilities must arise from how modules are wired together, not from how many modules exist.

Inputs

Outputs

Panel G

Note: figures shown are polished versions of the raw outputs

###################################################
# Fig L) Edge weight density
# original code: 11.6_communities_edge_weight_distribution.R
# original pdf: edge_weight.pdf
###################################################
{
graph_list_notsimplified <- readRDS( file= paste0('../', db, '_STRING_graph_perState_notsimplified.rds'))

# Remove duplicate vertices
correct_n_edges = readRDS('../correct_n_edges_HiG_STRING2.14.0.rds')
for(g_name in unique(correct_n_edges$graph_id)){
    vertices_to_remove = subset(correct_n_edges, graph_id == g_name)$vetex_index_to_remove
    if(any(is.na(vertices_to_remove))) vertices_to_remove = vertices_to_remove[!is.na(vertices_to_remove)]
    graph_list[[g_name]] = delete_vertices(graph_list_notsimplified[[g_name]], vertices_to_remove)
}
N = sapply(graph_list_notsimplified, vcount)
((N0-N)[which(N0-N>0)])
# named numeric(0)

graph_list_notsimplified <- lapply(graph_list_notsimplified, simplify, edge.attr.comb ='max') #!!!!!!!!!!!!!!!!!!!
N2 = sapply(graph_list_notsimplified, vcount)
(all(N==N2))   # [1] TRUE 

edge_data <- extract_edge_weights_by_category(graph_list, PPI_color_palette, CT_id)
(head(edge_data, 3))
#   sample PPI_cat edge_weight num_edges cluster_ID cluster_cat
# 1  HiG_1     HiG       0.211      4978          1      stable
# 2  HiG_1     HiG       0.410      4978          1      stable
# 3  HiG_1     HiG       0.767      4978          1      stable

# Create plots for PPI category analysis
category_plots <- plot_edge_weight_distributions(edge_data, PPI_color_palette)

plot_density <- category_plots$density_plot

edge_counts <- edge_data %>%
    group_by(PPI_cat) %>%
    summarise(
    edge_count = n(),
    .groups = "drop"
    )

(edge_counts)
#   PPI_cat edge_count
#           
# 1 CTS            905
# 2 HiGCTS         137
# 3 HiG         184731

pairwise_pvals <- edge_data %>%
    pairwise_wilcox_test(
    formula = edge_weight ~ PPI_cat,
    p.adjust.method = "BH"
    )

(pairwise_pvals)
#   .y.         group1 group2    n1     n2 statistic        p   p.adj p.adj.signif
# *                                  
# 1 edge_weight CTS    HiGCTS   905    137     45708  7.03e-7 2.11e-6 ****        
# 2 edge_weight CTS    HiG      905 184731  80746929  7.7 e-2 7.7 e-2 ns          
# 3 edge_weight HiGCTS HiG      137 184731  14062139  2.4 e-2 3.6 e-2 *   
}

Panel H — Median Betweenness Centrality

Purpose

To evaluate whether PPIN categories differ in bottleneck load, using betweenness centrality to identify nodes that lie on many shortest paths.

What this Panel Shows

Interpretation

At the category level, median betweenness centrality is significantly lower in CTS and CTS&HiG PPINs compared with HiG PPINs. This indicates that HiG networks are more dominated by a small number of high-betweenness hub nodes, whereas transition PPINs (CTS and CTS&HiG) exhibit a more distributed topology with fewer globally dominant bottlenecks. In other words, information flow in HiG PPINs is concentrated through central hubs, while transition PPINs spread connectivity more evenly across nodes.

Inputs

Outputs

Panel H

Note: figures shown are polished versions of the raw outputs

###################################################
# Fig G) violin plot of median betweenness per category
# original code: Code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: BetweennessCentrality_GSE870383_v2.pdf
################################################################
{
df_BC = read.table(file='df_betweeness.tsv',sep='\t', header=T) 
df_BC$PPI_cat = factor(df_BC$PPI_cat,levels=c('CTS', 'HiGCTS', 'HiG')) 			
df5 <- df_BC %>%
        filter(rank_by_BC <= 5 & BetweennessCentrality>0) %>%
        ungroup()

df_median = df_BC %>% group_by(signature) %>%
                    summarise(bc.median = median(BetweennessCentrality, na.rm = TRUE)) %>%
                    as.data.frame()
df_median$PPI_cat = lapply(df_median$signature %>% as.vector, function(x) unlist(strsplit(x, split='_'))[1]) %>% unlist 
df_median$PPI_cat = factor(df_median$PPI_cat,levels=c('CTS', 'HiGCTS', 'HiG')) 				

violin_median_bc_wilcox = ggplot(df_median, aes(x = PPI_cat, y = bc.median , color = PPI_cat, fill = PPI_cat)) +
        geom_violin(alpha = 0.3, drop = FALSE) +  # Violin plot with transparency
        scale_color_manual(values = PPI_color_palette) +
        scale_fill_manual(values = PPI_color_palette) +
        theme_minimal() +
        theme(legend.position = "none") + #, axis.text.y = element_blank(), axis.title.y = element_blank()) +
        labs(x = "PPI category", y = "median of BC per PPI") +  # Label the axes
        # Add statistical comparisons using stat_compare_means
        stat_compare_means(
        aes(group = PPI_cat),  # Grouping by the 'PPI_cat' column
        comparisons = list(c("HiG", "CTS"), c("HiG", "HiGCTS"), c("HiGCTS", "CTS")),  # Specify comparisons
        method = "wilcox.test",  # Non-parametric test (Wilcoxon)
        label = "p.signif",  # Show significance labels (e.g., **, *, ns)
        label.x = 1.5,  # Adjust x-position of the p-value text
        size = 4  # Adjust size of the p-value text
        ,tip.length =0
        ) +
        ylim(0, NA)  +  # Start from 0, let ggplot choose upper limit
        ggtitle('wilcox-test, median BC')
    
plot(violin_median_bc_wilcox)
}

Panel I — Boxplot of Betweenness per PPIN

Purpose

To visualize the full distribution of betweenness centrality across all genes in each PPIN, complementing Panel H.

What this Panel Shows

Interpretation

Although transition PPIN categories have lower median betweenness overall (as seen in H), individual CTS and CTS&HiG PPINs display a pronounced upper tail of nodes with high betweenness centrality. These nodes act as structural bridges linking otherwise weakly connected modules within the network. The presence of such high-BC outliers indicates “few-node sensitivity”: while the network is broadly distributed, it remains vulnerable to perturbation of a small set of key bridge nodes. This motivates gene-level prioritization within transition PPINs, as these nodes disproportionately maintain inter-module communication.

Inputs

Outputs

Panel I

Note: figures shown are polished versions of the raw outputs

###################################################
# Fig I) boxplot of betweenness per PPIN
# original code: Code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: BetweennessCentrality_IbarraSoria2018_v2.pdf
################################################################
{
df_BC <- read.table(file = "df_betweeness.tsv", sep = "\t", header = T)
df_BC$PPI_cat <- factor(df_BC$PPI_cat, levels = c("CTS", "HiGCTS", "HiG"))

CHD <- readRDS(file = paste0(inputdir, "CHD_Cilia_Genelist.rds"))
df_BC$PCGC_AllCurated <- toupper(df_BC$gene) %in% toupper(unlist(CHD["Griffin2023_PCGC_AllCurated"]))

# Calculate top 5 significant genes within each box
df5 <- df_BC %>%
    filter(rank_by_BC <= 5 & BetweennessCentrality > 0) %>%
    ungroup()

df5_CHD <- subset(df5, PCGC_AllCurated == TRUE)

df_BC <- rbind(
    subset(df_BC, PPI_cat == "CTS"),
    subset(df_BC, PPI_cat == "HiGCTS"),
    subset(df_BC, PPI_cat == "HiG")
)
df_BC$signature <- factor(df_BC$signature, levels = unique(df_BC$signature))

boxplot_bc_log10 <- ggplot(df_BC, aes(x = signature, y = log10(BetweennessCentrality + 1), colour = PPI_cat)) +
    geom_boxplot(position = "dodge2") +
    theme(
        legend.position = "none", # c(1,1)
        legend.justification = c(1, 1), # Place legend at top-right corner
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    ) +
    scale_color_manual(values = PPI_color_palette) +
    geom_text_repel(
        data = df5_CHD, # df5
        aes(label = gene),
        size = 2, # Adjust the size of the text labels
        box.padding = 0.5, # Add space between the text and the data points
        point.padding = 0.5, # Add space between the text and the points
        segment.color = "grey50", # Color for the line connecting the text to the points
        max.overlaps = 40, # Max number of overlaps before labels stop being placed
        show.legend = FALSE # Do not show text labels in the legend
    ) +
    # scale_x_discrete(limits = unique(df$signature)) +
    labs(color = "PPI cat") # Optional: label for the color legend

vertex(boxplot_bc_log10)
}

Panel J — Boxplot of PageRank per PPIN

Purpose

To show the full distribution of PageRank values across all genes in each PPIN, complementing the median-based view from Panel C.

What this Panel Shows

Interpretation

PageRank analysis shows that, in CTS and CTS&HiG PPINs, only a small number of genes stand out as especially influential within the network, even though these networks do not rely on a single dominant hub overall. These influential genes sit at key positions in transition networks, where changes to them could have broad effects on how signals propagate across the system. By contrast, HiG PPINs display consistently low PageRank values across genes, suggesting a network structure driven more by tightly connected local groups than by a few genes with system-wide influence.

Inputs

Outputs

Panel J

Note: figures shown are polished versions of the raw outputs

###################################################
# Fig J) boxplot of PageRank per PPIN
# original code: Code: 11.3_CTS_cardiac_network_ANND_pagerank.R
# original pdf: PageRank_GSE870383_v2.pdf
################################################################
{
CHD = readRDS( file=paste0(inputdir, 'CHD_Cilia_Genelist.rds'))

df = readRDS(file='df_PAGERANK_strength_ANND.rewring.P.rds')  #!!!!!!!!!!!!!!!!!!!!!!!

df = rbind(subset(df, PPI_cat=='CTS'),
                    subset(df, PPI_cat=='HiGCTS'),
                    subset(df, PPI_cat=='HiG')
                    )
## ensure the same order along x-axis					
# df$signature <- factor(df$signature, levels = signature_levels)
df$label=df$gene
df$PCGC_AllCurated = toupper(df$gene) %in% toupper(unlist(CHD['Griffin2023_PCGC_AllCurated']))	
# Calculate top 5 significant genes within each box
df5 <- df %>%
        filter(rank_by_PR <= 5) %>%
        ungroup()	
tb = df5[, c('signature','gene','PPI_cat','rank_by_PR','PCGC_AllCurated')]
write.table(tb, file= 'table_top5_pagerank_perPPI.tsv', sep='\t', row.names=F)
        
df5_CHD = subset(df5, PCGC_AllCurated==TRUE)
(dim(df5_CHD))  # [1] 15 18

boxplot_pagerank <- ggplot(df, aes(x = signature, y = PageRank, colour = PPI_cat)) +
    geom_boxplot(show.legend = TRUE) + # Enable legend for the boxplot
    scale_color_manual(values = PPI_color_palette) +
    geom_text(
        data = df5_CHD, aes(label = gene), # data=df5
        size = 2, # Adjust the size of the text labels
        hjust = -0.1, vjust = 0,
        check_overlap = TRUE
    ) + # Avoid text overlap
    theme(
        legend.position = "none", # c(1, 1),
        legend.justification = c(1, 1), # Place legend at top-right corner
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    ) +
    scale_x_discrete(limits = unique(df$signature)) +
    labs(color = "PPI cat") # Optional: label for the color legend

vertex(boxplot_pagerank)
}

Appendix — Derived Data Used in Panels

Panel A & B & C(left) — Strength Architecture and Distribution

Inputs

Provenance

Panel C (right) — Network Influence via PageRank

Inputs

Provenance

Panel D & E — Network Robustness (Targeted & Random Attacks)

Inputs

Provenance

Panel F — Simulation Network Fragility

Inputs

Provenance

Panel G — Distribution of PPI Edge Weights

Inputs

Provenance

Panel H & I — Betweenness Centrality

Inputs

Provenance

Panel J — PageRank Distribution

Inputs

Provenance

Appendix — Key File Lineage Prior to 11.2.0

11.1 — STRING-Weighted CTS Cardiac Network

Inputs

Outputs

11.1.1 — Check Vertex Duplication

Inputs

Outputs

11.2.0 — Update Network Weights (clean_max)

Inputs

Outputs

 
 

R Session Info

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] qlcMatrix_0.9.8             sparsesvd_0.2-2            
 [3] slam_0.1-55                 Matrix_1.7-3               
 [5] scater_1.37.0               scuttle_1.19.0             
 [7] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.1
 [9] Biobase_2.69.0              GenomicRanges_1.61.1       
[11] Seqinfo_0.99.1              IRanges_2.43.0             
[13] S4Vectors_0.47.0            BiocGenerics_0.55.0        
[15] generics_0.1.4              MatrixGenerics_1.21.0      
[17] matrixStats_1.5.0           sm_2.2-6.0                 
[19] MLmetrics_1.1.3             pracma_2.4.6               
[21] scales_1.4.0                rstatix_0.7.2              
[23] igraph_2.1.4                ggrepel_0.9.6              
[25] ggpubr_0.6.1                ggplot2_3.5.2              
[27] data.table_1.17.6           dplyr_1.1.4                
[29] gplots_3.2.0               

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1    viridisLite_0.4.2   vipor_0.4.7        
 [4] farver_2.1.2        viridis_0.6.5       bitops_1.0-9       
 [7] rsvd_1.0.5          lifecycle_1.0.4     magrittr_2.0.3     
[10] compiler_4.5.1      rlang_1.1.6         tools_4.5.1        
[13] ggsignif_0.6.4      S4Arrays_1.9.1      labeling_0.4.3     
[16] DelayedArray_0.35.2 RColorBrewer_1.1-3  abind_1.4-8        
[19] BiocParallel_1.43.4 KernSmooth_2.23-26  withr_3.0.2        
[22] purrr_1.0.4         grid_4.5.1          caTools_1.18.3     
[25] beachmat_2.25.1     gtools_3.9.5        docopt_0.7.2       
[28] cli_3.6.5           crayon_1.5.3        ggbeeswarm_0.7.2   
[31] XVector_0.49.0      vctrs_0.6.5         jsonlite_2.0.0     
[34] carData_3.0-5       BiocSingular_1.25.0 car_3.1-3          
[37] BiocNeighbors_2.3.1 irlba_2.3.5.1       Formula_1.2-5      
[40] beeswarm_0.4.0      tidyr_1.3.1         glue_1.8.0         
[43] codetools_0.2-20    gtable_0.3.6        ScaledMatrix_1.17.0
[46] tibble_3.3.0        pillar_1.11.0       R6_2.6.1           
[49] lattice_0.22-7      backports_1.5.0     broom_1.0.8        
[52] Rcpp_1.1.0          gridExtra_2.3       SparseArray_1.9.0  
[55] pkgconfig_2.0.3