Example of analysis pipeline - Unweaving the polygenic web

Giulia Piazza

This is an example of the pipeline followed for the analysis in "Unweaving the web: Polygenic influences on networks of psychopathology symptoms". The example is focused only on the network with the polygenic score for ADHD for demonstration purposes. All parts of the analysis were run via a computer cluster due to the computing power required. Although this example is not directly reproducible (as it requires raw data), the resulting objects can be found in pipeline_example.RData in this Github repo.

The preregistration of the confirmatory analysis of this project can be found here in the respective OSF project.

Load packages and data

Loading imputed and cleaned data from ALSPAC and TEDS (see paper for more details). Polygenic scores were all calculated with this script, which follows this LDpred2 tutorial (accessed in September 2022). The ADHD polygenic score was derived from effect sizes in Demontis et al. (2019).

library(qgraph)
library(bootnet)
library(dplyr)
library(psychonetrics)
library(readxl)

Estimate network in ALSPAC

Estimating the ADHD polygenic score network in ALSPAC with unregularised model search with the package 'bootnet'.

selection = c("DEP.2","DEP.3", "DEP.5","DEP.6", "DEP.7",
              "DEP.8","DEP.9","DEP.10","DEP.11","DEP.12","DEP.13","PRO.1", 
              "HYP.1","EMO.1","PRO.2","COND.1","PEER.1","COND.2","EMO.2", 
              "PRO.3","HYP.2","PEER.2","COND.3","EMO.3","PEER.3","HYP.3", 
              "EMO.4","PRO.4","COND.4","PEER.4","PRO.5","HYP.4","COND.5",
              "PEER.5","EMO.5","HYP.5") # select relevant variables

adhd.network <- ALSPAC %>%
  dplyr::select(selection, ADHD.ld) %>%
  estimateNetwork(., default = "ggmModSelect", stepwise = TRUE) # estimate network with 'ggmModSelect', selected nodes and ADHD PRS calculated with LDPred2

Plot network

Plotting the resulting network with the package 'qgraph' for a quick preliminary visualisation.

qgraph(adhd.network$graph, layout = "spring", theme = "colorblind")

Non-parametric bootstraps

Calculating non-parametric bootstraps (this is time intensive) with 'bootnet'.

adhd.boot <- bootnet(adhd.network,  nBoots = 1000, default = "ggmModSelect", type = "nonparametric", nCores = 20)

Confirmatory Network Analysis

Model 1

Model 1 tests the fit of the network estimated in ALSPAC in TEDS. Here, I take the adjacency matrix of the ADHD polygenic score network calculated above (a matrix with ones/zeros for presence/absence of edges) and test its fit in TEDS data.

# extract structure
adhd.structure = 1*(adhd.network$graph != 0)

# test how well structure fits to teds data
adhd.model = TEDS %>% dplyr::select(DEP.2:HYP.5, ADHD) %>% 
  ggm(., estimator = "FIML", omega = adhd.structure) %>% runmodel()

adhd.model %>% fit() # inspect model fit

##            Measure      Value
##               logl -222605.98
##  unrestricted.logl -221910.32
##      baseline.logl -242816.22
##               nvar         37
##               nobs        740
##               npar        280
##                 df        460
##          objective      28.26
##              chisq    1391.31
##             pvalue        ~ 0
##     baseline.chisq   41811.79
##        baseline.df        666
##    baseline.pvalue        ~ 0
##                nfi       0.97
##               pnfi       0.67
##                tli       0.97
##               nnfi       0.97
##                rfi       0.95
##                ifi       0.98
##                rni       0.98
##                cfi       0.98
##              rmsea      0.021
##     rmsea.ci.lower      0.020
##     rmsea.ci.upper      0.022
##       rmsea.pvalue          1
##             aic.ll  445771.96
##            aic.ll2  445808.18
##              aic.x     471.31
##             aic.x2    1951.31
##                bic  447574.94
##               bic2  446685.21
##            ebic.25  448586.00
##             ebic.5  449597.05
##            ebic.75  450405.90
##              ebic1  451619.17

Model 2

Model 2 involves multi-group testing and assesses whether edges have comparable weights in ALSPAC and TEDS. Here, I create a combined data frame with ALSPAC and TEDS data, with a column indicating the cohort. I then use the package 'psychonetrics' to run a model where edge weights in ALSPAC and TEDS are set to be equal, and compare it to a model where they are set free to vary.

alspac.data = ALSPAC %>% dplyr::select(selection, BMI.ld:ANX.P.ld, ADHD.ld) # create dataframe with relevant vars for alspac
colnames(alspac.data) = c(selection,  "BMI",  "EA",   "DEP", "ANX",  "ADHD")

alspac.data$group = "alspac"  

teds.data = TEDS %>% dplyr::select(selection,  "BMI",  "EA",   "DEP", "ANX",  "ADHD") # select relevant vars for teds
teds.data$group = "teds"

comb.data = rbind(alspac.data, teds.data) # create dataset of combined teds and alspac

comb.model.adhd <- comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>% 
  ggm(., estimator = "FIML", omega = adhd.structure, groups = "group") %>% 
  groupequal("omega") %>% runmodel # run a model where edges are equal in alspac and teds

comb.model.adhd.fit = comb.model.adhd %>% fit()

##            Measure      Value
##               logl -485929.72
##  unrestricted.logl -484280.22
##      baseline.logl -532672.39
##               nvar         37
##               nobs       1480
##               npar        354
##                 df       1126
##          objective      27.79
##              chisq    3298.99
##             pvalue        ~ 0
##     baseline.chisq   96784.32
##        baseline.df       1332
##    baseline.pvalue        ~ 0
##                nfi       0.97
##               pnfi       0.82
##                tli       0.97
##               nnfi       0.97
##                rfi       0.96
##                ifi       0.98
##                rni       0.98
##                cfi       0.98
##              rmsea      0.020
##     rmsea.ci.lower      0.019
##     rmsea.ci.upper      0.020
##       rmsea.pvalue          1
##             aic.ll  972567.43
##            aic.ll2  972593.10
##              aic.x    1046.99
##             aic.x2    4006.99
##                bic  975125.03
##               bic2  974000.06
##            ebic.25  976403.29
##             ebic.5  977681.56
##            ebic.75  978704.17
##              ebic1  980238.09

free.model.adhd = comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>% 
  ggm(., estimator = "FIML", omega = adhd.structure, groups = "group") %>% runmodel # run a model where edges can vary 

knitr::kable(compare("free" = free.model.adhd,
"constrained" = comb.model.adhd), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
free	920	971926.3	975972.2	0.0168545	2245.815	NA	NA	NA
constrained	1126	972567.4	975125.0	0.0195041	3298.985	1053.17	206	0

Model 3

Model 3 tests the significance of all edges connected to the polygenic score for ADHD at the same time. Here, I compare a model where all edges between the polygenic score for ADHD are set to zero to the original network model in a combined dataset of TEDS and ALSPAC.

# set ALL edges connecting the PGS to zero
adhd.structure.3 = adhd.structure
adhd.structure.3[37,] = adhd.structure.3[,37] = 0

adhd.model.3 = comb.data %>% dplyr::select(DEP.2:HYP.5, ADHD, group) %>% 
  ggm(., estimator = "FIML", omega = adhd.structure.3, groups = "group") %>% 
  groupequal("omega") %>% runmodel
  
knitr::kable(compare("original network" = comb.model.adhd,
"network with no pgs edges" = adhd.model.3), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
original network	1126	972567.4	975125	0.0195041	3298.985	NA	NA	NA
network with no pgs edges	1128	972701.9	975245	0.0200892	3437.405	138.4199	2	0

Model 4

Model 4 tests the significance of a single edge connected to the ADHD polygenic score at a time, for all edges connected to the polygenic score. Here, for every edge connected to the ADHD polygenic score in the original network (edges connecting nodes HYP.3 and COND.4), I create a model where they are set to zero with a function, in a combined dataset of TEDS and ALSPAC. I then compare both models to the original network model.

# Function to set ONE edge connecting the PGS to zero progressively
model4 = function(alspac.structure,node1,node2) {
  
  # alspac.structure = structure of alspac network
  # node 1 and node 2 = nodes of interest as strings; 
  # node 1 is the PGS

  one = grep(paste(node1), colnames(alspac.structure)) # index of node 1 (PGS)
  nodes = which(alspac.structure[,one] ==1) # list of edges connecting node 1
  two = nodes[which(names(nodes) == paste(node2))] # index of node 2
  
  new.structure = alspac.structure # duplicate the original alspac structure
  new.structure[one,two] = new.structure[two,one] = 0 # set the edge between node 1 and 2 to zero
  model = comb.data %>% dplyr::select(DEP.2:HYP.5, paste(node1),group) %>% 
  ggm(., estimator = "FIML", omega = new.structure, groups = "group") %>% 
  groupequal("omega") %>% runmodel # create and run the model
  return(model)
}

## HYP.3
adhd.model.4.HYP = model4(adhd.structure,"ADHD", "HYP.3")
knitr::kable(compare("original network" = comb.model.adhd,
"without edge adhd-hyp3" = adhd.model.4.HYP), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
original network	1126	972567.4	975125.0	0.0195041	3298.985	NA	NA	NA
without edge adhd-hyp3	1127	972650.6	975200.9	0.0198693	3384.125	85.13986	1	0

## COND.4
adhd.model.4.COND= model4(adhd.structure,"ADHD", "COND.4")
knitr::kable(compare("original network" = comb.model.adhd,
"without edge adhd-cond4" = adhd.model.4.COND), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
original network	1126	972567.4	975125	0.0195041	3298.985	NA	NA	NA
without edge adhd-cond4	1127	972591.6	975142	0.0196081	3325.169	26.18379	1	3e-07

Model 5

Lastly, model 5 tests whether ALSPAC and TEDS have comparable edge weights in a single edge connected to the ADHD polygenic score. Here, I create model where equality constrains are lifted between the edges connecting the polygenic score to nodes HYP.3 and COND.4, and compare them to a model where they are set to be equal.

model5 = function(alspac.structure,node1,node2,eq.model) {
  
  # alspac.structure = structure of alspac network
  # node 1 and node 2 = nodes of interest as strings; 
  # node 1 is the PGS; 
  # eq.model = model with equality constrains to modify
  
  one = grep(paste(node1), colnames(alspac.structure)) # index of node 1 (PGS)
  nodes = which(alspac.structure[,one] ==1) # list of edges connecting node 1
  two = nodes[which(names(nodes) == paste(node2))] # index of node 2
  model = eq.model %>% 
  groupfree("omega",one,two) %>% 
  runmodel() # modify equality constrains model, lifting one edge constrain
  return(model)
}

# set edges connecting the PGS to free to vary, lift equality constrains
adhd.model.5.HYP <- model5(adhd.structure, "ADHD", "HYP.3", comb.model.adhd)
knitr::kable(compare("original network" = comb.model.adhd, 
"free edge adhd-hyp3" = adhd.model.5.HYP), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
free edge adhd-hyp3	1125	972569.3	975134.1	0.0195166	3298.820	NA	NA	NA
original network	1126	972567.4	975125.0	0.0195041	3298.985	0.1651108	1	0.6844941

adhd.model.5.COND <- model5(adhd.structure, "ADHD", "COND.4", comb.model.adhd)
knitr::kable(compare("original network" = comb.model.adhd, 
"free edge adhd-cond4" = adhd.model.5.COND), row.names = FALSE)

model	DF	AIC	BIC	RMSEA	Chisq	Chisq_diff	DF_diff	p_value
free edge adhd-cond4	1125	972569.1	975133.9	0.0195156	3298.608	NA	NA	NA
original network	1126	972567.4	975125.0	0.0195041	3298.985	0.3777227	1	0.5388245

Plotting the network focusing on the polygenic score

Here, I plot the network focusing on the polygenic score by placing it in the centre of the graph, using custom functions which can be found here.

# load networks and legend ----
quest.sub = read_excel("mfq_sdq_key_11_subscale.xlsx")
quest.sub = quest.sub[-c(1,4),]  # removing items not in teds and in these reduced networks

# load functions ----
source("qgraph.alt.R") # modified qgraph function: plots by avoiding highlighted edges crossing under other ones
source("support.functions.R") # support functions to qgraph.alt
source("unfade.centre.nolegend.R") # unfading function

# load average layout with PGS in the centre ----
avglay = as.matrix(read.csv("avglay.csv", 
                            header = FALSE))

node.one = which(colnames(adhd.network$graph) == "ADHD.ld")
node.two = which(colnames(adhd.network$graph) == "COND.4")
curve.adhd = matrix(0, ncol = 37, nrow = 37)
curve.adhd[node.one, node.two] = curve.adhd[node.two, node.one] = 0.7
graph = unfade.nolegend(adhd.network$graph, PGSnode = "ADHD", PGSdesc = "Polygenic Score for ADHD - ALSPAC", PGScolor = "#46BAC8", 
avglay = avglay, curve.matrix = curve.adhd)
plot(graph)

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
pipeline_example_files/figure-html		pipeline_example_files/figure-html
.gitignore		.gitignore
LDPred_ADHD.R		LDPred_ADHD.R
README.md		README.md
Unweaving the polygenic web pipeline.Rproj		Unweaving the polygenic web pipeline.Rproj
adhd-net.png		adhd-net.png
avglay.csv		avglay.csv
mfq_sdq_key_11_subscale.xlsx		mfq_sdq_key_11_subscale.xlsx
pipeline_example.RData		pipeline_example.RData
pipeline_example.Rmd		pipeline_example.Rmd
pipeline_example.html		pipeline_example.html
qgraph.alt.R		qgraph.alt.R
support.functions.R		support.functions.R
unfade.centre.nolegend.R		unfade.centre.nolegend.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Example of analysis pipeline - Unweaving the polygenic web

Load packages and data

Estimate network in ALSPAC

Plot network

Non-parametric bootstraps

Confirmatory Network Analysis

Model 1

Model 2

Model 3

Model 4

Model 5

Plotting the network focusing on the polygenic score

About

Languages

giuliapiazza18/Unweaving-the-polygenic-web-pipeline

Folders and files

Latest commit

History

Repository files navigation

Example of analysis pipeline - Unweaving the polygenic web

Load packages and data

Estimate network in ALSPAC

Plot network

Non-parametric bootstraps

Confirmatory Network Analysis

Model 1

Model 2

Model 3

Model 4

Model 5

Plotting the network focusing on the polygenic score

About

Resources

Stars

Watchers

Forks

Languages