-
Notifications
You must be signed in to change notification settings - Fork 0
/
main.Rmd
677 lines (520 loc) · 25 KB
/
main.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
---
author: Benjamin Brachi
date: October, 2017
title: Swedish project
output:
html_document:
keep_md: true
fig_caption: yes
toc: true
toc_float: true
number_sections: true
highlight: tango
theme: space
---
```{r eval=T, echo=F}
library(knitr)
library(captioner)
table_nums <- captioner(prefix = "Table")
fig_nums <- captioner(prefix = "Fig.")
```
```{sh convim, eval=T, echo=F}
I=$(ls ./figures/*.pdf)
for i in $I
do
i2=$(echo $i | sed 's/.pdf/.png/g')
if [ ! -f $i2 ]; then
convert -quality 99 -density 300 $i $i2
fi
done
```
# Complete protocol of the experiments
## Location of the experiments
We performed a total of 8 experiments, in 4 locations in Sweden over 2 years.
```{r rcmakemap, eval=T, echo=F}
read_chunk("./scripts/make_map.R")
```
```{r makemap, eval=F, echo=T}
<<map>>
```
This doesn't work anymore, but I made the map before so I'll just use that for now.
![Map of experiments and origin of accessions used in the experiments. Red dots: experiments; Blue dots: accessions](./figures/map.png)
## planting and setup
The list of the 203 accessions/genotypes used is presented in acc\_list.txt.
This list includes 200 re-sequenced Swedish accessions, Edi-0,
Col-Fri and Col-Fri-FLC. Most Swedish accessions were planted in 8
replicates per block and Edi-0, Col-FRI and Col-FRI-FLC, 6043 Lov-1,
6974 Ull2-5, 7517 Var2-6, 8369 Rev-1, 8240 Kulturen-1, 8262 Bil-5,
8247 San-2, 6918 Fab-4 were planted in 16 replicates per block.
Each experiment is organized in a three complete randomized block
design. Plantings followed the calendar presented in table
\ref{calendar}.
\begin{table}
\center
\label{calendar}
\begin{tabular}{|l|l|l|l|l|}
\hline
Year & Experiment & Block & planting date & field installation date \\
\hline
2011 & Adal & A & 2011-08-08 & 2011-08-25 \\
& & B & 2011-08-10 & 2011-08-25 \\
& & C & 2011-08-12 & 2011-08-25 \\
& Ramsta & A & 2011-08-07 & 2011-08-24 \\
& & B & 2011-08-09 & 2011-08-24 \\
& & C & 2011-08-11 & 2011-08-24 \\
& Ullstorp & A & 2011-08-31 & 2011-09-17 \\
& & B & 2011-09-02 & 2011-09-17 \\
& & C & 2011-09-04 & 2011-09-17 \\
& Ratckegarden & A & 2011-09-01 & 2011-09-18 \\
& & B & 2011-09-03 & 2011-09-18 \\
& & C & 2011-09-05 & 2011-09-18 \\
\hline
2012 & Adal & A & 2012-08-08 & 2012-08-25 \\
& & B & 2012-08-10 & 2012-08-25 \\
& & C & 2012-08-12 & 2012-08-25 \\
& Ramsta & A & 2012-08-07 & 2012-08-24 \\
& & B & 2012-08-09 & 2012-08-24 \\
& & C & 2012-08-11 & 2012-08-24 \\
& Ullstorp & A & 2012-08-31 & 2012-09-17 \\
& & B & 2012-09-02 & 2012-09-17 \\
& & C & 2012-09-04 & 2012-09-17 \\
& Ratckegarden & A & 2012-09-01 & 2012-09-18 \\
& & B & 2012-09-03 & 2012-09-18 \\
& & C & 2012-09-05 & 2012-09-18 \\
\hline
\end{tabular}
\caption{Calendar followed for the planting of the common garden
experiments}
\end{table}
## Overview of the phenotypes collected and organisation.
To designate a particular round of experiment we'll use the year it
was sown (not the year it was harvested).
Table \ref{phenCG2011} and \ref{phenCG2012} provide a the list of the
phenotypes we have.
\begin{table}
\center
\caption{Table of the phenotypes we have collected (\textit{or will collect} indicated in italic) in the common garden experiments from 2011.}
\label{phenCG2011}
\begin{tabular}{|l|c|c|c|c|}
\hline
phenotype & ULL 2011 & RAT 2011 & RAM 2011 & ADA 2011 \\
\hline
flowering time before winter & X & X & X & X \\
flowering time in the spring & & & & \\
herbivore damage in the fall & & X & & \\
rosette size & X & X & & \\
overwinter survival & X & X & X & X \\
survival to seed set (Approx) & X & X & X & X \\
fecundity estimate & X & X & X & X \\
microbial community & \textit{X} & \textit{X} & \textit{X} & \textit{X} \\
\end{tabular}
\end{table}
\begin{table}
\center
\caption{Table of the phenotypes we have collected (\textit{or will
collect} indicated in italic) in the common garden experiments from
2012.}
\label{phenCG2012}
\begin{tabular}{|l|c|c|c|c|}
\hline
phenotype & ULL 2012 & RAT 2012 & RAM 2012 & ADA 2012 \\
\hline
flowering time before winter & X & X & X & X \\
flowering time in the spring & X & X & X & X \\
herbivore damage in the fall & & & & \\
rosette size & X & X & X & X \\
overwinter survival & X & X & X & X \\
survival to seed set (Approx) & X & X & X & X \\
fecundity estimate & \textit{X} & \textit{X} & \textit{X} & \textit{X} \\
microbial community & \textit{X} & \textit{X} & \textit{X} & \textit{X} \\
\end{tabular}
\end{table}
To be able to combine everything in one file, the easiest is probably
to start from the initial randomizations, corrected for errors that
were made during planting. Then we can add all the phenotypes in
different columns. Some phenotypes will have many NAs, because only a
subset of the plants were measured (/i.e./ microbiota, fitness).
# Merging all phenotypes in one file and description
The script merging\_phen\_files_11202014.R was used in April 2014 to
merge all phenotype files into one. I copied that script and updated
the paths. This scripts uses the results from the treatment of each
trait or set of traits fom the folders in "../all\_phenotypes/"
```{r mergephen, eval=T, echo=T}
read_chunk("./scripts/merging_phen_files.R")
```
```{r mergephen1, eval=T, echo=T}
<<phenmerge>>
```
This results in two files, one for each year. There are named
data\_2011.txt (or .R for the binary version) and data\_2012.txt (or
.R for the binary version).
The column names for each files are summarized below.
## columns in data2011
- exp: name of the experiment
- block: experimental block within the experiment
- tray: tray within the experiments
row: coordinate 1 of position of the plants on a tray (varies from 1
to 11)
- column: coordinate 2 of position of the plants on a tray (varies from 1
to 6)
- line: number from 1 to 203 designating the accessions planted
- id: accession id of the accession as refered to in the call\_method\_75
of the 250 KSNPs data.
- name: actual name of the accession (might be messed up by encoding,
use ids!!)
- planting\_date: the planting date. All plants of the same block within
the same experiment have the same date.
- errors: errors during planting (only use lines with ".")
- fall: score of survival,flowering, and pathogene infections the
survival code
- spring: score of survival,flowering, and pathogene infections the
survival code
- sampled: TRUE if the sample has been sample for microbial community
analysis in the spring.Otherwise FALSE.
- rosette\_date: Date at which the photographe that was use to make
rosette measurements was taken.
- area: rosette area on the photograph in cm2
- perimeter: rosette perimiter (cm)
- max\_diameter: maximum diameter of the plant.
- sdR: standard deviation of the plants radius
- circle\_area: area of a circle of diameter "max\_diameter"
- stockiness: measure of plant stockiness: (4*pi*area)/(perimeter)^2
- color: a measure of color variation from green to purple. lower values
are greener.
- herbivory: herbivore damage score from 0 to 3, 0: no damage, 3:
extensive damage.
- fecundity: area of occupied by the mature plant stems (number of
pixels of the image that are the plant, not the area of the bounding
box) as a proportion of the total number of pixel on the image.
## columns for data2012 (only columns that are different):
- epi: sampled by Fernando and Manu for RNASeq.
- PC\_sampled: TRUE if the sample has been sample for microbial community
analysis in the spring.Otherwise FALSE.
- flowering\_date: the date at which the plant was scored as flowered (if
reading from the text file, in R, as.Date() will turn it to a date
format.
- FT: time from planting to the date the plant was scored as
flowered.
## Add some derived phenotype columns and clean up
This next chunck of script does some cleaning up, and compute derive
phenotypes such as ow, sss, fitness (composite of fecundity and sss).
```{r deriv0, eval=T, echo=T}
read_chunk("./scripts/deriv_phen.R")
```
### rational
In Both 2011 and 2012 we have scored survival in the spring and we
have harvested stems at the end of the plants life cycle.
We also have information about flowering time before winter for both
year.
I'll try to create to survival related columns in the data. The first
is going to be overwinter survival (ows). For ows our data are
relatively precise because we have survival/flowering scores before
and after winter. The second survival phenotype is going to be
survival until seed set (sss). This is will include the plants that
survived over-winter, plus the plants we don't have stem flowering
data and stem photos for. This is likely to be a little
unprecise. First we are missing some stems, because some were
missing, we disgarded some while taking the photos when they were in
such a state a photo wasn't meaningful. However that's probably quite
rare over the number of stems we photographed. Second, some dead
plant might have "flown" away from the experiments before we
harvested them. That again should only be a small
fraction. Nevertheless, we should keep those limitations in mind.
### Over-winter survival (ows)
To considered a plant has survived winter (ows=1) it needs to have been alive in
the last sensus in the fall and be alive in the spring. Following table \ref{survivalscores},that means it must have a value of 1, 2, 4, or 5 in the fall and
spring column. Whether it was sampled or not doesn't matter in 2011
because the only sampling we did was in the spring after the
survival/flowering sensus was made. To be considered having not
survived winter, plants must have been present in the last sensus in
the fall, but be dead or absent in the spring.
For the 2012, data, it's basically the same except we beed to
take into account the epi genetics sampling (epi column). This
sampling happened in Oct or November.##
```{r eval=T, echo=F}
<<ows>>
```
### Survival until seed set (sss).
That I\'ll define as surviving until seed set (sss) as having survived
over winter and having data in the fecundity column. ##A0004##
They must not have been harvested for micorbiom
```{r eval=T, echo=T}
<<sss>>
```
### Checking if it makes sens.
I want to know how many of the plants that are scored as having died,
have produced a stem that we have (Table \ref{counts}). ##A0005##
Basically there are only a few plants died.
```{r eval=T, echo=T}
<<verifowssss>>
```
There are rosette and fitness data for a few wells that were supposed
to be empty. If there weren't empty in the fall, then a subset of them
got rosette data. It therefore looks like it's a combination of
scoring mistakes and a few contaminations. ##A0006##
Those lines will be removed from both datasets. ##A0007##
```{r eval=T, echo=T}
<<cleanowssss>>
```
### cleaning up rosette data
The number of data points is presented in table \ref{rosettes1}. ##A0008##
```{r eval=T, echo=T}
<<rosettes1>>
```
There shouldn't be rosette data if the fall sensus indicates 0, 3, or
5 (see table \ref{survivalscores}), however there are a few (table
\ref{rosettes2}). These can arise from errors in the sensus, or
errors in the image analysis. The most conservative way to deal with
that is probably to remove those lines. ##A0009##
```{r eval=T, echo=T}
<<rosettes2>>
```
I also added a column giving the age of the rosettes when photographed.
```{r eval=T, echo=T}
<<rosettesage>>
```
### adding a "fitness" trait.
This trait is going to correspond to the fecundity estimates, but the
NAs will be replaced by 0s when sss==0.
##A0011##
```{r eval=T, echo=T}
<<addfittrait>>
```
### dealing with the Controls.
For now I've decided to remove the controls from the dataset. I can
bring them in as needed. I'll save them in a seperate file.
I'll also remove all lines saying "empty" in the column line or
anything else than "." in the error column.
I'll also remove the none Swedish accessions with line numbers 201
(Edi), 202 (Col-FRI) and 203 (Col-FRI-FLC).
```{r eval=T, echo=T}
<<controlsandco>>
```
### save the clean datasets
```{r eval=T, echo=T}
<<savedatasets>>
```
# Compute heritabilities and blups per accession
```{r h20, eval=T, echo=T}
read_chunk("./scripts/heritability.R")
```
This is slow, because of the bootstrapping. Set to not run. Also for
now the number of bootstraps is set to 100, but in the final version
this needs to be parallelized and increased to 1000.
```{r h21, eval=F, echo=T}
<<herit>>
```
![Heritability of traits in the common garden experiments.](./figures/H2.png)
Replot this to focus on fitness and composite traits
```{r heritability plot}
###DLF 07July19
hs <- read.table("./res/H2_2011_2012.txt", sep="|",header=TRUE)
site.index <- read.table("./data/exp.sites.simple.index.txt")
colnames(site.index) <- c("experiment", "simple.name")
hs <- merge(hs, site.index, all.x=TRUE)
fit.hs <- hs[hs$trait%in% c("fecundity","ows", "sss","fitness"),]
fit.hs <- fit.hs[order(fit.hs$trait, fit.hs$year),]
library(lattice)
fit.hs$year <- as.factor(fit.hs$year)
fit.hs$trait <-gsub("ows","overwinter survival",fit.hs$trait)
fit.hs$trait <-gsub("sss","survival to seed set",fit.hs$trait)
pdf("./figures/heritability.barplot.pdf", width=6, height=4)
trellis.par.set(superpose.polygon = list(col = c("darkslateblue", "dodgerblue")))
barchart(H2~simple.name|trait, groups=year, data=fit.hs, auto.key=TRUE, ylab="heritability", xlab="experimental site")
dev.off()
```
In the next chunk we compute heritability estimates just for the fitness trait (again, this is slow due to bootstrapping so not evaluated in the markdown).
```{r h22, eval=F, echo=T}
<<heritfit>>
```
The next table gives the fitness heritability estimates and 95\% confidence intervals (from 1000 bootstrapts)
```{r heritfittable, eval=T, echo=F}
res=read.table("./res/H2_fitness_percent_rounded.txt", sep="|",h=T)
kable(res, caption="Broad sens Heritability estimates for fitness in each experiment")
```
We then need to compute "fitness in the North" and "fitness in the
South" for comparison with allele frequencies changes.
```{r h23, eval=T, echo=T}
<<h2fitNS>>
```
# Evidence for adaptation based on the common garden experiments
The idea here is to look for a relationship between the fitness
estimates in the North and South of Sweden and the latitude of origin
of the accessions.
```{r rcfitvslat, eval=T, echo=F}
read_chunk("./scripts/CG_fitvslat.R")
```
```{r fitvslat1, eval=T, echo=T, warning=F}
<<fitlat>>
```
The mean fitness traits were computed to remove the year, experiment and block effects
within regions by fitting a linear model to the fitness data, and then
to compute means of the residuals per accessions. This data is saved in the file "./res/means_NvsS.txt".
This yields the following figure.
![Fitness estimates from the common garden experiments display evidence for local adaptation across to successive years. A and B: Relationship between the latitude of origin of accessions and fitness in the North (blue) and in the South (red). C: Ditribution of the Spearman's rho rank correlation coefficients between latitude of origin and fitness in the North (blue) and in the South (red), over 1000 non-parametric bootstraps. The blue and red vertical lines delimite the 95% confidence interval for the correlation coefficients in the North and in the South, respectively. D. Genetic correlation between the fitness of accessions in the North (x axis) and in the South (y axis). The color gradient represents the latitude of origine of accessions.](./figures/fitness_N_S_lat.png)
_Conclusions:_
- There is a negative relationship between fitness and latitude of origin in the South, but not in the North, indicating conditional neutrality.
- Fitness in the North and in the South are positively correlated
- Plants tend to grow bigger and produce more seeds in the North
# Fitness estimates from the CG vs change in accession frequency in the NSE
```{r rcfitfreq, eval=T, echo=F}
read_chunk("./scripts/fit_vs_freq.R")
```
We need to determine is accessions that became frequent in all
experimental evolution experiments are also fitter in the common
gardens.
Daniele provided a file called "./data/accs.per.plot.Rdata" which as
the allele frequency of accessions in the experimental plots after two
winters.
I'll compare those values with the fitness means per accession and per region that used I when looking
for adaptation above.
```{r fitfreq, eval=T, echo=T}
<<ff1>>
```
This scripts yields the following figure.
![Accession frequency and common garden fitness estimates. The plots
are organized in 4 lines, one for each of the natural selection
experiments. For each NSE, the plot on the left shows the relationship
between accession fitness in the North (x) and accession fitness in
the South (y). Each point is an accession. Gold dots correspond to
accessions that were not sampled in the NSE. Blue dots are accessions
that were sampled in the NSE and their sizes reflect the frequency of
the accessions in the NSE. Red dots are accessions that were sampled
more than twice in all NSE. The diagonal line has a slope of 1 and
intercept of 0. The barplots on the right break down the frequency of
acccessions from each region in Sweden that were sampled in the NSE,
and those that were not.](./figures/fit_vs_overall_freq.png)
What this shows is that the fittest plants in the common gardens are
not always the ones sampled in the NSE experiments, suggesting that
important fitness components are not captured in the commong gardens
(germination and establishement most likely).
From the barplots, we can see that Northern accessions are more
frequent among sampled accessions than they are in the non-sampled
accessions. In the other experiments, including Barsta in the North,
it seems Northern accessions are at lower frequency in the sampled
accessions than in the non-sampled accessions.
# Genetic bases of adaptation
## GWA of common garden mean fitness estimates
Here we are going to investigates the genetics underlying the North and South fitness estimates.
```{r eval=T, echo=F}
read_chunk("./scripts/GWA_fitness_NvsS.R")
```
This first chunk prepare libraries, and loads data needed for mapping.
```{r gwafit1, eval=F, echo=T}
<<prepmap>>
```
Then we run lmm using gemma for the two traits seperatly.
```{r gwafit2, eval=F, echo=T}
<<lmm>>
```
We then read the outputs, make mahattan plots, compute pseudo heritability and retrieve genome wide signigificant SNPs.
We also retrieve the matrices of betas and their standard error for the next analysis.
```{r manhattan, eval=T, echo=F}
fig_nums("manhattan1", caption="Manhattan plot for fitness in the North")
include_graphics("./GWA/manhattans/lmm_gwa_N_fitness.jpeg")
fig_nums("manhattan2", caption="Manhattan plot for fitness in the South")
include_graphics("./GWA/manhattans/lmm_gwa_S_fitness.jpeg")
```
```{r gwafit3, eval=F, echo=T}
<<topsnpslmm>>
```
At a 0.05 fdr corrected pvalue threshold, there are no significantly associated SNPs.
To investigate which SNP effects might be significant we run mashr on betas and se.
```{r gwafit4, eval=F, echo=T}
<<mashrfitNvsS>>
```
This generates posterior effect size estimates for SNPs it deems significant using adaptive shrinkage.
```{r mashreff, eval=T, echo=F}
fig_nums("mashrposteff", caption="Heatmap of posterior effect sizes of associated SNPs in fitness in the North and in the South.")
include_graphics("./figures/heatmap_mashr_fit_NvsS.png")
```
This highlights the two main peaks we can see for the GWA of fitness in the South on chromosome 2 and 5.
The figure shows that those peaks have effects going in the same direction for both traits, which is consitent with the positive correlation we observe between fitness in the North and the South.
Then we run mlmm using gemma for the two traits simultaniously. We
also make a manhattan plot.
```{r gwafit5, eval=F, echo=T}
<<mlmm>>
```
```{r manhattan2, eval=T, echo=F}
fig_nums("manhattan3", caption="Manhattan plot for fitness in the North and South mapped in a multivariate mixed model")
include_graphics("./GWA/manhattans/lmm_gwa_N_fitness_S_fitness.jpeg")
```
# Investigate allele frequency changes from NSE
DF produced a file in which the allele frequency changes between
initial (sown) and Spring sampling on year two (with associated SE).
_We'll add details about methods later. _
# Comparing GWA top hits and NSE results
```{r mlmmvsaf, eval=T, echo=F}
read_chunk("./scripts/GWA_vs_NSE.R")
```
```{r eval=T, echo=F}
<<readaf>>
```
The above chunk reads in the table of allele frequency changes between
North and South and outputs a table which gives the proportions of
SNPs for which no more extreme allele frequency changes were observe
in 10,000 permutations.
Basically 7% of the SNPs show "significant" changes in allele frequency in both
regions, and 15% in each of the two regions.
```{r eval=F, echo=F}
<<readGWA>>
```
```{r eval=T, echo=T}
<<mlmmvsafchange>>
```
The top SNPs in the two trait GWA, of which most two region are also
associated in single trait GWAs have evolved in the allele frequency
experiment.
The significance of allele frequency changes was done based on 10000
permutations.
## define questions
- Do the GWAs top SNPs change in allele frequency
- Significant fdr significant SNPs evol? _looks like it but is there a way calculate enrichement?_
- Going a little lower in the list of SNP effects, can we see the relationships we observed before with changes in allele frequency?
# Get the matrix
The SNP matrix is the same used in the microbiota paper. The script
prep SNPs starts from the vcf files from Fernando, and produces
different formats, including plink and bimbam (for GWA).
Here we chose to use SNPs with MAF above 0.1, so I recomputed the SNP
matrices and used the prefix sweden\_200\_MAF10.
# Compute haplotype blocks
The SNPs are group into haplotype blocks (or LD blocks) using the
plink --block function. This has a bunch of settings which I'll detail
here:
- 'no-pheno-req' simply allows the function to run on all the
genotypes in the SNP file, without removing those without
phenotypes
- 'no-small-max-span' is a modifier that allows to limit the span of
blocks with very few SNPs. By default, a two snp group can't span
more than 20kb and a 3 snps block can span more than 30kb.
- --blocks-max-kb allows to limit the maximum size of a block. The
default is 200kb, but I've set it to 50kb.
- --blocks-min-maf determines to MAF threshold for SNPs considered in
making the blocks. Default is 0.05 but we've decided to work with
SNPs at 0.1.
- --blocks-strong-lowci and --blocks-strong-highci sets the CI limits
of Dprime for considering SNPs in strong LD. The default is 0.7 for the lowest
CI and at least 0.98 for the highest. I've left the default values.
- --blocks-recomb-highci sets the lowest high CI of dprime value to
considered historical recombination events.
- --blocks-inform-frac is the lowest percentage of informative SNPs in a
block. I used th default is 0.95.
Better description of the parameters can be found at
https://www.cog-genomics.org/plink/1.9/ld#blocks
The jobs can be started using "./scripts/compute\_LD\_blocks.sh"
# Start GWAs on blups
Here I need to run a GWA using mlmm on the Northern and Southern
fitness values. The fitness values for each region corrected for year
and block effects are have been calculated previously. I ran a two
trait GWA, which gave additive coefficients for each SNP for each
trait.
# The ideas on how to use that now are:
- Define the ancestral state of the SNPs and calculate the effects of
derived alleles to see in which quadrant of the plot they fall.
- investigate the frequency of the SNPs in the founder populations for
each region.
- Compare with allele frequency changes in the SR experiments
- Match SNPs in this study with regions that may be introgressed from
neandertal Arabidopsis.
- Match with methylation patterns
# Compute expected SNP frequencies based on the CG data.