Pipeline architecture and function
This pipeline implements the GATK Best Practices for germline variant calling on a cohort of samples, using whole-genome sequencing (WGS) or whole-exome sequencing (WES) next-generation sequencing data.
The pipeline was designed for GATK 3.x, whose Best Practices include the following stages:
- Map to the reference genome
- Mark duplicates
- Perform indel realignment and/or base recalibration (BQSR)*
- Call variants on each sample
- Perform joint genotyping
* Indel realignment was recommended in the GATK Best Practices for versions earlier than 3.6.
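The stages above split into two levels. Mapping, duplicate marking, the optional indel realignment and BQSR, and per-sample variant calling run independently for each sample, while joint genotyping runs once over the whole cohort. The following minimal sketch models only that ordering. The stage labels, the PipelineOptions fields, and the default GATK version are hypothetical illustrations, not names or commands used by this pipeline. Indel realignment is turned on by default only for GATK versions earlier than 3.6, which mirrors the footnote.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PipelineOptions:
    # Hypothetical option names, for illustration only.
    gatk_version: Tuple[int, int] = (3, 8)
    indel_realignment: Optional[bool] = None  # None: decide from gatk_version
    bqsr: bool = True


def per_sample_stages(opts: PipelineOptions) -> List[str]:
    """Ordered stage labels that run independently for every sample."""
    realign = opts.indel_realignment
    if realign is None:
        # Indel realignment was part of the Best Practices before GATK 3.6.
        realign = opts.gatk_version < (3, 6)
    stages = ["map_to_reference", "mark_duplicates"]
    if realign:
        stages.append("indel_realignment")
    if opts.bqsr:
        stages.append("base_recalibration")
    stages.append("call_variants")
    return stages


def cohort_plan(samples: List[str], opts: PipelineOptions) -> List[Tuple[str, str]]:
    """(sample, stage) pairs for every sample, then one cohort-level joint-genotyping step."""
    plan = [(s, stage) for s in samples for stage in per_sample_stages(opts)]
    plan.append(("cohort", "joint_genotyping"))
    return plan
```

For example, `cohort_plan(["sampleA", "sampleB"], PipelineOptions(gatk_version=(3, 5)))` lists five per-sample stages for each of the two hypothetical samples, followed by a single joint-genotyping entry. Joint genotyping has to come last because it needs the per-sample calls from every sample in the cohort.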
Additionally, the workflow can optionally split the aligned reads by chromosome before variant calling, so that chromosomes can be processed in parallel; this often reduces run time when analyzing WGS data.
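Splitting by chromosome is a scatter-gather pattern: reads are scattered into per-chromosome shards, each shard is called independently, and the per-chromosome results are gathered back together. The toy sketch below illustrates the pattern only. Reads are reduced to hypothetical (chromosome, position) tuples, and `call_variants` is a placeholder that simply reports each distinct position; it is not a real variant caller. A thread pool is used for simplicity. In an actual run, each shard would more likely be a separate GATK process or cluster job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-in for an aligned read: (chromosome, position).
Read = Tuple[str, int]


def split_by_chromosome(reads: List[Read]) -> Dict[str, List[Read]]:
    """Scatter: group aligned reads by the chromosome they map to."""
    shards: Dict[str, List[Read]] = defaultdict(list)
    for chrom, pos in reads:
        shards[chrom].append((chrom, pos))
    return dict(shards)


def call_variants(chrom: str, reads: List[Read]) -> List[str]:
    """Placeholder caller: reports one 'site' per distinct position (not a real model)."""
    return [f"{chrom}:{pos}" for pos in sorted({p for _, p in reads})]


def scatter_gather_call(reads: List[Read], workers: int = 4) -> List[str]:
    shards = split_by_chromosome(reads)
    # Lexicographic order, used only so the gathered output is deterministic.
    chroms = sorted(shards)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: call_variants(c, shards[c]), chroms)
        # Gather: concatenate per-chromosome calls in chromosome order.
        return [site for chrom_calls in results for site in chrom_calls]
```

Each shard depends only on its own reads, so the per-chromosome calls can run concurrently without coordination. Only the final gather step needs every shard to be finished.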
An overview of the workflow architecture is shown in Figure 1 below.