VCF

From SNPedia

Promethease can read VCF (Variant Call Format) files, but the format allows a lot of flexibility, so not every VCF is equally informative.

Ideally you'll be able to produce a version 4.2 compliant VCF with the END= fields set. This is sometimes known as a gVCF. That will allow us to distinguish positions which match the reference from positions which were not callable due to insufficient sequencing depth, and it will provide the best possible Promethease report. Alternatively, you could use GATK with --EMIT-ALL-SITES, which produces a much larger VCF file but also lets us tell reference positions from missing ones.

About gVCFs

gVCF stands for "genomic VCF". It includes positions that match the reference, along with their qualities, so that Promethease can tell whether a position is missing or definitively not a mutation.
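As a rough illustration of the distinction described above, the following Python sketch looks up a single position in gVCF text and reports whether it is a variant, a reference match, or missing. It is not how Promethease itself works. The function names (parse_info, classify_position) and the example records are hypothetical, and the sketch assumes that reference blocks carry an END= entry in INFO and that the only ALT alleles to ignore are ".", GATK's <NON_REF> and bcftools' <*>. It deliberately ignores genotype and quality fields, which a real reader would also need to check.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'END=199;DP=30' into a dict."""
    fields = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def classify_position(vcf_lines, chrom, pos):
    """Return 'variant', 'reference' or 'missing' for one 1-based position."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        rec_chrom, start, ref, alt, info = (
            cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        )
        # A record spans start..END when END= is set,
        # otherwise just the bases of its REF allele.
        end = int(parse_info(info).get("END", start + len(ref) - 1))
        if rec_chrom != chrom or not (start <= pos <= end):
            continue
        # Symbolic placeholder alleles do not count as real variants.
        real_alts = [a for a in alt.split(",")
                     if a not in (".", "<NON_REF>", "<*>")]
        return "variant" if real_alts else "reference"
    # No record covers the position, so it was never reported.
    return "missing"


# Hypothetical records: one reference block (100-199) and one SNP at 200.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "1\t100\t.\tA\t<*>\t.\t.\tEND=199\tDP\t25",
    "1\t200\t.\tC\tT,<*>\t50\t.\tDP=30\tGT\t0/1",
]

print(classify_position(example, "1", 150))  # reference
print(classify_position(example, "1", 200))  # variant
print(classify_position(example, "1", 500))  # missing
```

With a plain variants-only VCF, the first and third lookups would both come back as "missing", which is exactly the ambiguity a gVCF removes.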

The "mpileup" command in "bcftools" (split from samtools) may be helpful in producing useful gVCFs. Please confirm if you try it.

bcftools mpileup -g 10 -uf /path/to/refgrch37.fa /path/to/a.sorted.bam

(The actual depth to use for -g will depend on the nature of your data. See the full bcftools documentation for details.)