Understanding VCF file | Variant Call Format Part 2/3

Variant Call Format (VCF) is a text file format that records the variants between a reference genome and a sample genome. It contains meta-information lines, a header line, and then data lines, each containing information about a position in the genome. The format can also hold genotype information for samples at each position.
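A minimal sketch of that three-part layout, assuming the usual convention that meta-information lines start with `##`, the header line starts with a single `#`, and data lines are tab-separated. The sample records and the `read_vcf` helper are made up for illustration; they are not taken from the video or the slides.

```python
import io

# Tiny made-up VCF used only for illustration.
SAMPLE_VCF = """##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1
20\t14370\t.\tG\tA\t29\tPASS\tDP=14\tGT\t0/1
20\t17330\t.\tT\tA\t3\tPASS\tDP=11\tGT\t0/0
"""


def read_vcf(handle):
    """Split a VCF stream into meta lines, header columns, and data records."""
    meta, columns, records = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information line, e.g. ##INFO=<...>
            meta.append(line)
        elif line.startswith("#"):
            # Header line: drop the leading '#' and keep the column names
            columns = line[1:].split("\t")
        else:
            # Data line: one genome position, one value per header column
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


meta, columns, records = read_vcf(io.StringIO(SAMPLE_VCF))
print(len(meta), "meta-information lines")
print(columns)
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["SAMPLE1"])
```

The last column of each data line holds the per-sample genotype field, which is how the format carries genotype information alongside each position. For real analyses an established VCF library is likely a safer choice than a hand-rolled parser like this one.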

It was used extensively during the 1000 Genomes Project and in GWAS analyses, and it is part of many bioinformatics research pipelines. Yet many researchers have trouble understanding how to read this file directly and use it in their analysis.

In this three-part video series, I go through the whole specification of the .vcf file format: the metadata section, the data section, and some examples to check whether your understanding matches mine.

Link to slides

Original specification file

Sample vcf

Comments

Thanks for the video. It's clear and very helpful

rajaonarivelojeannearline

For me as a first-time VCF user, this was pretty helpful. Thanks :)

lebesgue-integral

The rsID doesn't match the dbSNP location in the video because the reference used is NCBI36, not GRCh37 or GRCh38.

saaralarmala

Is there something wrong with the REF and ALT in examples 2 and 3?
Help, please.

drpallavijain

In the total depth, is the fact that the reads don't all give the same nucleotide due to possible sequencing errors?

elenips

Hello, I have sequenced 18 targeted genes of Mycobacterium tuberculosis from extracted DNA, using Illumina MiniSeq sequencing. I now have the FASTQ files for the sequenced genes. I want to know the expression level of each gene in each individual sample. From these FASTQ files I also want to analyze the mutation profile and check the variants. I want to do this on Linux. How can I do that?

johirislam

In example 1, why is the third line not 20 3 C CA . PASS DP=100?

jin

Very helpful video, thank you! I am not really familiar with bioinformatics, and in this part of my project I am trying to compare two VCF files corresponding to the results from healthy tissue and tumor tissue. I want to compare these VCF files and remove what they have in common. More specifically, I want to remove the healthy-tissue information from the tumor file. Do you have any suggestions on which tool I should use, or any way I can do this analysis? Thank you in advance!

elenips
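A minimal sketch of the tumor-versus-normal subtraction asked about in the comment above, assuming both files were called against the same reference and that a variant can be identified by its CHROM, POS, REF, and ALT fields. The `variant_keys` and `tumor_only` helpers and the sample records are hypothetical. Dedicated tools such as `bcftools isec` exist for this kind of comparison and are probably the better choice on real data.

```python
def variant_keys(vcf_text):
    """Return the set of (CHROM, POS, REF, ALT) keys for every data line."""
    keys = set()
    for line in vcf_text.splitlines():
        # Skip blank lines, meta-information lines, and the header line
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A data line may list several comma-separated ALT alleles
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def tumor_only(tumor_text, normal_text):
    """Variants present in the tumor file but absent from the normal file."""
    return sorted(variant_keys(tumor_text) - variant_keys(normal_text))


# Made-up records for illustration only.
NORMAL = "#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n"
TUMOR = "#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t300\t.\tG\tA,C\n"

print(tumor_only(TUMOR, NORMAL))
# → [('1', 300, 'G', 'A'), ('1', 300, 'G', 'C')]
```

This exact-match comparison only works if indels are represented the same way in both files, so normalizing the two VCFs first is a sensible precaution.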

Thanks a lot for the video. For the wrong position you found in dbSNP for that ID, do you think it is because of GRCh38 rather than GRCh37 (hg19)?

caihongji

Thanks for the explanation - great channel!

matt

Thank you so, so much. Very helpful, and everything was explained very well.

ashamerin

Thank you so much. It was very useful.

MalahatDianat