Understanding VCF file | Variant Call Format Part 1/3

Variant Call Format (VCF) is a text file format that records the variants between a reference genome and a sample genome. A VCF file contains meta-information lines, a header line, and then data lines, each describing one position in the genome. The format can also carry genotype information for one or more samples at each position.
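
The three kinds of lines can be told apart by their prefix: meta-information lines start with "##", the header line starts with a single "#", and everything else is a data line. Below is a minimal Python sketch of that idea, using only the standard library. The sample content, the caller name "hypotheticalCaller", the sample column "sample1", and the helper names parse_vcf and gt_to_alleles are all made up for illustration. It is not a full parser and ignores most of the specification.

```python
import io

# Tiny made-up VCF for illustration only (tab-separated columns).
SAMPLE_VCF = """\
##fileformat=VCFv4.2
##source=hypotheticalCaller
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT\t0/1
chr1\t22222\trs1\tC\tT,G\t99\tPASS\tDP=12\tGT\t1|2
"""


def parse_vcf(handle):
    """Split a VCF stream into meta lines, header columns, and data records."""
    meta, header, records = [], None, []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):      # meta-information line
            meta.append(line[2:])
        elif line.startswith("#"):     # header line with the column names
            header = line[1:].split("\t")
        else:                          # data line: one genome position
            records.append(dict(zip(header, line.split("\t"))))
    return meta, header, records


def gt_to_alleles(record, sample):
    """Turn a numeric genotype such as 0/1 into letters such as A/G."""
    # Index 0 is the REF allele; 1, 2, ... are the ALT alleles in order.
    alleles = [record["REF"]] + record["ALT"].split(",")
    keys = record["FORMAT"].split(":")
    values = dict(zip(keys, record[sample].split(":")))
    gt = values["GT"]
    sep = "|" if "|" in gt else "/"    # "|" is phased, "/" is unphased
    return sep.join("." if i == "." else alleles[int(i)] for i in gt.split(sep))


meta, header, records = parse_vcf(io.StringIO(SAMPLE_VCF))
print(meta)
print(header)
for rec in records:
    print(rec["CHROM"], rec["POS"], gt_to_alleles(rec, "sample1"))
```

For real data, an established VCF library is likely a safer choice than hand-rolled parsing, since this sketch does not handle INFO parsing, escaping, or compressed files.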

It was used extensively in the 1000 Genomes Project and for GWAS analysis, and it is part of many bioinformatics research pipelines. Yet most researchers have trouble understanding how this file can be read directly and used in their analysis.

In this three-part video series, I go through the whole specification of the .vcf file format: the metadata section, the data section, and some examples to check whether your understanding matches mine.

Link to slides

Original specification file

Sample vcf

Comments

I've been lost trying to understand differences between FASTA, FASTQ, VCF, and CRAM for some days now, and I finally get it. thanks for this video!

maraoz

Really appreciate this series, thanks! Your descriptions are clear and easy to understand.

emetitiri

Very helpful video, thank you!! I am not really familiar with bioinformatics, and in this part of my project I am trying to compare two VCF files that hold the results for healthy tissue and tumor tissue. I want to compare these VCF files and remove what they have in common. More specifically, I want to remove the healthy-tissue information from the tumor file. Do you have any suggestions on which tool I should use, or on how I could do this analysis? Thank you in advance!

elenips

Hello

I am currently working on a topic: using neural networks to identify somatic variations. Do you have any idea where I can get a suitable dataset?
Thank you

kadidyasmine

How do I convert the numerical GT format to a letter (nucleotide) GT format?

HaileG-

Any tutorial on how to parse a VCF file in Python or R?

taniadas

How about translocations? A translocation is another form of structural variation, where non-homologous chromosomes break apart and are rejoined to each other by the DNA repair machinery.

chesterhung

Hello
I had whole-genome sequencing of a bacterium done by NGS (Illumina). The files I received from the company are FASTQ 1, FASTQ 2, filtered FASTQ 1, and filtered FASTQ 2, and the analysis results include rmdup.bam.bai, rmdup.bam, a filtered VCF, and an annotated VCF. Please tell me which of these files should be converted to FASTA and submitted to NCBI. Thanks.

marwanmahmoodsaleh