Understanding VCF file | Variant Call Format Part 3/3

Variant Call Format (VCF) is a text file format that records the variants between a reference genome and a sample genome. It contains meta-information lines, a header line, and then data lines, each containing information about a position in the genome. The format can also hold genotype information on samples for each position.
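To make the three kinds of lines concrete, here is a minimal Python sketch that splits VCF text into meta-information lines, the header columns, and data records. The function name `split_vcf` and the tiny example file are illustrative assumptions, not material from the video or the specification, and real files are usually better handled by a dedicated VCF library.

```python
def split_vcf(text):
    """Split VCF text into meta-information lines, header columns, and data records."""
    meta, header, records = [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("##"):
            meta.append(line)  # meta-information line
        elif line.startswith("#"):
            header = line.lstrip("#").split("\t")  # the single header line
        else:
            # data line: pair each tab-separated field with its column name
            records.append(dict(zip(header, line.split("\t"))))
    return meta, header, records


# Made-up example for illustration only.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "1\t2\t.\tTCG\tTG,T,TCAG\t.\tPASS\t.\tGT\t0/1",
])

meta, header, records = split_vcf(EXAMPLE_VCF)
print(len(meta), header[:5], records[0]["ALT"])
```

Each record is kept as a plain dictionary of strings here, so fields such as POS or a multi-allelic ALT still need to be converted or split by the caller.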

It was used extensively during the 1000 Genomes Project and for GWAS analysis, and it is part of many bioinformatics research pipelines. Yet many researchers have trouble understanding how this file can be read directly and used in their analysis.

In this three-part video series, I go through the whole specification of the .vcf file format: the metadata section, the data section, and some examples to check whether your understanding matches mine.

Link to slides

Original specification file

Sample vcf

Comments

Amazing videos, crisp and worth the time invested watching.

Solution to KCQ 5 is as below:
POS : 2
REF : TCG
ALT1 : TG - {Deletion of C at position 3}
ALT2 : T - {Deletion of C and G at positions 3 and 4 respectively}
ALT3 : TCAG - {Insertion of A at position 4, which pushes G from position 4 to position 5}
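The solution above can be checked with a small Python sketch that labels each ALT allele by comparing its length with REF. The helper `classify_alt` is a hypothetical name, and the length-only comparison is a simplifying assumption that works for simple indels like these but not for complex or symbolic alleles.

```python
def classify_alt(ref, alt):
    """Label an ALT allele by comparing its length with REF (simple indels only)."""
    if len(alt) < len(ref):
        return f"deletion of {len(ref) - len(alt)} base(s)"
    if len(alt) > len(ref):
        return f"insertion of {len(alt) - len(ref)} base(s)"
    return "substitution" if alt != ref else "no change"


ref = "TCG"  # REF at POS 2 in the solution above
for alt in ["TG", "T", "TCAG"]:
    print(alt, "->", classify_alt(ref, alt))
```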

ibabiome

Thank you! The videos are super insightful and helpful! Keep up the awesome work!

helenabiasibettibrendler

Thanks a bunch! It helped me greatly in my work :)

ollelinux

Great explanation. Could you explain how REF and ALT alleles are assigned in a VCF file? Are they assigned on the basis of allele frequency? In a larger population, different individuals may carry different SNPs at a specific position, such as A, C, T, or G, so how is only one of them assigned as the ALT allele? Is it chosen by its frequency in the population? How can we know which SNP should be the ALT allele?

genomicsandbioinformatics

Hello. Where can I check the answer? I did not find it :( Thank you very much.

arwabashanfar

Very helpful video, thank you!! I am not really familiar with bioinformatics, and in this part of my project I am trying to compare two VCF files corresponding to the results of healthy tissue and tumor tissue. I want to compare these VCF files and remove their similarities. More specifically, I want to remove the information of the healthy tissue from the tumor one. Do you have any suggestions on which tool I should use, or any way that I can do my analysis? Thank you in advance!
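One way to picture the subtraction described in this question is a set difference on the first variant columns. The sketch below is a minimal Python illustration under the assumption that two variants are "the same" when CHROM, POS, REF, and ALT match exactly; the function names and the records are made up. Dedicated tools such as bcftools isec exist for this kind of comparison and are likely a better choice for real data.

```python
def variant_key(line):
    """Return the (CHROM, POS, REF, ALT) key of one VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return (chrom, pos, ref, alt)


def is_data_line(line):
    """True for data lines, False for meta-information, header, and blank lines."""
    return bool(line.strip()) and not line.startswith("#")


def tumor_only(tumor_lines, normal_lines):
    """Return tumor data lines whose variant key is absent from the normal file."""
    normal_keys = {variant_key(l) for l in normal_lines if is_data_line(l)}
    return [l for l in tumor_lines
            if is_data_line(l) and variant_key(l) not in normal_keys]


# Made-up records for illustration only.
normal = ["#CHROM\tPOS\tID\tREF\tALT", "1\t100\t.\tA\tG"]
tumor = ["#CHROM\tPOS\tID\tREF\tALT", "1\t100\t.\tA\tG", "1\t250\t.\tC\tT"]
print(tumor_only(tumor, normal))
```

Exact matching is a deliberate simplification: the same indel can be written in more than one way, so both files would normally need to be normalized the same way before comparing.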

elenips

Thank you very much, Sir.
Could you share a bash script for converting genotype (GT) values from 0/0, 0/1, 1/1, ./. to 0, 1, 2, and NA; from 0, 1, 2, and NA to nucleotide letters (diploid form: AA, CC, GG, TT); or directly from 0/0, 0/1, 1/1, ./. to nucleotide letters (diploid form), and vice versa? I think these conversions are the backbone of any downstream data analysis, and I am facing many problems with them. The script could also be in R or Python. Waiting for your kind response.
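The conversions asked about here can be sketched in a few lines of Python. This is an illustration, not a script from the video: the function names are hypothetical, the GT value is assumed to be already extracted from the sample column, and the numeric coding counts non-reference alleles, which matches 0, 1, 2 only for biallelic sites. `None` stands in for NA.

```python
def split_gt(gt):
    """Split a GT string such as '0/1' or '0|1' into its allele indices."""
    return gt.replace("|", "/").split("/")


def gt_to_dosage(gt):
    """Count non-reference alleles in a GT string, or return None if missing."""
    alleles = split_gt(gt)
    if "." in alleles:
        return None  # missing genotype, e.g. './.'
    return sum(1 for a in alleles if a != "0")


def gt_to_bases(gt, ref, alts):
    """Translate a GT string into nucleotide letters using REF and the ALT list."""
    alleles = split_gt(gt)
    if "." in alleles:
        return None
    lookup = [ref] + list(alts)  # index 0 is REF, 1 and up are ALT alleles
    return "".join(lookup[int(a)] for a in alleles)


for gt in ["0/0", "0/1", "1/1", "./."]:
    print(gt, gt_to_dosage(gt), gt_to_bases(gt, "A", ["G"]))
```

Going from letters back to GT needs the REF and ALT of the same record, since the letters alone do not say which allele is the reference.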

HaileG-

Thanks for the videos. Very informative for my next job interview!

lmarkal

Hello, thank you for your efforts, it is well explained. The C-->G change is a transversion, because C and G do not belong to the same family (C is a pyrimidine and G is a purine).
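The rule stated in this comment can be written as a short Python check: a single-base change within the same family (purine to purine, or pyrimidine to pyrimidine) is a transition, and a change between families is a transversion. The function name `classify_snv` is an illustrative choice.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_family = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_family else "transversion"


print("C>G:", classify_snv("C", "G"))
print("A>G:", classify_snv("A", "G"))
```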

elhafafadoua

Hey Brandon -- could you update the slides link please?

kubectlgetpo

Why wouldn't example 3 (t=3:50) have the POS=3, REF=C, and ALT=CA instead? Wouldn't that be the same but more efficient?
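The equivalence behind this question can be checked by applying both representations to a reference sequence. The video's example 3 is not reproduced on this page, so the Python sketch below borrows the KCQ 5 insertion from an earlier comment (POS 2, REF TCG, ALT TCAG) and assumes a hypothetical reference `ATCGA`; `apply_variant` is an illustrative helper.

```python
def apply_variant(reference, pos, ref, alt):
    """Replace REF with ALT at a 1-based POS and return the resulting sequence."""
    start = pos - 1  # VCF positions are 1-based
    if reference[start:start + len(ref)] != ref:
        raise ValueError("REF does not match the reference at POS")
    return reference[:start] + alt + reference[start + len(ref):]


reference = "ATCGA"  # hypothetical reference, chosen so that TCG starts at POS 2
long_form = apply_variant(reference, 2, "TCG", "TCAG")
short_form = apply_variant(reference, 3, "C", "CA")
print(long_form, short_form, long_form == short_form)
```

Both forms describe the same inserted base. The longer form is what results when several ALT alleles share one record, because they must all be written against a single REF that is long enough for the largest deletion among them.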

Great set of videos by the way 👌

musicspinner

How can we split a single VCF file into several VCF files, how do we convert a single VCF file to pfam format, and how can we use plinkseq?

annuranagamechangeroflife

I am working with a vcf file and am looking for information on how missense and nonsense data is represented without going to the summary html. Could you point me in the right direction?

scottieteichmer

Hi! I was wondering whether it is possible to create my own .vcf file. If it is, how do you create one? I have my genotypic data with SNP data, but in a .csv file, and I need to convert it to a VCF file to use it for GWAS. I would appreciate it if you could answer this. Thank you.

jemimahbanganan

Nice video. Please keep going. Thank you very much.

ssssteve

Hello, I have a VCF file, but I am not sure how to open it. Do you have any advice?

RenanSantos-pxml

Nice lecture, big thanks. Could you please make a video on analysis using tablet?

lekshmirk

Thank you so much! It is useful for my job interview~

YH_C

Example 5 mutation is called "Translocation"

chesterhung