Understanding File Formats in Bioinformatics: VCF and gVCF

This is a quick video on a file format used very commonly in variant calling analysis: the VCF file. In this video, I go over the various fields in a VCF file while looking at an example VCF, to understand how the data is organized and what information the various fields store. In addition, I explain what genotypes are, the difference between phased and unphased genotypes, how to calculate alternate allele frequency, and how DNA variations are recorded. Lastly, I discuss what a gVCF file is and how it differs from a VCF file.
I hope you find this video helpful! Leave your thoughts in the comment section below!

FASTA/FASTQ format:

SAM/BAM file format:

Chapters:
0:00 Intro
0:40 What is a VCF file and how is it generated?
2:38 Main sections of a VCF file
3:27 Metadata section
5:51 Header line
6:51 Data lines - description of fields
13:13 Genes and alleles
14:30 Understanding genotype
15:33 What does genotype 2/0 or 1/2 mean?
17:02 Difference between GT:0/1 and GT:0|1 - phased vs unphased genotype
10:05 How are variants recorded in a VCF file?
22:01 Interpreting a record in VCF
24:45 Genomic VCF (gVCF)
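
As a rough illustration of the genotype and allele-frequency chapters above, here is a minimal Python sketch. It assumes the usual VCF convention that index 0 in the GT field refers to the REF allele and indices 1, 2, ... refer to the ALT alleles in the order listed, with "/" marking an unphased and "|" a phased genotype. The function names and example values are hypothetical, and the frequency is computed as the alternate allele count divided by the number of called alleles, which may differ from the exact definition used in the video.

```python
import re


def decode_genotype(ref, alts, gt):
    """Translate a GT string such as '0/1' or '1|2' into allele sequences.

    Returns (alleles, phased). A missing call ('.') is returned as None.
    """
    phased = "|" in gt                      # '|' = phased, '/' = unphased
    lookup = [ref] + list(alts)             # index 0 is REF, 1.. are ALT alleles
    calls = []
    for idx in re.split(r"[/|]", gt):
        calls.append(None if idx == "." else lookup[int(idx)])
    return calls, phased


def alt_allele_frequency(genotypes, alt_index=1):
    """Fraction of called alleles across samples that match alt_index.

    Missing alleles ('.') are ignored; returns None if nothing was called.
    """
    called = [i for gt in genotypes for i in re.split(r"[/|]", gt) if i != "."]
    if not called:
        return None
    return sum(1 for i in called if int(i) == alt_index) / len(called)


# Hypothetical record: REF=A, ALT=G,T
print(decode_genotype("A", ["G", "T"], "1|2"))       # (['G', 'T'], True)
print(alt_allele_frequency(["0/1", "1/1", "0/0"]))   # 0.5
```

In the second call, three of the six called alleles carry the first alternate allele, which gives 0.5.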

Like the videos I create? Show your support and encouragement by buying me a coffee:

To get in touch:

#bioinformagician #bioinformatics #vcf #gvcf #gatk #haplotype #alleles #variantcalling #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs
Comments

I am a bioinformatics student, just began my studies and I have really learnt a lot from your content 😊

magdalineakinyi

I am a beginner in the bioinformatics field. I have not learnt nearly as much at my institute as the amazing things I have learnt from your channel. Thank you so much!

abubakarraja

A clear, detailed, and well-sequenced explanation. Looking forward to learning more in subsequent lessons.

mosesbaraza

A really informative video for beginners. At 16:40, at position 491520 where the GT is 1/2, shouldn't it be C/CAC instead of CAC/C?

humarafique

Am I glad I found this channel. Great stuff!

hubijohn

Excellent video! I'm in love with your channel!! Congratulations!! I'm starting in this world of bioinformatics, and your videos have helped me a lot! Thank you!

isadoramachadoghilardi

I have been blessed by your videos. Thank you.

josephinecudjoe

Thank you so much for explaining this. I can't reconcile the definition of allele frequency that you gave with rows 2 and 3 of your sample (at 23:44). Could you please explain it for those rows?

faezedarbaniyan

Such a great lecture! I am just wondering whether there is a typo at 17:00, in the second row of the table, at position 332470. Shouldn't it be C/T rather than C/A, or is there something I missed?

설동헌-id

I love your channel!! Your content is so well organized, thank you so much!

seetarajpara

16:59, position 332470: shouldn't that be C/T or T/C, since for that position T is the reference allele (0) and C is the first alternate allele (1)? How did you get C/A?

anmolpardeshi

Had always been looking for such a video. Thank you so much :D

Tekofilic

Thank you so much for sharing this information and your knowledge! Very much appreciated. Could you please make a video on joint variant calling? Also, what would you do for joint calling on RNA-seq data?

alexandrakassis

What was the name of the forum mentioned?

notterboutuyer

OMG such a good video!!! You explain everything so amazingly ❤ Could you please one day make a tutorial about dataset integration in Seurat, such as integrating 10x Genomics and Smart-seq2 data? Thank you!!

giovannapg

Really informative tutorial. Could you please make a video on TMB and MSI?

tapanbaral

If I have inserted part of a genome into the same genome, how can I find it?

AshishKumar-elsb

Absolutely fantastic video! Thank you! Does a gVCF always respect the VCF format or is there a distinct gVCF format? Can you tell us more about the multi-sample VCF formats jVCF and MSVCF? Thanks!

biomagician

Exciting video. Could you upload another video about how to analyze data using VCF tools in a Linux environment?

abebemisganaw

Where can I find the PowerPoint slides you use in your videos?

alexandrakassis