RNA-seq tutorial – part 4 – Differential expression analysis with DESeq2

Here I use DESeq2 to perform differential gene expression analysis. I use a count table as input and output a table of significantly differentially expressed genes. I also show PCA and dispersion QC of the RNA-seq data.

The output data can be further manipulated and explored in R, Python, or Excel. For example, you can extract positively enriched genes and sort them by log-fold change. You can also use the Ensembl identifiers directly in gene ontology analysis. In future videos I will show how to convert Ensembl IDs to gene symbols and how to create heatmaps and other useful figures.
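
As an illustration of the "extract positively enriched genes and sort by log-fold change" step, here is a minimal Python sketch. It assumes the results table was exported as CSV with `log2FoldChange` and `padj` columns. The gene IDs, values, and the 0.05 cutoff are made up for the example and are not taken from the video.

```python
import csv
import io

# Hypothetical excerpt of an exported results table; IDs and values are invented.
EXAMPLE_CSV = """gene,baseMean,log2FoldChange,padj
ENSG_A,150.0,2.5,0.001
ENSG_B,80.0,-1.8,0.002
ENSG_C,300.0,0.9,0.20
ENSG_D,45.0,3.1,0.01
ENSG_E,10.0,1.2,NA
"""

def upregulated_genes(csv_text, padj_cutoff=0.05):
    """Keep rows with padj below the cutoff and a positive log2 fold change,
    sorted from the largest to the smallest fold change."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows without an adjusted p-value or fold change are skipped.
        if row["padj"] in ("", "NA") or row["log2FoldChange"] in ("", "NA"):
            continue
        lfc = float(row["log2FoldChange"])
        padj = float(row["padj"])
        if padj < padj_cutoff and lfc > 0:
            kept.append({"gene": row["gene"], "log2FoldChange": lfc, "padj": padj})
    return sorted(kept, key=lambda r: r["log2FoldChange"], reverse=True)

for r in upregulated_genes(EXAMPLE_CSV):
    print(r["gene"], r["log2FoldChange"], r["padj"])
```

The same filter can be flipped to `lfc < 0` to list the negatively enriched genes.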

The samples include normal human cell controls and replicative senescent cells from NCBI GEO accession GSE171663.

DESeq2 citation:
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8. PMID: 25516281; PMCID: PMC4302049.
Comments

Thank you!!! This is the best DESeq2 tutorial so far. It's easy to follow and every step makes sense. I am sure many others are benefiting and will benefit from you! I hope you are having a wonderful day or night wherever you are! Thanks a lot!!!

khawa

Thanks mate, all the tutorials in this series have been top notch. They go at a great pace and I appreciate that you explain pretty much everything you're doing

SlugmaB

I found your video like a diamond in the pile. Superb, bro... Thanks a lot.

joyhoskeri

Hey mate,
I am getting this error.
I followed all your steps except the filter one.
Please help.
I have 5 columns in my matrix (M10, M11, M12, M3 and M5) with ENSEMBL gene IDs.
The dds step is not working:
Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer

ParthShah-hcpw
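
The error quoted above says the design has as many coefficients as samples. For a design of the form ~condition that happens when every sample carries its own condition label, so no group has a replicate. The Python sketch below only counts labels to show the check. The two label lists are hypothetical, since the commenter's actual grouping is not given.

```python
from collections import Counter

def check_replicates(conditions):
    """conditions: one condition label per sample (per count-matrix column).

    For a design of the form ~condition, the model fits one coefficient per
    distinct level (an intercept plus levels - 1 contrasts), so dispersion
    can only be estimated when there are more samples than levels.
    """
    per_level = Counter(conditions)
    return {
        "samples_per_level": dict(per_level),
        "can_estimate_dispersion": len(conditions) > len(per_level),
    }

# Hypothetical labelling 1: every sample is its own "condition" (triggers the error).
print(check_replicates(["M10", "M11", "M12", "M3", "M5"]))

# Hypothetical labelling 2: the same five samples assigned to two groups.
print(check_replicates(["treated", "treated", "treated", "control", "control"]))
```

If the first case matches your coldata, the usual fix is a condition column that assigns several samples to each group.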

Please tell me how to collect sample data for GSE99816 and how to know which sample is normal and which is diseased. I used GEOquery, but it didn't contain this information. Please help me, sir.

sanjaisrao

Hi, informative video. I want to know how to deal with non-integer data containing decimals. The data matrix function doesn't work on such a data set.

ZahidHussain-xbit

Hi, nice tutorial!!! Thank you so much. May I ask how to compare 3 or more groups with different sample sizes?

florawang

Thanks for this video. Please, how can I reach out to you?

adekunleajiboye

Hello. I am getting an error. While running the DESeqDataSetFromMatrix function, this error pops up:

Error in DESeqDataSet(se, design = design, ignoreRank) :
'design' should be a formula or a matrix

Can you tell me how to solve this issue? My dataset consists of 8 columns (4 cancer + 4 normal samples).

ragnulf_gamer

Thank you, Mark, for all your informative videos. Is there a way to produce RPKM/FPKM and TPM values from the DESeq2 library, and what's the easiest way to obtain gene lengths?

ashwaqkhaled
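
For reference, the arithmetic behind FPKM and TPM is short enough to sketch in plain Python. This is only the standard definition applied to one sample, not a DESeq2 call. The counts and gene lengths below are made-up numbers, and the lengths (in base pairs) are assumed to come from your annotation.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene length per million counted fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]

# Hypothetical counts for three genes in a single sample, with lengths in bp.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]

print(fpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

In this toy sample all three genes have the same count-to-length ratio, so they end up with equal FPKM and equal TPM, and the TPM values sum to one million.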

Hi, sir.
Thank you for your video.
Just a quick question. Once we run the DESeq(dds) function, are the generated results based on normalized data? After you run res = results(dds, contrast = c("condition", "S", "C")), you get 7 columns including log2FoldChange. Is this log2FoldChange calculated from normalized data? Of course, we can get normalized data using estimateSizeFactors(dds) followed by counts(dds, normalized=T). But before this code, we just run the dds step and then extract the results.

freezingtolerance
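
On the normalization question: the DESeq2 paper cited in the description uses a median-of-ratios size factor per sample, and the DESeq() call estimates these factors itself before fitting, so the fold changes already account for library size. The Python sketch below shows only the idea behind that method on a made-up count matrix. It is not DESeq2's code.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of gene rows, each a list of raw counts per sample.

    For every gene with no zero counts, take the log of the geometric mean
    across samples. A sample's size factor is the median, over those genes,
    of count / geometric mean. Genes with a zero are skipped.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [30, 60], [0, 5]]
sf = size_factors(counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]
print(sf)
print(normalized[0])
```

Dividing each column by its size factor brings the two samples onto the same scale, which is the same kind of scaling that counts(dds, normalized=T) reports.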

Could you further clarify how you would pick a threshold for row sums? I'm not sure what you meant by "filter their end result by their mean".

adampassman

Informative video, thanks.
I have a query regarding data analysis, if you could please help me with it. I have a data set for tumors that I downloaded from a cancer data portal, so now I have gene expression data and clinical data for both tumors. I want to compare the gene expression of both tumors, but I don't know where to start. How can I compare these tumors using DESeq2? Please guide me. Thank you.

munibabashir

Hi, could you please help me with how to filter out only the protein-coding genes? Thank you.

rushonline

Hi, thanks for the useful tutorial. How do we convert the results (differential table) into dds (DESeq output), so that we can apply the padj cut-off in the res -> dds -> vsdata chain? Or is there any other way to get a dds with the padj cut-off applied? Thank you.

anandhakumarchandran

Hi. I am really thankful for your videos. At the moment I am in a pickle. I looked up the results() function man page since I am a bit confused about the "contrast" argument. The confusion comes from the fact that I have 3 types of samples, not just "s" and "c". Either "contrast" is 1 vector with exactly 3 elements, like in the video, or (and here comes the confusion) 2 vectors with the names of the fold changes for the numerator and the names of the fold changes for the denominator. What are these? The 3rd option for contrast is "a numeric contrast vector with one element for each element in resultsNames(object) (most general case)". Should I use the 2nd or the 3rd option? And what do these numerators and denominators mean here? Thank you, really.

hatchet

What if I have 3 conditions instead of 2? When I try to run res <- results (and etc.) I get an error saying "Error in checkContrast(contrast, resNames) : 'contrast', as a character vector of length 3, should have the form: contrast = c('factorName', 'numeratorLevel', 'denominatorLevel'), see the manual page of ?results for more information"

mirij
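
Going by the error message quoted above, each contrast names one factor, one numerator level, and one denominator level. With three conditions, that suggests one results() call per pair of levels. The Python sketch below only enumerates those triples. The factor name "condition" follows the video, and the third level "T" is a hypothetical addition to "C" and "S".

```python
from itertools import combinations

def pairwise_contrasts(factor_name, levels):
    """List every (factor, numerator, denominator) triple for the given levels.

    Levels are taken in the order supplied, and the earlier level of each
    pair is used as the denominator (the baseline of that comparison).
    """
    return [(factor_name, later, earlier) for earlier, later in combinations(levels, 2)]

# "C" (control) and "S" (senescent) appear in the video; "T" is a made-up third level.
for contrast in pairwise_contrasts("condition", ["C", "S", "T"]):
    print(contrast)
```

A positive log2 fold change in a given comparison then means higher expression in the numerator level than in the denominator level.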

Help me.
My code: dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~condition)
And this error: Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are not integers
Why?

MM-fjym
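
The "some values in assay are not integers" error means the count matrix contains decimals. A minimal Python sketch of the check and a rounding workaround follows, on made-up values. Rounding is an assumption that only makes sense when the decimals are estimated counts. It is not appropriate for already-normalized values such as FPKM or TPM, which are not raw counts.

```python
def non_integer_cells(matrix):
    """Return (row, column, value) for every entry that is not a whole number."""
    return [
        (i, j, v)
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if float(v) != int(v)
    ]

def round_counts(matrix):
    """Round every entry to the nearest integer."""
    return [[int(round(v)) for v in row] for row in matrix]

# Hypothetical estimated counts for two genes in two samples.
estimated = [[10.0, 3.7], [0.2, 5.0]]

print(non_integer_cells(estimated))
print(round_counts(estimated))
```

Inspecting the offending cells first helps tell estimated counts (mostly small fractions) apart from a table that was normalized upstream.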

Hi there!
That's really the best DESeq2 tutorial I have seen so far, thank you very much!!
I have one question: I ran the first command that includes the header and row.names (row.names = 1), but I get the following error message:
"Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed"
I read a lot of sites that suggest setting row.names to NULL, but that is not a good idea for my data.
Have you ever encountered this error? Do you have any recommendations?
Thanks in advance!

julieapostolou
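
The "duplicate 'row.names' are not allowed" error means some identifier appears more than once in the first column. A minimal Python sketch for finding those identifiers and, as one possible workaround, summing their rows is given below. Summing is an assumption about what the duplicates represent, so inspect them before merging. The IDs and counts are made up.

```python
from collections import Counter

def duplicated_ids(ids):
    """Map each identifier that occurs more than once to its number of occurrences."""
    return {gene: n for gene, n in Counter(ids).items() if n > 1}

def sum_duplicate_rows(rows):
    """rows: list of (gene_id, counts) pairs. Rows sharing an ID are summed."""
    merged = {}
    for gene, counts in rows:
        if gene in merged:
            merged[gene] = [a + b for a, b in zip(merged[gene], counts)]
        else:
            merged[gene] = list(counts)
    return merged

# Hypothetical count table in which GENE1 appears twice.
rows = [("GENE1", [5, 7]), ("GENE2", [0, 3]), ("GENE1", [1, 1])]

print(duplicated_ids([gene for gene, _ in rows]))
print(sum_duplicate_rows(rows))
```

Once every identifier is unique, the first column can serve as row names again.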

Hi, thanks for this informative video on DESeq2. I have been stuck for a while on the input data matrix before running DESeq2. I can see that my gene identifier column automatically becomes the first column when I arrange the condition and coldata (column data of my htseq read counts) into matrix format. Can you suggest how I can fix it? Thanks!

diyabhattacharya