Simple guide to GSEA and plotting in Python

I show how to run and plot GSEA in Python, using both predefined Gene Ontology gene sets and custom user-supplied gene lists. The approach integrates easily into other differential expression or single-cell analysis pipelines in Python and requires only a few lines of code.

Notebook:
Comments

Note on which genes to include:

In the video I include only significant DE genes. This is not best practice; I did it because it was faster and needed less memory. You SHOULD include all genes with reasonable expression. A good threshold might be a base mean greater than 100. If you include all genes (which you should), you will likely need to lower the default number of permutations in the gp.prerank command by adding this argument: permutation_num = 100

sanbomics
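
The note above can be sketched as follows. This is a minimal sketch, assuming a DESeq2-style results table with `baseMean`, `log2FoldChange`, and `padj` columns indexed by gene symbol. The toy values and the helper name `build_ranking` are made up for illustration. The `gp.prerank` call is left commented out because it needs gseapy, and the gene set library name in it is only an example.

```python
import numpy as np
import pandas as pd

def build_ranking(de, min_base_mean=100):
    """Return a gene ranking (Series, descending) from a DESeq2-style table."""
    # Keep every gene with reasonable expression, not only the significant ones.
    kept = de[de["baseMean"] > min_base_mean].dropna(subset=["log2FoldChange", "padj"])
    # Ranking metric used in the video: -log10(padj) * log2FoldChange.
    rank = -np.log10(kept["padj"]) * kept["log2FoldChange"]
    return rank.sort_values(ascending=False)

# Invented example data; GENE_B falls below the base mean threshold.
de = pd.DataFrame(
    {
        "baseMean": [1500.0, 40.0, 320.0, 980.0],
        "log2FoldChange": [2.0, 3.0, -1.0, 0.1],
        "padj": [1e-4, 1e-6, 1e-2, 0.8],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
ranking = build_ranking(de)
print(ranking)

# With gseapy installed, the ranking could then be passed on, for example:
# import gseapy as gp
# pre_res = gp.prerank(rnk=ranking, gene_sets="GO_Biological_Process_2021",
#                      permutation_num=100)
```

Lowering `permutation_num` trades some precision in the p-value estimates for speed and memory, which is the trade-off the note describes.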

Is it possible to adjust the text font size etc., and/or remove the redundant FDR value when running on a single gene set of interest? My default text when plotting with gseapy is much blockier and often overlaps the graph in an ugly way.

EamonCoughlan-rdkk

Thanks a ton for the effort you put into making these informative videos. Without your help, I wouldn't have learned such a nice and easy way of making GSEA plots. I really like every one of your videos, every single second of them. I really appreciate your smart and clear explanations.

I have one question: is there any way to plot GSEA with all the gene sets listed at once?
And perhaps rank them in order of significance?

jhpa
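
One way to see all gene sets at once, ordered by significance, is to sort the results table. A minimal sketch, assuming a recent gseapy version in which `pre_res.res2d` is a DataFrame with `Term`, `NES`, and `FDR q-val` columns (older versions may use different column names). The stand-in table below is invented so the snippet runs without gseapy.

```python
import pandas as pd

# Stand-in for pre_res.res2d; the values are invented for illustration.
res2d = pd.DataFrame(
    {
        "Term": ["term A", "term B", "term C"],
        "NES": [1.9, -2.3, 1.1],
        "FDR q-val": [0.03, 0.001, 0.4],
    }
)

# Most significant gene sets first; ties are broken by absolute NES.
# The cast guards against numeric columns stored as generic objects.
ordered = (
    res2d.astype({"NES": float, "FDR q-val": float})
    .assign(abs_nes=lambda d: d["NES"].abs())
    .sort_values(["FDR q-val", "abs_nes"], ascending=[True, False])
    .drop(columns="abs_nes")
    .reset_index(drop=True)
)
print(ordered)
```

The sorted `Term` column can then be looped over to draw one plot per gene set, or fed into a single summary plot.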

Great video! Can you explain the formula for the rank? Why should log10(df.padj) be negated?

harryliu
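
On the sign question, a small worked example may help. Adjusted p-values lie between 0 and 1, so log10(padj) is zero or negative. Negating it gives a non-negative weight that grows as a gene becomes more significant, and multiplying by log2FoldChange restores the direction of the change. A minimal sketch with made-up numbers (the function name `rank_score` is hypothetical):

```python
import math

def rank_score(log2_fold_change, padj):
    """-log10(padj) is >= 0 for 0 < padj <= 1 and grows as padj shrinks."""
    return -math.log10(padj) * log2_fold_change

# (log2FoldChange, padj) pairs: strong up, weak up, strong down.
examples = [(2.0, 0.001), (2.0, 0.5), (-2.0, 0.001)]
scores = [rank_score(lfc, padj) for lfc, padj in examples]
for (lfc, padj), score in zip(examples, scores):
    print(lfc, padj, round(score, 3))
```

With these inputs the strongly significant up-regulated gene scores about 6, the same fold change with padj 0.5 scores about 0.6, and the significant down-regulated gene scores about -6. Significant genes therefore end up at the two ends of the ranked list.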

Hi! Where does the formula for the rank come from? Also, great videos! :)

katarinavalentincic

Thank you for the great presentation.

I get an error in the very last code cell. The code before it is the same as yours.

term_to_graph
> 'response to DNA damage stimulus (GO:0006974)'

gseaplot(pre_res.ranking, term = term_to_graph,
>
TypeError                                 Traceback (most recent call last)
Cell In[122], line 1
----> 1 gseaplot(pre_res.ranking, term = term_to_graph,

TypeError: gseaplot() got multiple values for argument 'term'

I cannot solve this error. If possible, could you check this?

수박바아
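
A possible explanation, assuming a newer gseapy release in which `term` (not the ranking) is the first positional parameter of `gseaplot`: the positional `pre_res.ranking` is bound to `term`, and the explicit `term=` keyword then supplies it a second time. Passing the ranking by keyword, as `rank_metric=pre_res.ranking`, should avoid the clash. The stand-in function below is hypothetical and only mimics that assumed parameter order, so the behaviour can be reproduced without gseapy.

```python
def gseaplot_like(term, hits, nes, pval, fdr, RES, rank_metric=None, **kwargs):
    """Hypothetical stand-in that mimics the assumed parameter order."""
    return {"term": term, "n_ranked": len(rank_metric)}

ranking = [3.2, 1.1, -0.4, -2.8]  # stands in for pre_res.ranking
result = {  # stands in for pre_res.results[term_to_graph]
    "hits": [0, 2],
    "nes": 1.7,
    "pval": 0.01,
    "fdr": 0.05,
    "RES": [0.5, 0.3, 0.6, 0.0],
}
term_to_graph = "response to DNA damage stimulus (GO:0006974)"

try:
    # The positional ranking lands in `term`, then term= is given again.
    gseaplot_like(ranking, term=term_to_graph, **result)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Ranking passed by keyword: every parameter is bound exactly once.
fixed = gseaplot_like(rank_metric=ranking, term=term_to_graph, **result)
print(fixed)
```

If this matches the installed version, the call from the video would become `gseaplot(rank_metric=pre_res.ranking, term=term_to_graph, **pre_res.results[term_to_graph])`.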

Thanks, this is super useful. But when I run gseapy.prerank(....), I get the error "No gene sets passed through filtering condition", even after trying different min_size and permutation_num values. Do you know what is wrong?

cherhan
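
A common cause of this message is that too few genes in the ranked list match the identifiers used in the gene sets, for example mixed-case or Ensembl IDs against upper-case symbols, so no set has an overlap between min_size and max_size. The check below is a diagnostic sketch under that assumption, not gseapy's own code. The helper name `sets_passing_filter` and the toy gene set are hypothetical.

```python
def sets_passing_filter(ranked_genes, gene_sets, min_size=15, max_size=500):
    """Return {set name: overlap size} for sets whose overlap is within bounds."""
    ranked = set(ranked_genes)
    overlaps = {name: len(ranked & set(genes)) for name, genes in gene_sets.items()}
    return {name: n for name, n in overlaps.items() if min_size <= n <= max_size}

gene_sets = {"DNA repair (toy)": ["BRCA1", "ATM", "TP53", "RAD51"]}
mixed_case = ["Brca1", "Atm", "Trp53", "Rad51"]  # e.g. mouse-style symbols

# No identifier matches the upper-case gene set, so nothing passes.
before = sets_passing_filter(mixed_case, gene_sets, min_size=2)
# Upper-casing recovers three of the four genes (Trp53 is not TP53).
after = sets_passing_filter([g.upper() for g in mixed_case], gene_sets, min_size=2)
print(before, after)
```

Upper-casing is only a crude fix for the toy data. For real analyses a proper identifier or ortholog mapping is the safer route.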

Thank you for this video and all of your other videos, they are very concise and educational.

In this video, you pre-filter the genes in your ranked list. As I understand it, this should not be done. I think the ranked list should include all the genes in the experiment that have any evidence of expression. In other words, if it is possible to calculate a ranking metric, the gene should be included. Can you comment on the decision to use only differentially expressed genes?

I think this idea actually illustrates one of the strengths of GSEA in that it is possible to make biological interpretations in the absence of genes that pass thresholds that define differential expression.

I was also wondering about your ranking metric itself. You are using log2FoldChange*-log10(adjp) which is certainly reasonable. I normally use the stat column for this which is something like log2FoldChange/StdError. Could you comment on your choice of ranking metric? Thanks!

charliewhittaker
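
The two ranking metrics mentioned in this comment can be compared side by side. A minimal sketch with invented values, assuming DESeq2-style quantities (log2FoldChange, its standard error lfcSE, and padj). The function names are hypothetical. It also shows one practical difference: when padj is reported as exactly 0, the p-value-based metric becomes infinite, while the stat-style metric stays finite.

```python
import math

def padj_metric(lfc, padj):
    """Metric from the video: -log10(padj) * log2FoldChange."""
    if padj == 0:  # p-value underflowed to zero
        return math.copysign(math.inf, lfc)
    return -math.log10(padj) * lfc

def wald_metric(lfc, lfc_se):
    """Stat-style metric: log2FoldChange divided by its standard error."""
    return lfc / lfc_se

genes = {  # invented values: (log2FoldChange, lfcSE, padj)
    "GENE_A": (4.0, 0.1, 0.0),
    "GENE_B": (2.0, 0.4, 1e-5),
    "GENE_C": (-1.5, 0.5, 0.02),
}
for name, (lfc, se, padj) in genes.items():
    print(name, padj_metric(lfc, padj), wald_metric(lfc, se))
```

Infinite scores cannot be ordered against each other, so genes with padj of 0 would need special handling with the first metric.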

Hi! Thank you for another great video! Can you also use gene sets from MSigDB for this? I have them downloaded in a separate folder in Visual Studio Code; I am working on a server. I wanted to use the msigdb package in R with rpy2, but I had too many problems setting up an environment, so I gave up on R.

katarinavalentincic
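
Downloaded MSigDB gene sets usually come as GMT files, where each line holds a set name, a description, and then the member genes, all separated by tabs. gseapy's prerank accepts either a path to a .gmt file or a dict of gene sets. The sketch below reads a GMT file into such a dict. The helper name `read_gmt` and the toy file contents are made up for illustration.

```python
import os
import tempfile

def read_gmt(path):
    """Parse a GMT file: name <tab> description <tab> gene1 <tab> gene2 ..."""
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            gene_sets[fields[0]] = [gene for gene in fields[2:] if gene]
    return gene_sets

# Tiny invented GMT file so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    gmt_path = os.path.join(tmp, "toy.gmt")
    with open(gmt_path, "w", encoding="utf-8") as handle:
        handle.write("TOY_SET_1\tdescription\tTP53\tATM\tBRCA1\n")
        handle.write("TOY_SET_2\tdescription\tMYC\tMAX\n")
    gene_sets = read_gmt(gmt_path)

print(gene_sets)

# With gseapy installed, either form could then be passed on:
# pre_res = gp.prerank(rnk=ranking, gene_sets=gene_sets)
```

Reading the file into a dict first also makes it easy to inspect or subset the gene sets before running the analysis.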

I need your help! :) When I run this: gseaplot(pre_res.ranking, term = term_to_graph, **pre_res.results[term_to_graph]), the error report says: TypeError: gseaplot() got multiple values for argument 'term'

harryliu