TCGA Biomarkers Identification using Machine Learning | Complete Walkthrough

Well, I'm mostly doing this because people have been asking me to connect the database with a basic machine learning script, so I might as well capitalize on it. Anyhow, I wrote this with education in mind rather than research, so the code was kept as simple as possible. Despite the high accuracy, I don't think the markers identified by this script are going to be super useful, but if someone runs DESeq2 or limma on the dataset and compares the results, I would love to hear about it.

Slides used

Script (look for TCGA_biomarkers)

Chapters:
0:00 Introduction and background
5:00 Chapter 1 - Installing packages and importing libraries
7:20 Using TCGAbiolinks
12:50 Structuring Input data and filtering
17:36 plotMDS from limma and edgeR
18:19 Normalization of data
21:22 PCA Analysis
22:05 Making Train Labels and One-hot Encoding
25:04 Chapter 2 - Neural network construction
30:53 Neural network training and model fitting
33:31 Saving the model as an HDF5 file
34:41 Extracting weights and biases
37:02 Extracting GOI using weights and biases
41:06 Chapter 3 - Gene set enrichment analysis
44:53 Results!!!!!!
47:15 Some major issues with this approach
Comments

Great presentation!

Much appreciated!👍

Muuip

Thanks again for your previous help! I have one question I'm stuck on. I used legacy data, so the GDCprepare function did not work for my query. I designed a workaround where I created a count matrix manually, with the metadata in a separate table. How would I go about assigning training labels to data that's not in SummarizedExperiment format? I.e., how would I replicate the code below using two separate data frames, one for expression data and another for metadata:

train_label <- train_label %>% as.factor() %>% as.numeric()
train_label <- train_label - 1
dim(train_label) <- c(dim(expr_filtered)[2], 1)
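One possible way to answer the question above is sketched here in base R. It is not taken from the video's script. The data frames `expr_df` and `meta_df`, the columns `sample_id` and `sample_type`, and the toy values are all hypothetical, and the sketch assumes samples are the columns of the expression table. The key step is aligning the metadata rows to the expression columns before encoding the labels.

```r
# Hypothetical toy expression table: genes in rows, samples in columns.
expr_df <- data.frame(
  S1 = c(10, 0, 5),
  S2 = c(3, 8, 1),
  S3 = c(7, 2, 9),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical metadata table, deliberately in a different sample order.
meta_df <- data.frame(
  sample_id   = c("S3", "S1", "S2"),
  sample_type = c("Primary Tumor", "Solid Tissue Normal", "Primary Tumor"),
  stringsAsFactors = FALSE
)

# Align metadata rows to the column order of the expression table.
idx <- match(colnames(expr_df), meta_df$sample_id)
stopifnot(!anyNA(idx))  # every expression column needs a metadata row
meta_df <- meta_df[idx, ]

# Same encoding as the quoted snippet: factor -> integer codes -> 0-based.
# Factor levels are sorted alphabetically, so "Primary Tumor" becomes 0 here.
train_label <- as.numeric(as.factor(meta_df$sample_type)) - 1

# Reshape to a one-column matrix with one row per sample.
dim(train_label) <- c(length(train_label), 1)
```

Checking `table(meta_df$sample_type, train_label)` afterwards is a quick way to confirm which class ended up as 0 and which as 1.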

rk

Hi, thank you so much. This is really helpful.

mangalahegde

I was wondering whether this could be used for 16S amplicon sequencing data. Could you please enlighten me?

md.ishtiakrashid

Time series on binomial outcomes!?

Ex.: gene expression comparison by death versus survival over 30 days!?
🤔

Muuip

Hi, I'm having a hard time understanding how the GOI are found. Can you explain the theory behind finding good nodes (genes of interest) by summing the total weights and bias?
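For readers with the same question, here is a minimal sketch of one common weight-based heuristic. It is not necessarily what the video's script does. It assumes a first dense layer whose weight matrix has one row per input gene and one column per hidden unit; the matrix `W1` and its values are hypothetical. Genes whose outgoing weights are large in absolute value can shift the hidden units the most, so they rank highest. In a dense layer the bias belongs to each hidden unit rather than to an input, so in this sketch it does not separate one gene from another.

```r
# Hypothetical first-layer weights: rows = input genes, columns = hidden units.
W1 <- matrix(
  c(0.1, -0.2,
    1.5, -2.0,
    0.5,  0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("h1", "h2"))
)

# Score each gene by the total absolute weight leaving its input node.
gene_score <- rowSums(abs(W1))

# Rank genes from most to least influential under this heuristic.
ranked <- sort(gene_score, decreasing = TRUE)
top_genes <- names(ranked)[1:2]
```

Scores like these are only comparable across genes when the inputs are on a similar scale, which is one reason normalization comes before training.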

vngplex

Hi,
This is absolutely a great video!!
I did not understand your explanation of code line 70, about removing "Metastatic". I am doing a similar ML project with "TCGA-BLCA", i.e. bladder cancer, but not with an ANN. My sample groups are "Primary Tumor" and "Solid Tissue Normal". I was thinking about how to implement code line 70, but I don't actually get it.
Could you explain it in the context of my project?

Regards,

anthonyimhenkuomon

Hi there! Awesome video - super cool stuff! I ran the code

dge <- DGEList(t(expr_filtered))

and an error message popped up saying: 'Error in plot.xy(xy, type, ...) : invalid color name 'Primary Tumor''. Any clue as to what might have gone wrong and how I can fix it? Thank you so much!
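The error text comes from a plotting call, so it was presumably raised by the plot that follows the `DGEList` line, with the sample-type strings passed directly as colors. That is a guess from the message alone, not something confirmed against the script. A minimal base-R sketch of the usual fix is to map each group to a valid color name first; the vector `sample_type` and the chosen colors are hypothetical.

```r
# Hypothetical sample annotations, one entry per sample.
sample_type <- c("Primary Tumor", "Solid Tissue Normal", "Primary Tumor")

# Named lookup from group label to a valid R color name.
palette_by_group <- c(
  "Primary Tumor"       = "firebrick",
  "Solid Tissue Normal" = "steelblue"
)

# One color per sample, in sample order.
sample_cols <- unname(palette_by_group[sample_type])
stopifnot(!anyNA(sample_cols))  # fails if a label has no color assigned

# sample_cols can then be passed as the col argument of the plotting call,
# for example plotMDS(dge, col = sample_cols) with limma loaded.
```

If the data contain more groups than the lookup covers, the `stopifnot` check fails early instead of producing another color error inside the plot.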

RanjeetSingh-xwlr

You did not tell me how to select the project ID. Downloading the manifest file is tricky, so you need to explain that first.

MadihaHameedAwan

Hi, I tried your code, especially the PCA visualization, but my graph would not show. It says <table of extent 0>. How do I solve this? I am totally new to R. Thank you.

rutchristine