Bioinformatics Project from Scratch - Drug Discovery Part 2 (Exploratory Data Analysis)

This video is Part 2 of a multi-part series on building a bioinformatics project from scratch. In this video, I will show you how to take the dataset from Part 1 and use the SMILES notation (which represents the unique chemical structure of each compound) to compute molecular descriptors. The descriptors we will compute are the Lipinski descriptors: molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors. Finally, we will perform exploratory data analysis with simple box plots and scatter plots to see how the active and inactive sets of compounds differ.
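
As a rough illustration of how the four Lipinski descriptors are commonly used once they have been computed, the sketch below counts violations of Lipinski's rule of five (molecular weight at most 500 Da, LogP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The class name, function name, and descriptor values are hypothetical and chosen only for illustration; the values are not computed from any real compound, and the video itself computes the descriptors from SMILES rather than typing them in.

```python
from dataclasses import dataclass


@dataclass
class LipinskiDescriptors:
    """The four descriptors named in the description (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient (log scale)
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def rule_of_five_violations(d: LipinskiDescriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


# Illustrative values only, not taken from the ChEMBL dataset.
small_example = LipinskiDescriptors(mol_weight=310.4, logp=2.1, h_donors=2, h_acceptors=5)
large_example = LipinskiDescriptors(mol_weight=720.9, logp=6.3, h_donors=3, h_acceptors=9)

print(rule_of_five_violations(small_example))
print(rule_of_five_violations(large_example))
```

Counting violations instead of returning a single pass/fail flag keeps the result usable for plotting or filtering at different strictness levels.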

Recap of Part 1: I showed you how to collect an original biological dataset that you can use in your data science project. In particular, I demonstrated how to download and pre-process biological activity data from the ChEMBL database. The dataset comprises compounds (molecules) that have been biologically tested for their activity towards a target organism or protein of interest.

Comments

Professor, you really saved my life. I am in biotech and was desperate about my M.Sc. dissertation, since all labs are still closed due to COVID. Bless you.

misganamengistu

Bro is handing out a master's thesis project on a golden plate, bless you prof

tiamat

Finally part 2! Great stuff. Loved the EDA!

KenJee_ds

If it were possible to like this video a thousand times, I would. Thank you so much, Data Professor.

abdulmujeebonawole

For anyone getting an error when importing rdkit: install it manually by running !pip install rdkit in a new cell, then rerun the original cell that imports Chem.

danielgiraldo

Thank you for a wonderful walk-through and great tips on how to use RDKit! Looking forward to similar educational videos in the realm of drug discovery!

ehg

Best hands-on bioinformatics YT tutorial, jing jing! Thank you very much! :)

krzheph

Good evening Data Professor, I am currently a machine learning student in Cameroon, and as part of a project I would like to build a model capable of predicting the behavior of proteins expressed by cancer cells when they are exposed to certain drugs. But I am a bit lost on which approach to adopt and would appreciate some advice (which dataset to use, which models are most appropriate, how to handle negative examples, etc.).

petitmodel

Thank you, Professor, for making this excellent video to start to learn Drug Discovery.

I have a few questions about the part on converting IC50 to pIC50. Could you point me in a direction to study further?
1. You cap the IC50 value before converting to pIC50. Will that affect the results of the analysis? Or, once the values are large enough, can we treat them as the same?
2. Why would we want to avoid negative values? How does it affect the analysis?

leowu
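
For readers following the two questions above, here is a minimal sketch of the conversion, assuming IC50 is recorded in nanomolar and capped at 1 x 10^8 nM (the cap value mentioned elsewhere in this thread). The function name is hypothetical. Since pIC50 = -log10(IC50 in molar), any IC50 above 1 x 10^9 nM (1 M) would give a negative pIC50; with the cap in place the smallest possible pIC50 is 1.

```python
import math

CAP_NM = 100_000_000  # 1 x 10^8 nM, the cap discussed in the comments


def pic50_from_nm(ic50_nm: float, cap_nm: float = CAP_NM) -> float:
    """Convert an IC50 in nM to pIC50, capping very large values first."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    capped_nm = min(ic50_nm, cap_nm)   # values above the cap are clamped, not dropped
    molar = capped_nm * 1e-9           # nM -> M
    return -math.log10(molar)


for value in (10, 1_000, 100_000_000, 10_000_000_000):
    print(value, round(pic50_from_nm(value), 3))
```

Capping keeps every compound in the dataset, at the cost of making all values above the cap indistinguishable from one another. Whether that matters depends on the analysis; for a two-class active versus inactive comparison such compounds are all far into the inactive range anyway.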

Thank you for making such an effort to create content like this. It means a lot! Love from India.

sakshichaudhary

Great video, thanks for publishing this quality content! A question - I was curious as to whether this project would be considered chemoinformatics more than bioinformatics due to the focus on molecules and chemical descriptors?

PutaCaliente

The Difficult Concepts Made Easy by DataProfessor... Thank you, Sir.

aashishkatyal

Dear Professor,
I have a question about the statistical significance check on the LogP values (Mann-Whitney U test). If there is no significant difference between the active and inactive compounds, what does that mean?

meenavinaykumar
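
To make the question above concrete, here is a simplified pure-Python sketch of a two-sided Mann-Whitney U test. It uses the normal approximation without tie or continuity correction, so it is an illustration and not a substitute for a statistics library; the function names and sample values are hypothetical. A p-value above the chosen threshold (often 0.05) means the test cannot reject the hypothesis that both groups come from the same distribution, in other words that this descriptor on its own does not separate the two classes.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical descriptor values for two groups of compounds.
separated = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
interleaved = mann_whitney_u([1, 3, 5, 7], [2, 4, 6, 8])
print(separated)
print(interleaved)
```

In the first call the groups do not overlap at all, so U is 0 and the p-value is small. In the second the values interleave, so the p-value is large and the test gives no evidence of a difference.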

Professor, I love your lecture. Thank you very much from a Bangladeshi learner.

jubayerhossain

Such a great series, I am following it closely. I have a question: is there a way to load the CSV file from the first Colab notebook directly into the current one, instead of downloading it from the first notebook and then uploading it to the other?
Thank you so much for sharing so much knowledge, totally free of charge, with those who need it.

VLM
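
One possible way to do what the question above asks is to have both notebooks read and write the CSV in a shared folder. The sketch below shows only the write-then-read round trip with the standard library, using a temporary directory as a stand-in for the shared location. In Colab the shared folder could be a mounted Google Drive directory (mounted with drive.mount from google.colab), which is an assumption about your setup. The file name, column names, and row values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(path, rows, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read a CSV file back into a list of dicts (all values are strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Stand-in for a folder both notebooks can reach (assumption: in Colab this
# would be a path under a mounted Drive, not a temporary directory).
shared_dir = Path(tempfile.mkdtemp())
csv_path = shared_dir / "bioactivity_data.csv"  # hypothetical file name

# "Notebook 1" writes the pre-processed data.
rows = [{"molecule_id": "EXAMPLE_1", "standard_value": "1000"}]
save_rows(csv_path, rows, ["molecule_id", "standard_value"])

# "Notebook 2" reads it back from the same path.
loaded = load_rows(csv_path)
print(loaded)
```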

Very helpful video! One question: where are the intermediate classes coming from? I do not see them in my data, and I'm getting errors in my code.

theodoreguo
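
As background for the question above, here is a hedged sketch of how an "intermediate" label can arise when IC50 values are binned into three classes, and how it can then be dropped to leave two classes. The thresholds (1,000 nM and 10,000 nM) are an assumption and are not stated on this page, and the function name, field names, and records are hypothetical. If the class column is absent or named differently in your data, any filter that refers to it will fail, so it is worth checking the column names first.

```python
def bioactivity_class(ic50_nm, active_max=1_000, inactive_min=10_000):
    """Label a compound by IC50 (nM); the thresholds are assumed, not confirmed."""
    if ic50_nm <= active_max:
        return "active"
    if ic50_nm >= inactive_min:
        return "inactive"
    return "intermediate"


# Hypothetical records standing in for rows of the bioactivity table.
records = [
    {"molecule_id": "EXAMPLE_1", "standard_value": 250},
    {"molecule_id": "EXAMPLE_2", "standard_value": 5_000},
    {"molecule_id": "EXAMPLE_3", "standard_value": 40_000},
]

# Add the class label to every record.
for record in records:
    record["bioactivity_class"] = bioactivity_class(record["standard_value"])

# Keep only the two classes that are compared in the exploratory analysis.
two_class = [r for r in records if r["bioactivity_class"] != "intermediate"]
print([r["bioactivity_class"] for r in two_class])
```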

From my experience, if you don't apply a log transformation, the scatter plot looks very skewed, which makes it hard to see the pattern. So we apply the log transformation to make the scatter plot interpretable!

ImportData

Great video, working my way through the series. I was just wondering, how come 'standard_value' entries greater than 1 x 10^8 are not just discarded?

justindreyer

Thank you, Professor, for your videos. They are an absolute blessing in the resource-limited settings where I work.

dr.surajitdebnath

Thank you for this amazing tutorial. I am getting the following error when removing the intermediate bioactivity class:
AttributeError: 'DataFrame' object has no attribute 'bioactivity_class'
Could you please tell me how to solve this error?

priyankapawar