Bioinformatics Project from Scratch - Drug Discovery Part 2 (Exploratory Data Analysis)


This video is Part 2 of a multi-part series on building a bioinformatics project from scratch. In this video, I show how to take the dataset from Part 1 and use the SMILES notation (which represents the chemical structure of each compound) to compute molecular descriptors. The descriptors we compute are the Lipinski descriptors (molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors). Finally, we perform exploratory data analysis by making simple box plots and scatter plots to discern differences between the active and inactive sets of compounds.
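The descriptor step described above could be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: it assumes RDKit is installed (for example via pip install rdkit), and the helper name lipinski_descriptors and the dictionary keys are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski


def lipinski_descriptors(smiles):
    """Compute the four Lipinski descriptors for one SMILES string.

    The function name and the dictionary keys are illustrative choices,
    not names taken from the video's notebook.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when it cannot parse the SMILES string.
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {
        "MW": Descriptors.MolWt(mol),              # molecular weight
        "LogP": Descriptors.MolLogP(mol),          # Crippen LogP estimate
        "NumHDonors": Lipinski.NumHDonors(mol),    # hydrogen bond donors
        "NumHAcceptors": Lipinski.NumHAcceptors(mol),  # hydrogen bond acceptors
    }


# Ethanol is used only as a small, easy-to-check example molecule.
ethanol = lipinski_descriptors("CCO")
print(ethanol)
```

Applying such a function to every SMILES string in the dataset gives one row of descriptors per compound, which can then be joined back to the bioactivity table for plotting.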

Recap of Part 1: I showed how to collect an original biological dataset that you can use in your data science project. In particular, I demonstrated how to download and pre-process biological activity data from the ChEMBL database. The dataset comprises compounds (molecules) that have been biologically tested for their activity towards a target organism or protein of interest.


Comments

Professor, you really saved my life. I am in biotech and was desperate about my M.Sc. dissertation, since all labs are still closed due to COVID. Bless you.

misganamengistu

Bro is handing out a master's thesis project on a golden plate. Bless you, Prof.

tiamat

Finally part 2! Great stuff. Loved the EDA!

KenJee_ds

If it were possible to like this video a thousand times, I would. Thank you so much, Data Professor.

abdulmujeebonawole

For anyone getting an error when importing rdkit: install it manually by running !pip install rdkit in a new cell, then rerun the original cell that imports Chem.

danielgiraldo

Thank you for a wonderful walk-through and great tips on how to use RDKit! Looking forward to similar educational videos in the realm of drug discovery!

ehg

Best hands-on bioinformatics YT tutorial, really (jing jing)! Thank you very much! :)

krzheph

Good evening, Data Professor. I am currently a machine learning student in Cameroon, and as part of a project I would like to build a model capable of predicting the behavior of proteins expressed by cancer cells when they are exposed to certain drugs. I am a bit lost on which approach to adopt and would appreciate some advice (which dataset to use, the most appropriate models, how to handle negative examples, etc.).

petitmodel

Thank you, Professor, for making this excellent video for getting started with drug discovery.

I have a few questions about the part on converting IC50 to pIC50. Could you point me toward resources for further study?
1. You cap the IC50 value before converting to pIC50. Will that affect the results of the analysis? Or, once the values are large enough, can we treat them as equivalent?
2. Why would we want to avoid negative values? How do they affect the analysis?

leowu
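On the capping question above, a small sketch may help. It assumes the IC50 values are in nM (an assumption here, not stated on this page) and uses the cap of 1 × 10^8 mentioned elsewhere in these comments. pIC50 is the negative base-10 logarithm of the IC50 expressed in molar units, so any IC50 above 1 M (10^9 nM) gives a negative pIC50, while capping at 10^8 nM keeps every pIC50 at or above 1. The function name pic50_from_nm is hypothetical.

```python
import math

CAP_NM = 100_000_000  # 1e8 nM; the cap value mentioned in the comments


def pic50_from_nm(ic50_nm, cap_nm=CAP_NM):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in molar units).

    Values above cap_nm are clipped to cap_nm before the conversion.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    capped = min(ic50_nm, cap_nm)   # clip very large (very weak) values
    molar = capped * 1e-9           # nM -> M
    return -math.log10(molar)


# 1000 nM is 1e-6 M, so pIC50 is about 6.
print(round(pic50_from_nm(1_000), 2))
# 1e10 nM is clipped to 1e8 nM (0.1 M), so pIC50 is about 1.
print(round(pic50_from_nm(1e10), 2))
# With the cap disabled, 1e10 nM is 10 M and pIC50 becomes negative.
print(round(pic50_from_nm(1e10, cap_nm=math.inf), 2))
```

A side effect of the cap is that every compound weaker than the cap ends up with the same pIC50, so those compounds can no longer be distinguished from one another. Whether that matters depends on how many such values the dataset contains.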

Thank you for putting in the effort to create content like this. It means a lot! Love from India.

sakshichaudhary

Great video, thanks for publishing this quality content! A question - I was curious as to whether this project would be considered chemoinformatics more than bioinformatics due to the focus on molecules and chemical descriptors?

PutaCaliente

The Difficult Concepts Made Easy by DataProfessor... Thank you, Sir.

aashishkatyal

Dear Professor,
I have a question about the statistical significance found when checking the LogP values (Mann-Whitney test). If there is no difference between the active and inactive compounds, what does that mean?

meenavinaykumar
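To make the Mann-Whitney question above more concrete, here is a from-scratch sketch of the U statistic with a normal-approximation p-value. It is a simplification (no tie or continuity correction); in practice a library routine such as scipy.stats.mannwhitneyu would normally be used. The LogP values below are made up for illustration and are not from the video's dataset.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value from the normal approximation to U.

    Simplified: no tie correction and no continuity correction.
    """
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical LogP values for two groups (illustrative only).
active_logp = [2.1, 3.4, 2.9, 3.8, 2.5]
inactive_logp = [2.3, 3.1, 3.0, 3.6, 2.7]

u_stat = mann_whitney_u(active_logp, inactive_logp)
p_value = two_sided_p_normal(u_stat, len(active_logp), len(inactive_logp))

alpha = 0.05
if p_value > alpha:
    print("Fail to reject H0: no evidence that the two distributions differ")
else:
    print("Reject H0: the two distributions appear to differ")
```

A p-value above the chosen threshold means the test found no evidence that LogP is distributed differently in the two groups. It does not prove the groups are identical; it only suggests that LogP alone does not separate active from inactive compounds in that dataset.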

Professor, I love your lecture. Thank you very much from a Bangladeshi learner.

jubayerhossain

Such a great series; I am following it closely. I have a question: is there any way to load the CSV file from the first Colab notebook directly into the current one, instead of downloading it from the first notebook and then uploading it to the other?
Thank you so much for sharing so much knowledge, completely free of charge, with those who need it.

VLM

Very helpful video! One question: where do the intermediate classes come from? I do not see them in my data, and I'm getting errors in my code.

theodoreguo
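On where an "intermediate" class can come from: a bioactivity label is typically assigned by thresholding the IC50 value, with a gap between the active and inactive cut-offs. The sketch below assumes cut-offs of 1,000 nM and 10,000 nM. These thresholds and the function name are assumptions for illustration and are not stated on this page; check Part 1 for the values actually used.

```python
def bioactivity_class(ic50_nm, active_max=1_000.0, inactive_min=10_000.0):
    """Label a compound from its IC50 in nM.

    The default thresholds are assumed for illustration only.
    """
    if ic50_nm <= active_max:
        return "active"
    if ic50_nm >= inactive_min:
        return "inactive"
    # Anything between the two cut-offs falls into the gap.
    return "intermediate"


# Hypothetical IC50 values in nM, one per compound.
ic50_values = [50.0, 5_000.0, 20_000.0]
labels = [bioactivity_class(v) for v in ic50_values]
print(labels)
```

If a dataset happens to contain no IC50 values between the two cut-offs, no compound will carry the intermediate label, and filtering it out simply leaves the data unchanged.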

From my experience, if you don't apply a log transformation, the scatter plot looks very skewed, which makes it hard to see the pattern. So we apply a log transformation to make the scatter plot interpretable!

ImportData

Great video; I'm working my way through the series. I was just wondering: why aren't 'standard_value' entries greater than 1 × 10^8 simply discarded?

justindreyer

Thank you, Professor, for your videos. They are an absolute blessing in resource-limited settings like the one where I work.

dr.surajitdebnath

Thank you for this amazing tutorial. I am getting the following error when removing the intermediate bioactivity class:
AttributeError: 'DataFrame' object has no attribute 'bioactivity_class'
Could you please tell me how to solve this error?

priyankapawar
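One plausible cause of the AttributeError above (not confirmed by anything on this page) is that the label column in the DataFrame has a different name, so attribute-style access such as df.bioactivity_class fails. The sketch below, which assumes pandas is installed, looks the column up by name and reports the available columns if none matches. The helper drop_intermediate, the candidate column names, and the sample data are all hypothetical.

```python
import pandas as pd

# Hypothetical data; the label column is deliberately named "class" here.
df = pd.DataFrame({
    "molecule_id": ["mol_a", "mol_b", "mol_c"],
    "class": ["active", "intermediate", "inactive"],
})


def drop_intermediate(frame, candidates=("bioactivity_class", "class")):
    """Return the rows whose bioactivity label is not 'intermediate'.

    Tries each candidate column name in turn; the names are assumptions.
    """
    for name in candidates:
        if name in frame.columns:
            return frame[frame[name] != "intermediate"]
    raise KeyError(
        f"none of {candidates} found; columns are {list(frame.columns)}"
    )


df_2class = drop_intermediate(df)
print(df_2class)
```

Printing df.columns before filtering is a quick way to see which name the label column actually has in a given CSV file.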
