Machine Learning in Python: Principal Component Analysis (PCA) for Handling High-Dimensional Data

In this video, I will show you how to perform principal component analysis (PCA) in Python using the scikit-learn package. PCA is a powerful learning approach that enables the analysis of high-dimensional data and reveals the contribution of descriptors in governing the distribution of data clusters. In particular, we will create a PCA scree plot, scores plot, and loadings plot.
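
As a rough illustration of the three quantities behind those plots, here is a minimal sketch. It assumes the iris data bundled with scikit-learn, standardized descriptors, and 3 components (the number mentioned in the comments); the notebook used in the video may differ in its details.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load iris (150 samples, 4 descriptors) and standardize each descriptor.
iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

# Assumption: 3 components, as mentioned in the comments.
pca = PCA(n_components=3)

scores = pca.fit_transform(X)               # sample coordinates, shape (150, 3): scores plot
loadings = pca.components_.T                # descriptor weights, shape (4, 3): loadings plot
explained = pca.explained_variance_ratio_   # variance share per component: scree plot

for i, ratio in enumerate(explained, start=1):
    print(f"PC{i}: {ratio:.1%} of variance")
```

The scores, loadings, and explained variance ratios can then be passed to whichever plotting library you prefer.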

⭕ Playlist:
Check out our other videos in the following playlists.

⭕ Subscribe:
If you're new here, it would mean the world to me if you would consider subscribing to this channel.

⭕ Recommended Tools:
Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!

⭕ Disclaimer:
Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.

#dataprofessor #PCA #clustering #cluster #principalcomponentanalysis #scikit #scikitlearn #sklearn #prediction #jupyternotebook #jupyter #googlecolab #colaboratory #notebook #machinelearning #datascienceproject #randomforest #decisiontree #svm #neuralnet #neuralnetwork #supportvectormachine #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #factor #principalcomponent #principalcomponents #pc #machinelearningmodel
Comments

Anyone interested in a tutorial video on Bioinformatics (using Python)? Comments down below 👇
If you find value in this video, please give it a Like 👍 and Subscribe ❤️for more videos on Data Science. 😃

DataProfessor

I’m taking an online class from a brick-and-mortar school. This was part of this week's “lecture”.

I have to say, this all seems thoughtful and very well presented.

If I were part of this program and knew what you were talking about, I bet it would be great 👍

tjf

Thanks, one question though: how come most tutorials run the main 'sklearn PCA' function before determining how many components to use in the PCA itself? Wouldn't it be better to determine the variance ratio between the components BEFORE choosing how many components to use (e.g. 1 or 2 or 3)? Isn't it a potential waste of time to start with a PCA, then find out from the scree plot afterwards that you should (or could) have used more or fewer components? I think I'm missing something.

lucianb
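
One way to look at this question: the variance ratios only exist once a PCA has been fitted, so the fit has to come first. A minimal sketch follows, assuming the iris data from scikit-learn, standardized inputs, and a 95% cumulative-variance cut-off (the cut-off is an assumption, not something from the video).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Fit with every component kept (n_components=None is the default).
full = PCA().fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)

threshold = 0.95  # assumed cut-off; pick whatever suits your problem

# Index of the first component at which the cumulative ratio reaches the
# threshold, plus one to turn the index into a count.
n_keep = int(np.argmax(cumulative >= threshold)) + 1

# Keep only the leading columns of the already computed scores.
scores = full.transform(X)[:, :n_keep]
print(n_keep, cumulative)
```

On a small dataset the full fit is cheap, so fitting once and then deciding how many components to keep costs little.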

Thank you for this video, I've been struggling with understanding PCA for a good minute, but your video explained it extremely well! Please keep posting more like this.

christopherreif

Hi professor, thank you so much for such an educational tutorial. I have a dataset with shape (99, 25). How can I use PCA to select the 8 or 10 best features that explain at least 90% of the variance of the dataset? In summary, how do I use PCA for feature selection without transforming the features into principal components (i.e. PC1, PC2)? I just need the features for further classification.

chisomokwueze
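
PCA on its own builds new axes and does not pick original features, so any answer here is a heuristic. The sketch below is one such heuristic, not an established recipe: it ranks the original features by their squared loadings weighted by each component's explained variance ratio. The random data and the names `f0` to `f24` are placeholders that only mimic the (99, 25) shape from the question.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(99, 25))                   # placeholder data with the shape from the question
feature_names = [f"f{i}" for i in range(25)]    # hypothetical feature names

X_std = StandardScaler().fit_transform(X)

# A float n_components keeps as many components as needed to reach
# at least that fraction of the total variance.
pca = PCA(n_components=0.90).fit(X_std)

# Heuristic importance: squared loadings, weighted by the explained
# variance ratio of the component they belong to, summed per feature.
weights = pca.explained_variance_ratio_[:, None]
importance = (pca.components_ ** 2 * weights).sum(axis=0)

top = np.argsort(importance)[::-1][:10]         # indices of the 10 highest-ranked features
selected = [feature_names[i] for i in top]
X_selected = X[:, top]                          # original, untransformed columns
print(selected)
```

Note that the selected columns are not guaranteed to explain 90% of the variance themselves; the 90% figure applies to the retained components.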

I'd just like to remind people using PCA to consider centering their data. Also consider both the variance and covariance PCA.

dr.merlot
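
A short sketch of what this looks like in scikit-learn, assuming the iris data. scikit-learn's `PCA` centers each column internally but does not scale it, so the practical choice is between PCA on the raw data (covariance-based) and PCA on standardized data (correlation-based). Reading the comment as contrasting those two variants is an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# PCA centers the columns itself, so adding a constant offset to the
# data leaves the explained variances unchanged.
pca_cov = PCA(n_components=2).fit(X)
pca_shifted = PCA(n_components=2).fit(X + 100.0)
same_after_shift = np.allclose(
    pca_cov.explained_variance_, pca_shifted.explained_variance_
)

# Covariance-based PCA (raw units) versus correlation-based PCA
# (each column standardized to unit variance first).
pca_corr = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print(same_after_shift)
print(pca_cov.explained_variance_ratio_)
print(pca_corr.explained_variance_ratio_)
```

When descriptors are measured in different units or ranges, the two variants can give noticeably different components.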

Thank you Data Professor, excellent intro! Subscribed at once!
It is important to analyze the correlation matrix to identify highly intercorrelated variables, and then the loadings, in order to interpret each component semantically: the meaning of PC1, PC2 and PC3.

andreamarkos

This would've been more helpful if you explained how to determine the number of components. Because it seems like you just assumed it would be 3 because you knew there were three target labels (the three different species). If you didn't already have output/target labels and this was TRULY an unsupervised approach, it would have been useful to see how you arrive at 3 components.

bryan_truong

Hello Professor, I really appreciate the beautiful work you are doing on your channel. I have watched some of your videos, and the simplicity with which you deliver your lectures blows me away. I am trying to do a PCA on some data stored in an npy file, but I have had no luck because I don't know how to go about PCA with an npy file. I would appreciate your guidance. Thank you.

adir
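
A minimal sketch of loading an `.npy` file and running PCA on it. The file written here is only a stand-in so the example runs on its own; the path, shape, and values are all assumptions about the commenter's data.

```python
import os
import tempfile

import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real file: write a small random array to disk first.
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.random.default_rng(0).normal(size=(50, 6)))

# np.load returns the stored NumPy array.
X = np.load(path)

# PCA expects a 2-D array of shape (n_samples, n_features); flatten any
# extra dimensions so that each row is one sample.
X = X.reshape(X.shape[0], -1)

scores = PCA(n_components=2).fit_transform(X)
print(scores.shape)
```

If the first axis of the stored array is not the sample axis, it would need to be transposed before this step.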

Interesting! I was just following an online course about PCA😄

marcyboy

hi, why not use the PCA function directly?

from sklearn.decomposition import PCA

pca = PCA(n_components=3)
pca.fit(X)

shobanaathiappan

How can we know at the start that only 3 components are needed?
pca = PCA(n_components=3)

sushmithapulagam

How do you transform your own dataset into sklearn's format?

nathaneasley
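
scikit-learn estimators generally take a numeric 2-D array `X` of shape (n_samples, n_features), plus an optional 1-D label array `y`. A minimal sketch with pandas follows; the table, its column names, and the `label` column are hypothetical, and in practice the frame might come from something like `pd.read_csv`.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical table standing in for your own dataset.
df = pd.DataFrame({
    "length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "width": [3.5, 3.0, 3.3, 2.7, 3.0],
    "mass": [1.4, 1.4, 6.0, 5.1, 5.9],
    "label": ["a", "a", "b", "b", "b"],
})

# Numeric feature columns become X; the label column becomes y.
X = df.drop(columns="label").to_numpy()   # shape (n_samples, n_features)
y = df["label"].to_numpy()                # shape (n_samples,)

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print(X.shape, y.shape, scores.shape)
```

Non-numeric feature columns would need to be encoded or dropped before this step.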

Very informative! Thanks for this important lesson 🙂 keep it up!

jeffersonjones

Insane. I love it. Please continue with these videos.

lucaslessa

Hi sir, thank you very much for the video! I have an experimental dataset of time-dependent signals: 800 (time) x 49 (signals) voltage values. I used PCA and reduced it to 800 x 2. How can I reduce it further and extract information from this set for an ML application? And is there any other feature extraction method that you can advise for signal feature extraction?

hakanbayer

Can we also plot 2D scatter plots of the different component combinations, instead of 3D, with a for loop in plotly, or do we need to use matplotlib for it? Also, what could PCA be used for in bioinformatics?

username
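
Either library can be driven from a loop. The sketch below uses matplotlib and assumes the iris data, standardized inputs, and 3 components; `itertools.combinations` generates every pair of components to plot.

```python
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
scores = PCA(n_components=3).fit_transform(
    StandardScaler().fit_transform(iris.data)
)

# Every unordered pair of component indices: (0, 1), (0, 2), (1, 2).
pairs = list(combinations(range(3), 2))

fig, axes = plt.subplots(1, len(pairs), figsize=(12, 4))
for ax, (i, j) in zip(axes, pairs):
    # One 2D scores plot per pair, colored by species label.
    ax.scatter(scores[:, i], scores[:, j], c=iris.target)
    ax.set_xlabel(f"PC{i + 1}")
    ax.set_ylabel(f"PC{j + 1}")
fig.tight_layout()
```

In a notebook, calling `plt.show()` after the loop displays the three panels side by side.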

I don't know why, but although all the code runs successfully on my dataset, the final scree plot does not appear; the output is blank.
What could the reason be?

pateltapasvi

Hi Professor, where is the input file, the iris dataset? Is it a CSV file somewhere on your Git? Peter

petelok
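
Whatever file the video's notebook reads, scikit-learn ships its own copy of the iris data, so no separate CSV is strictly needed. A minimal sketch, assuming pandas is installed (it is required for `as_frame=True`):

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)

df = iris.frame                 # 4 feature columns plus a "target" column
print(df.shape)
print(list(iris.target_names))
```

From there, `df.to_csv(...)` would write the table out if a CSV file is preferred.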

How do you use this method in spectral data analysis? For example, there are 4 different samples and they have values for various wavelengths. How do I reduce the wavelength features?

ramaselvanathan
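
For spectra, one common layout is samples as rows and wavelengths as columns; PCA then replaces the many wavelength columns with a few components, and the loadings show which wavelengths weigh most on each component. The sketch below uses synthetic spectra; the wavelength grid, peak positions, and noise level are all invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 700, 301)   # hypothetical grid in nm

# Synthetic spectra: 4 samples (rows) x 301 wavelengths (columns),
# each a Gaussian peak at a different position plus a little noise.
peaks = [450.0, 500.0, 550.0, 600.0]
spectra = np.array(
    [np.exp(-((wavelengths - p) / 30.0) ** 2) for p in peaks]
)
spectra += rng.normal(scale=0.01, size=spectra.shape)

pca = PCA(n_components=2)
scores = pca.fit_transform(spectra)   # shape (4, 2): one row per sample
loadings = pca.components_            # shape (2, 301): one weight per wavelength

# Wavelength with the largest absolute weight on the first component.
top_wavelength = wavelengths[np.argmax(np.abs(loadings[0]))]
print(scores.shape, top_wavelength)
```

With only 4 samples, at most 3 components can carry variance after centering, so there is little to gain from asking for more.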