StatQuest: PCA - Practical Tips

In this video, I give practical advice about the need to scale your data, the need to center your data, and how many principal components you should expect to get.

For a complete index of all the StatQuest videos, check out:

If you'd like to support StatQuest, please consider...

...or...

...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...

...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:

0:00 Awesome song and introduction
0:47 Make sure the data are on the same scale
2:53 Make sure the data are centered
3:30 How to determine the number of principal components
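
The scaling tip above can be sketched with NumPy. This is a minimal illustration, not code from the video: the scores are made-up numbers, and `pc1_loadings` is a hypothetical helper name. It assumes PCA is computed by mean-centering the data and taking an SVD, and that scaling means dividing each variable by its sample standard deviation.

```python
import numpy as np

def pc1_loadings(data):
    """Return the unit-length PC1 loading vector (rows = samples, columns = variables)."""
    centered = data - data.mean(axis=0)  # center each variable on its mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # The sign of a PC is arbitrary; make the largest loading positive.
    return pc1 if pc1[np.argmax(np.abs(pc1))] > 0 else -pc1

# Made-up scores: math is out of 100, reading is out of 10.
math_scores = np.array([10.0, 30.0, 50.0, 70.0, 90.0, 40.0])
reading_scores = np.array([1.0, 2.0, 6.0, 7.0, 9.0, 5.0])
scores = np.column_stack([math_scores, reading_scores])

raw = pc1_loadings(scores)                                   # unscaled data
scaled = pc1_loadings(scores / scores.std(axis=0, ddof=1))   # each variable divided by its sd

print("PC1 loadings, raw data:   ", raw)     # dominated by math, the wider-ranging variable
print("PC1 loadings, scaled data:", scaled)  # math and reading get equal weight
```

With only two positively correlated variables, scaling always yields equal weights on PC1, so the imbalance in the raw loadings here comes from the units rather than from the data.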

#statquest #PCA #ML

Comments

This is a gold mine for data scientists, data engineers, and ML/DL engineers. I can hardly think of anyone else who can teach the same concept more clearly.

buihung

Dear Josh, I had so many issues with stats, as I come from a totally different background. Watching your videos helped me overcome my insecurities. Thank you so much.

geethanjalikannan

Josh's videos are so cool that I usually like them before watching.

caperucito

Every professor in the world needs to learn how to teach from you! Thanks!

Jason-xett

The intro with you singing is so cute, made me smile...

iloveno

Your opening music always makes me smile😂😂

shwetankagrawal

Thank you for all the amazing videos. I would be having a really hard time without them.

bendiknyheim

Thank you so much for, basically, all your videos on PCA.

jesusfranciscoquevedoosegu

This PCA series (step-by-step, practical tips, then R) is brilliant! I found them very helpful. Thank you for these great videos!

Would you consider doing a series on factor analysis?

kby--b

Hey Josh, thanks so much for your videos. Three quick questions:

1. 7:54 says "if there are fewer samples than variables, the number of samples puts an upper bound on the number of PCs with eigenvalues greater than 0", but in the example there, the number of samples is equal to the number of variables, not less. Should the statement be "if # of samples <= variables (...)" then the upper bound applies?

2. In the same section above, there are only 2 PCs for 3 samples. From the initial prediction it seems like there could only be 2 PCs since 3 points make a plane. Could there be 3 PCs for 3 samples, or is the sample upper bound always # of samples - 1?

3. To clarify, 7:05 says "since we only have 2 points we only have 1 PC", so can I have a single PC with a slope in a much higher dimension? Since this PC would be in R3, that's okay, correct?

paulotarso

Hi Josh, I am a little confused. At 2:37 you mention using the standard deviation. But if the math scores (0-100) have a standard deviation of 5 and, at the same time, the reading scores (0-10) also have an sd of 5, then after dividing by the sd, math and reading are still NOT on the same scale.

Patrick

Thanks for the video, but I think there is a simple mistake at 2:08, where you say to mix 0.77 Math with 0.77 Reading. I thought that both must add up to 1, or did I get something wrong?

mostafael-tager

Great video, Josh!

I am wondering about 7:32, "Find the line perpendicular to PC1 that fits best". What does this mean?
I mean, you can have either a perpendicular line or a best-fit line.

sane

Thank you so much for switching to Math and Reading, because the genes and cells examples were giving me headaches. Nevertheless, thank you so much for your efforts ♥♥

boultifnidhal

Very nice videos. Have you considered a segment on kernel PCA?

johnfinn

I have a trivial question about 1:39. If the recipe to make PC1 uses approximately 10 parts Math and only 1 part Reading, why does that mean that Math is '10' times more important than Reading for explaining the variation in the data? I understand that it will be more important, but is that specific number (10) correct?

doubletoned

Hi Josh. Thanks for your videos, especially when you are diving into details and tips.
In tip #2, concerning centering, you show 2 sets of 3 points and you present centering on the mean. Let's imagine an experiment with 3 patients on drug A and 3 patients on drugs A and B. Let's say the lower/left set is the reference, drug A, and the upper/right set is the test, drug A+B. What about centering on A (set A will be at the origin)? This centering should show the total effect of adding drug B to drug A, whereas mean centering shows half the effect. In the same vein, the variables plot should show the variables that change from the drug A set to the drug A+B set, instead of showing variables that change from the mean experiment, i.e. ((drug A + drug A+B)/2). What's your view?

samggfr

Hello Josh. At 7:57, you explained that if there are fewer samples than variables, then the number of samples puts an upper bound on the number of PCs. In the last example, there are 3 samples and 3 variables (so the number of samples isn't fewer than the number of variables), and the number of PCs should be 3 (not 2). Could you explain why you decided that the number of PCs should be 2? (BTW, I watched all of your videos about PCA, but I don't understand this specific example.)

basharabdulrazeq
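
One way to explore the question above is to count the eigenvalues that are greater than zero directly. The sketch below is an illustration under stated assumptions, not code from the video: the data values are made up, `count_nonzero_pcs` is a hypothetical helper name, and the tolerance `tol` is an arbitrary choice. It assumes the data are mean-centered before PCA, which removes one degree of freedom, so n samples can give at most n - 1 PCs with nonzero eigenvalues.

```python
import numpy as np

def count_nonzero_pcs(data, tol=1e-10):
    """Count PCs with eigenvalue above tol (rows = samples, columns = variables)."""
    centered = data - data.mean(axis=0)  # mean-center each variable
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Eigenvalues of the sample covariance matrix.
    eigenvalues = singular_values**2 / (data.shape[0] - 1)
    return int(np.sum(eigenvalues > tol))

# Made-up data: 3 samples measured on 3 variables.
three_samples = np.array([[10.0, 6.0, 12.0],
                          [11.0, 4.0, 9.0],
                          [8.0, 5.0, 10.0]])
two_samples = three_samples[:2]  # 2 samples, still 3 variables

print(count_nonzero_pcs(three_samples))  # 3 points lie in a plane
print(count_nonzero_pcs(two_samples))    # 2 points lie on a line
```

Under these assumptions the bound applies whenever the number of samples is less than or equal to the number of variables, which covers the 3-samples, 3-variables case.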

At 6:19, even though the two points are on a line, does the line necessarily go through (0, 0)? If not, there can still be two PCs. Can you help clarify? Thanks.

mrweisu

Josh, thank you very much for helping us out with stats. When I get a job, I will be sure to contribute towards your efforts.
I am struggling to understand things at 3:10.
Why should it be a problem if we do NOT center the data?
Can you please explain with respect to your "PCA - Clearly Explained" video? My prof wouldn't answer it, so I'm asking a Cool-Stat-Guru about it :)
If it requires too much elaboration, please point me to other resources. Thanks again.
Best wishes from India :)

kushaltm
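
The centering question above can be illustrated with a small NumPy sketch. It is not code from the video: the points are made up, `first_direction` is a hypothetical helper name, and it assumes PCA is computed with an SVD, which fits lines through the origin. When the data are not centered, the first direction can end up pointing from the origin toward the middle of the data instead of along the direction of greatest variation.

```python
import numpy as np

def first_direction(data, center):
    """Return the first right singular vector, optionally after mean-centering."""
    if center:
        data = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    direction = vt[0]
    # The sign is arbitrary; make the first component non-negative.
    return direction if direction[0] >= 0 else -direction

# Made-up points on a line with slope -1, far from the origin.
points = np.array([[10.0, 12.0],
                   [11.0, 11.0],
                   [12.0, 10.0]])

print(first_direction(points, center=True))   # [ 0.7071 -0.7071]: along the points
print(first_direction(points, center=False))  # [ 0.7071  0.7071]: toward the mean (11, 11)
```

In this example the uncentered direction is perpendicular to the line the points actually fall on, so it describes where the data sit rather than how they vary.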
