Principal Component Analysis (PCA) [Matlab]

This video describes how the singular value decomposition (SVD) can be used for principal component analysis (PCA) in Matlab.

These lectures follow Chapter 1 from: "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz

This video was produced at the University of Washington
Comments

For future reference: in the first part of the video, X's _columns_ (not rows) are the individual data points (correction for 5:02: every column, not every row, of the mean matrix equals the average vector).
Also note that the code uses the 'svd' function, not the 'pca' function.
This can be confusing, because Prof. Brunton says in the previous lecture (the first video on PCA) that PCA assumes 'rows' represent individuals (e.g., people), in contrast to SVD, which assumes 'columns' do.
*_BUT_*, in the second part (ovarian cancer), even though the code uses the 'svd' function, the 'obs' matrix is 216x4000 (216 patients), so each 'row' represents an individual patient. Here, U and V therefore play the roles that V and U played in the first part of the lecture, respectively.
In the for loop, the code then plots each patient (each dot) against the three "principal" axes (in Matlab, A' means the conjugate transpose of A).
*_However_*, the code computes these coordinates as dot products of two long vectors (4000 elements each, and this can be even larger in other examples).
We _don't_ need that calculation, because the product U*S already contains exactly the same values (this U would have been V if each patient were stored as a column instead of a row).
So we can simply use U(i,1)*S(1,1), U(i,2)*S(2,2), and U(i,3)*S(3,3) for x, y, and z in the loop, instead of computing the dot products.
(I don't use MATLAB, but this should work; in Python the only differences would be indices starting from 0 and square brackets instead of parentheses.)
Still, knowing why those dot products (projections onto orthonormal vectors, in this case) work is important for understanding SVD and PCA.
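A minimal sketch checking this, with variable names as in the ovarian cancer code: since obs = U*S*V', projecting the i-th row of obs onto V(:,k) gives exactly U(i,k)*S(k,k), so the dot products in the plotting loop are redundant.

load ovariancancer;                   % obs is 216x4000, one patient per row
[U, S, V] = svd(obs, 'econ');
i = 1;                                % any patient
x1 = V(:, 1)' * obs(i, :)';           % first coordinate via the dot product, as in the video
x2 = U(i, 1) * S(1, 1);               % the same coordinate read straight from U and S
disp(abs(x1 - x2))                    % ~0, up to round-off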

Anyway, thanks a lot for this great series of lectures, awesome.

starriet

Excellent! Your 15-minute video really captures the majority of 100 years of work on PCA. SVD works!

MageshJohn

The code made this much more understandable for me, thanks for your great work. This example shows what PCA looks like geometrically. There is also an implicit relationship between the shape of the data points and the transformation encoded by the centered matrix, which is not usually mentioned in a linear algebra course.

nwxxzchen

I just don't understand why, for the ovarian cancer example, you don't do the preprocessing steps (mean subtraction and division by sqrt(Nmeas)).

sapertuz

Great video, but one point of confusion: aren't we supposed to subtract the mean before computing the SVD in the ovarian cancer case?
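For reference, a minimal sketch of those preprocessing steps as done in the first part of the lecture; X here is an assumed stand-in, with data points stored as columns.

X = randn(2, 1000);                          % stand-in data, one point per column
nPoints = size(X, 2);                        % number of data points
Xavg = mean(X, 2);                           % mean across all columns
B = X - Xavg * ones(1, nPoints);             % subtract the mean from every column
[U, S, V] = svd(B / sqrt(nPoints), 'econ');  % scaling makes the squared singular values covariance eigenvalues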

haideralishuvo

Excellent lecture. Question: once you have determined the magnitudes of the principal components, is there a way to determine which features they represent in your original data? For instance, determining which features from the cancer data correlate most strongly with a cancer diagnosis?
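A minimal sketch of one way to do this, assuming the ovarian cancer variables from the video: with patients as rows of obs, each column of V holds one loading per original feature, so the largest entries of abs(V(:,k)) mark the features that drive principal component k.

load ovariancancer;                        % obs is 216x4000, grp holds the labels
[U, S, V] = svd(obs, 'econ');
[~, idx] = sort(abs(V(:, 1)), 'descend');  % rank features by the size of their PC1 loading
topFeatures = idx(1:10);                   % the 10 features that influence PC1 the most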

fermijman

High-quality presentation, thanks for sharing.

abolfazlabbasi

Thank you so much! My understanding increased exponentially when you explained the ovarian cancer example.

ratnaa

BTW, to add a legend for the ovarian cancer data you can make use of plot handles, as follows:

h = zeros(2, 1);
for i = 1:size(obs, 1)
    % ... compute x, y, z for patient i as in the lecture ...
    if strcmp(grp{i}, 'Cancer')   % strcmp is safer than ==, which compares char arrays elementwise
        h(1) = plot3(x, y, z, 'rx');
    else
        h(2) = plot3(x, y, z, 'bo');
    end
    hold on
end
legend(h, 'Cancer', 'Normal')

ElPrestigo

Just to clarify: when you mention the energy of the statistical data, you're referring to the extent to which it captures the trend in the data, right?
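In these lectures, "energy" usually means the fraction of the total singular-value sum (or of the total variance, if the singular values are squared) captured by the first r modes. A minimal sketch, reusing the video's data set:

load ovariancancer;
[U, S, V] = svd(obs, 'econ');
sig = diag(S);                           % singular values, largest first
energy = cumsum(sig) / sum(sig);         % cumulative energy, mode by mode
r90 = find(energy > 0.90, 1);            % smallest r capturing 90% of the energy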

zhengyangkrisweng

Hello Steve ... I would first like to thank you for your effort in sharing and teaching these amazing techniques. I would also like to ask whether you could make a video on how to find the best r value using the Gavish-Donoho method in Python. That would be very useful for me. Thanks a lot, and keep going.
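For reference, a minimal sketch of the Gavish-Donoho hard threshold for the simplest case (a square n x n matrix with known noise level sigma). It is written in MATLAB to match the rest of the thread but translates line for line to Python/NumPy; the rectangular and unknown-noise cases need the lambda(beta) and omega(beta) corrections from the Gavish-Donoho paper. The toy data here is assumed, not taken from the lecture:

n = 1000; sigma = 1;
t = (1:n)' / n;
X = 20 * (t * t') + sigma * randn(n);     % rank-1 signal buried in Gaussian noise
[U, S, V] = svd(X, 'econ');
tau = (4 / sqrt(3)) * sqrt(n) * sigma;    % optimal hard threshold for a square matrix
r = sum(diag(S) > tau);                   % number of modes above the threshold (here, 1)
Xclean = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';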

Danielsantos

Great explanation, thanks! May I know how to tell which genes have the highest "impact" on PC 1 (in the ovarian cancer example)? Is there a way to tell from matrix U or matrix V? I just learned PCA 3 days ago, sorry if this is a noob question :)

yourswimpal

Off-topic, but how do you get the IDE to be dark for your presentations?

Assault

What are the differences between the 2-dimensional and 3-dimensional data set plots?

ifan

I got a little bit confused: what's the intuition behind calculating x, y, and z by multiplying columns of V with b (an observation)? What are x, y, and z showing? Sorry for the silly question; thanks in advance.

AdityaDiwakarVex

Is singular value decomposition also used in 3-dimensional data plots?

ifan

Been trying to write a formula to combine both Honey Mustard [ dataSet ] and Ranch BBQ Sauce [ dataSet(2×2) ] as one component while randomly scaling calories and sugar. Don't see what The Matrix movie has to do with anything, though.

mybean

Also, is there any way to get this code for practice? Thank you in advance!

mataFot

Dear Steve, I see that in my data set two states contribute 90% of the data; how do I know which ones?

alex.ander.bmblbn

Is there a convention about signs? I was trying to convince myself, and what confused me is that the T1, T2, T3 (score) matrices in the code below have the same values with different signs. I found an article and some code about flipping the sign in svd and pca, but I couldn't be sure... I'd be very happy if you could make it clear for me, thanks!
%% CODE
clear; close all; clc;
load fisheriris                  % built-in data set; meas is 150x4
X = meas;
% X = 5*randn(300, 10);          % alternative random test matrix
[W, D] = eig(X'*X);              % eigenvectors of X'X are the right singular vectors
W = W(:, end:-1:1);              % reorder so the largest eigenvalue comes first
D = D(end:-1:1, end:-1:1);
T1 = X*W;                        % scores via the eigendecomposition
[U, S, V] = svd(X, 'econ');
T2 = U*S;                        % scores via the SVD (equal to X*V)
[coeff, score, latent] = pca(X, 'Algorithm', 'svd', 'Centered', false);
T3 = score;                      % scores via pca()
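The short answer is that there is no single convention: the columns of U and V are only defined up to a joint sign flip, so the scores from eig, svd, and pca can legitimately differ by a factor of -1 per component. A minimal sketch of one common fix, continuing from the U, S, V computed above: flip each column of V so that its largest-magnitude entry is positive, and flip the matching column of U with it.

for k = 1:size(V, 2)
    [~, j] = max(abs(V(:, k)));   % index of the largest-magnitude entry in column k
    if V(j, k) < 0
        V(:, k) = -V(:, k);       % flip the right singular vector...
        U(:, k) = -U(:, k);       % ...and its partner, leaving U*S*V' unchanged
    end
end
T2 = U * S;                       % scores, now sign-consistent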

burakyesilyurt