Lecture 13: Randomized Matrix Multiplication

MIT 18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018
Instructor: Gilbert Strang

This lecture focuses on randomized linear algebra, specifically on randomized matrix multiplication. This process is useful when working with very large matrices. Professor Strang introduces and describes the basic steps of randomized computations.

License: Creative Commons BY-NC-SA
Comments

Dr. Stranglove or "How I Learned to Stop Worrying and Love Numerical Linear Algebra"

roceb

This lecture concerns the product of two huge matrices A and B. In the simplest case, suppose A = [a, b] (two columns) and B = [cT; dT] (two rows, written as transposes). The algorithm is to choose one column from A and the matching row from B: for instance, we might choose column a from A and row cT (c transpose) from B, and take the outer product of these two vectors to get a*cT. This serves as one sample (one of our guesses) of the true product AB. As we accumulate more samples and sum these outer products (after suitable rescaling), we gradually approach the true answer AB. His calculation with [a, 0] and [0, b] in the first part is only trying to capture, on average, how the algorithm deals with one of the components, A, in the product AB.
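
A minimal NumPy sketch of this column-times-row sampling (my own illustration: uniform probabilities here, with each sample rescaled by 1/(s*p_j), as the later comments describe, so the sum is unbiased; sizes, seed, and sample count are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
B = rng.standard_normal((50, 80))
n = A.shape[1]              # number of column/row pairs
s = 200                     # number of random samples
p = np.full(n, 1.0 / n)     # uniform sampling probabilities

approx = np.zeros((A.shape[0], B.shape[1]))
for j in rng.choice(n, size=s, p=p):
    # outer product of column j of A with row j of B, rescaled by 1/(s * p_j)
    approx += np.outer(A[:, j], B[j, :]) / (s * p[j])

print(np.linalg.norm(approx - A @ B, "fro") / np.linalg.norm(A @ B, "fro"))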

keys

This is another beautiful lecture that summarizes Probability, Mean, Variance, and Lagrange Multipliers by the grandmaster of linear algebra.

georgesadler

So I was a bit confused by his (a, b) notation, but I think I got it.
In his 'practice' example we have mean = 1/2 (a, b), where (a, b) is really the matrix [a b]. With probability 1/2 each, we pick either the matrix [a 0] or [0 b], so averaging gives the mean 1/2 [a b]. Now "two samples" means we draw two matrices randomly with those probabilities and add them. If X is one sample, the mean of the sum of two independent samples is 2 mean(X), so we multiply the mean by two to get [a b], the original matrix.
Hope this helps someone.
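
Written out (with X_1 and X_2 denoting the two independent samples described above):

\[
\mathbb{E}[X] = \tfrac{1}{2}\begin{bmatrix} a & 0 \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} 0 & b \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} a & b \end{bmatrix},
\qquad
\mathbb{E}[X_1 + X_2] = \mathbb{E}[X_1] + \mathbb{E}[X_2] = \begin{bmatrix} a & b \end{bmatrix}.
\]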

martinspage

How I understood the beginning example about the mean: We defined our sample space to be Omega = { [a, 0], [0, b] }. We defined the distribution over this sample space to be uniform (hence the 1/2). Now each sample can be seen as a random variable that takes on one of the values in Omega. By definition, the expectation is the sum of the products of each outcome with its probability. However, if we define a random variable to be the sum of the samples, then the expectation of this new random variable is the sum of the expectations of the individual samples. Each sample has the same expectation, so the sum reduces to a multiplication by s.

PS: I don't have very advanced knowledge of probability theory (I'm taking a course right now). If I have made a mistake, let me know.

naterojas

I am confused about the calculation of the variance in the practice example. If the variance is the distance from the mean, it seems it should be 1/2(a^2 + b^2) instead of 1/2(a^2, b^2).

waynehu

I think Prof. Strang made a little mistake in calculating the variance at 35:00: both terms should be further divided by s (the sample size), and the final variance should have 1/s in front. Besides, the variance of the average of N IID variables should be 1/N of the variance of an individual one.

keys

The person handling the cameras should be tracking the content the lecturer is pointing at, not tracking the lecturer.

tusharganguli

I don't understand why people seem so hasty to criticize even when they should listen with a bit more attention. This lecture made perfect sense to me, btw. Lots of love and appreciation for prof. Strang from Pakistan. ♥

thatsfantastic

the great strang! tremendous lecturer as always!!

ayite

Is there a way for us to check the notes he is referring to at 51:05?

Here is what I understood from this lecture:

1. We want to take a sample of a matrix multiplication. Recalling that a matrix multiplication is a sum of columns times rows, we choose the index randomly and get the corresponding rank-1 matrix.

2. We also want this method to converge to AB if we take the mean of different samples, which means that the expected value of this rank-1 sampling should be AB. This forces us to rescale the sample by 1/pj.

3. Knowing this, we conclude that taking the rank-1 sample of a matrix means taking (1/pj)*(aj*bj). Sampling isn't just about choosing a random index j, but also about rescaling the rank-1 matrix aj*bj.

4. We don't want pj to be arbitrary; we want it to be optimal in some way. The criterion he used was to choose the pj that minimizes the variance of the norm of the rank-1 sample. This variance is a number and not a matrix, since it's the variance of a norm.

5. Then comes the procedure he showed, which is finding the formula for that variance and then applying the Lagrange multiplier method. (A rough code sketch of steps 1-4 follows after this comment.)

However, I can't explain the presence of the "s" variable or the seemingly magical appearance of the Frobenius norm at 35:01.

Although I can't explain them, they should be there, because otherwise, with that specific optimal pj, the variance of the norm turns out to be zero, and I have the feeling that a zero variance is not a good conclusion.

If someone has a good answer for this, please let me know.
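
As mentioned in the list above, here is a rough NumPy sketch of steps 1-4. If I followed the Lagrange-multiplier step correctly, the optimal probabilities come out proportional to ||a_j||*||b_j||; the matrix sizes and the sample count s below are only illustrative:

import numpy as np

def sampled_matmul(A, B, s, rng):
    # Approximate A @ B by s rescaled rank-1 samples (column of A times row of B).
    n = A.shape[1]
    # Norm-based probabilities: p_j proportional to ||a_j|| * ||b_j||, normalized to sum to 1
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    approx = np.zeros((A.shape[0], B.shape[1]))
    for j in rng.choice(n, size=s, p=p):
        # rescale by 1/(s * p_j) so the expected value of the sum is exactly A @ B
        approx += np.outer(A[:, j], B[j, :]) / (s * p[j])
    return approx

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 300))
B = rng.standard_normal((300, 150))
X = sampled_matmul(A, B, s=100, rng=rng)
print(np.linalg.norm(X - A @ B, "fro") / np.linalg.norm(A @ B, "fro"))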

Gisariasecas

The random variable of interest concerning the approximation of the matrix product AB is the (Frobenius) norm of AB - X, where X is the average of a bunch of randomly drawn and "appropriately scaled" rank-1 matrices a_i * b_i, with a_i and b_i being the i-th column of A and the i-th row of B respectively. (Recall that AB = the sum of all a_i b_i for i running from 1 to (#cols of A).) The norm is used here to measure how "accurate" the random approximation X is.

The following note provides a very similar line of reasoning to what we see in this lecture on approximating AB; see page 3 for the choice of the random approximation X and page 5 for the derivation of the expected value and variance of the approximation error (which is scalar-valued because we use the norm):

conspicuousLamb

There is an awesome lemma that lets you approximately preserve distances among high-dimensional points by projecting them into a lower dimension with a random matrix. It's called the Johnson-Lindenstrauss lemma.
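
A quick numerical illustration of that lemma (my own sketch, not from the lecture; a scaled Gaussian random matrix is just one standard choice of projection):

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 10000))                # 50 points in 10000 dimensions
k = 500                                             # target dimension
P = rng.standard_normal((10000, k)) / np.sqrt(k)    # scaling preserves squared lengths on average
Y = X @ P

# Pairwise distances before and after projection should roughly agree
for i, j in [(0, 1), (2, 3), (4, 5)]:
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))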

imranq

OK. This stuff is highly confusing, to say the least. As much as one can fairly easily understand what the mean of a random matrix is, it's not that simple with the variance. Variance is defined for a random variable or a vector. For a simple r.v. with values in R^1, it's just the expectation of (X - E(X))^2, that is, E[(X - EX)^2]. For a vector with values in R^n, it's a COVARIANCE MATRIX. Now, what would be the variance of a matrix? I've never heard of such a construct, and I studied maths (specialization: probability and stats) for 5 years at a technical university.

Maybe (and it's a guess) he wants to do this: take the random element (which is a random matrix, with a probability measure picking a certain matrix out of a set with probability p_i, i = 1...n, where n is the number of columns in A and rows in B, since we want to sample from all the columns of A, not only some r of them) and subtract from it the "mean element" (the mean of the matrices as calculated); then take the norm of this difference (which is a norm of a matrix), square it, and multiply by the probability of the element selected. Then sum over all the elements (sum over the matrices/elements, or over the columns of A, which is the same thing since there's a 1-to-1 correspondence between the columns of A and the set of matrices we're selecting from).

So this would give us the mean squared distance of a random element from the "mean element," where the distance is the 2-norm or the Frobenius norm of a matrix/element. But this would be the EXPECTATION of the random function X : M -> R+ (a map from the set of matrices/elements to R+ where X(*) = || * - EM ||^2, the square of the norm of the element/matrix, and EM = "the mean matrix," that is, AB). So we are calculating the mean of the mapping X, but this "variance" (as he calls it) is not some kind of "matrix variance" (such a thing does not exist); it's the expectation of the mapping above.

I can't explain what he's trying to do in a better way. He wants to minimize the expected squared distance from the randomly selected matrix to the "mean matrix," and hence he looks for a probability distribution on the columns of A that achieves that. At the same time, I don't understand the notation he's trying to use :( Highly confusing, Prof. Strang.
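
In symbols, the scalar quantity described in the comment above (assuming, as in the lecture, that the sampled matrix is already rescaled, so X = (1/p_j) a_j b_j^T with probability p_j) is, for a single sample,

\[
\operatorname{Var} = \mathbb{E}\,\|X - AB\|_F^2
= \sum_{j=1}^{n} p_j \left\| \tfrac{1}{p_j}\, a_j b_j^{T} - AB \right\|_F^{2}
= \sum_{j=1}^{n} \frac{\|a_j\|^{2}\,\|b_j\|^{2}}{p_j} - \|AB\|_F^{2},
\]

using E[X] = AB and the fact that the Frobenius norm of a_j b_j^T equals ||a_j||*||b_j||.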

dariuszspiewak

I think cancelling the s at 38:05 is a mistake. The variance must decrease as the number of samples increases!

alisahebpasand

I think the s should be squared in the variance: 1/s^2 (C^2 - ||AB||^2)?
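
For anyone who wants to check the 1/s versus 1/s^2 question numerically, here is a small Monte Carlo sketch (my own, not from the lecture). If I have the algebra right, the s-sample estimator satisfies E||X_s - AB||_F^2 = (1/s)(C^2 - ||AB||_F^2), with C = sum_j ||a_j||*||b_j||, and the empirical and predicted values below should roughly agree, both scaling like 1/s rather than 1/s^2:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 40))
B = rng.standard_normal((40, 20))
n = A.shape[1]
AB = A @ B

p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
C = p.sum()
p /= C                       # optimal probabilities p_j = ||a_j|| ||b_j|| / C

def estimate(s):
    # sum of s rank-1 samples, each rescaled by 1/(s * p_j)
    out = np.zeros_like(AB)
    for j in rng.choice(n, size=s, p=p):
        out += np.outer(A[:, j], B[j, :]) / (s * p[j])
    return out

for s in (10, 40, 160):
    mse = np.mean([np.linalg.norm(estimate(s) - AB, "fro")**2 for _ in range(400)])
    predicted = (C**2 - np.linalg.norm(AB, "fro")**2) / s
    print(s, mse, predicted)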

jrf

It's quite confusing to me that the variance is sometimes a vector/matrix but sometimes a number. So is it a number or a vector?

Also, I don't get the multiplication by 2 for two samples. The initial definition is the average of 2 samples, not the sum of two samples.

flyflyflyfly

Isn't this just the bias-variance tradeoff, where we are minimizing the Frobenius norm and the bias is zero since the estimator is unbiased? Also, the reason the variance is a number is that we are minimizing the variance of the Frobenius norm.
That's what I understood, anyway.

ffoqjgc

Yeah, that's where I got confused too.

tnybnee

As of today, I withdraw my support for Professor Strang.

From today I leave behind any mere relationship of support;
Professor Strang and I become one, a single body.
Any attack on Professor Strang will be regarded as an attack on me.

If there are 7 billion Strang fans in the world, I am one of them.
If there are 100 million Strang fans in the world, I, too, am one of them.
If there are 10 million Strang fans in the world, I am still one of them.
If there are 100 Strang fans in the world, I am still one of them.
If there is only one Strang fan in the world, that person is probably me.
If there is not a single Strang fan left in the world, then and only then am I no longer in this world.

Strang, my love.
Strang, my light.
Strang, my darkness.
Strang, my life.
Strang, my joy.
Strang, my sorrow.
Strang, my rest.
Strang, my soul.
Strang, me.

uqbsbsx