Linear Least Squares to Solve Nonlinear Problems

Ever wondered how Excel comes up with those neat trendlines? Here's the theory so you can model your data however you like! #SoME1
Comments

"When I see a variable in an exponent, I try to use logarithm as a ladder so that I can bring them off their shelf" is a poetically quote-worthy sentence.

oceannuclear

Minimizing the sum of squares is not equivalent to minimizing the sum of absolute deviations. This is easiest to see if you try to just fit a single constant c to the data, i.e. minimize sum(|x-c|) vs sum((x-c)^2). In the former case, you get the median, whereas in the latter case, you get the mean. Generalized to curve-fitting, minimizing the sum of absolute deviations is called "least absolute deviation" fitting, which is different from "least squares". (Statistically, "least absolute deviation" can be interpreted as assuming that the errors are Laplace-distributed, while "least squares" can be interpreted as assuming that the errors are normally distributed.)
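
[Editor's note: a quick numerical check of the mean-vs-median claim, as a minimal Python sketch; the sample data here is made up for illustration.]

import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 20.0])

# Brute-force both losses over a grid of candidate constants c
cs = np.linspace(0.0, 20.0, 20001)
sq = ((data[None, :] - cs[:, None]) ** 2).sum(axis=1)
ad = np.abs(data[None, :] - cs[:, None]).sum(axis=1)

print(cs[sq.argmin()], data.mean())      # least squares minimizer -> mean (6.0)
print(cs[ad.argmin()], np.median(data))  # least absolute deviations -> median (3.0)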

tailcalled

The method of
raw data -> manipulation to make linear -> least squares fit -> post-analysis to recover actual fit parameters

is something that I've used several times, and it's a life saver every time. However, it's important to note that you're no longer minimizing the squared deviation of the raw data, and the errors can end up unequally weighted across data points.

As an example, an exponential function can be made linear through a logarithm, such that y = Ae^(Bx) becomes ln(y) = ln(A) + Bx. Fitting this line to the data using least squares will minimize the squared deviation between ln(y) and ln(data). The result is that larger data points are relatively less important to the fit than smaller data points. Say you have the points (1, 2) and (10, 200) in your data set and your least squares fit gives you the points (1, 1) and (10, 100) on the best fit line. The x=1 point has a real squared deviation of 1, and the x=10 point has a squared deviation of 10,000. However, the deviation used in the manipulated least squares fit is on ln(y), which gives the x=1 point a squared deviation of ln(2)^2 ≈ 0.48 and the x=10 point a squared deviation of ln(200/100)^2 ≈ 0.48: the same weighting. In this case the error is weighted as fractional error, and since both points are 2x the fit line, they have the same error as far as the least squares fitting is concerned.

This weighted error fitting can be desirable or not, depending on your use case. Just something I've noticed through use, and thought it might be useful to someone else. :)
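
[Editor's note: a minimal numpy sketch of this linearize-then-fit workflow; the synthetic data is chosen just to show the fractional-error weighting described above.]

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(0.8 * x) * np.exp(rng.normal(0.0, 0.05, x.size))  # multiplicative noise

# Linearize y = A e^(Bx) as ln(y) = ln(A) + B x, then do an ordinary line fit
B, lnA = np.polyfit(x, np.log(y), 1)
A = np.exp(lnA)
print(A, B)  # roughly (2.0, 0.8)

# The fit minimized residuals in ln(y), i.e. fractional errors in y:
print(np.log(y) - (lnA + B * x))  # comparable in size across all x
print(y - A * np.exp(B * x))      # raw residuals grow with y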

Simonsays

Aside from the flaw others have already pointed out, it's a really well-made video, and it broadened my horizons for the least squares method, which so far I had only applied to lines 👍🏼

timdernedde

The correct motivation for least squares is the Gaussian error model. The probability of an error e goes like exp(-C e^2), and so the total probability density for all the errors is the product of these exponentials, or exp(-weighted sum of squares). Minimizing the squared deviation is therefore the same as maximizing the probability of the data given your model, i.e. maximum likelihood; with a flat prior, this is also the Bayesian rule for finding the most probable parameter values.
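
[Editor's note: a small Python sanity check of this equivalence, assuming unit-variance Gaussian errors; scipy's norm is used only for the log-density, and the residual values are hypothetical.]

import numpy as np
from scipy.stats import norm

resid = np.array([0.3, -1.2, 0.7, 0.1])  # hypothetical model errors, sigma = 1

# -log of prod(exp(-e^2/2)/sqrt(2*pi)) is half the sum of squares plus a constant,
# so maximizing the probability of the data = minimizing the sum of squares
nll = -norm.logpdf(resid).sum()
print(nll)
print(0.5 * (resid ** 2).sum() + 0.5 * resid.size * np.log(2.0 * np.pi))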

annaclarafenyo

Great video. Essentially it's taking early statistics formulas and changing the meaning of the operations to fit a new context; for example, the error is effectively a variance. There is always beauty in expanding the usability of the tools we already have. There is a great spirograph video someone came out with recently that absolutely blew my mind.

chuckhammond

Very nice video. I'm personally teaching least squares regression to my colleagues with a similar approach. I agree with the comments, but don't take it personally. There is too much material on this subject for a 12-minute video, so I understand the need for some simplifications. That will give you a reason to do a part 2, where you'll be able to refine and go further. One thing I would like you to consider is to warn your audience about the danger of using the X-transpose-X form of the normal equations: with a lot of data points, numerical instabilities can occur. It could also interest your audience to see some examples with a specific tool; such tools have specialized functions for least squares regression which are not as well known as you might think. Another topic you could add to your list is the uncertainty estimation of the fitted parameters. This is often neglected, and it is very important to have an idea of how well you can know the parameters. I'm encouraging you to continue. You have a gold mine in your hands. Good luck!
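
[Editor's note: a numpy sketch illustrating the X-transpose-X warning; the polynomial design matrix is just a convenient ill-conditioned example, not from the video.]

import numpy as np

x = np.linspace(0.0, 1.0, 50)
X = np.vander(x, 8)       # degree-7 polynomial design matrix, deliberately ill-conditioned
y = np.sin(2.0 * np.pi * x)

# Forming X^T X squares the condition number, amplifying round-off error
print(np.linalg.cond(X), np.linalg.cond(X.T @ X))

coef_svd, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based, numerically safer
coef_ne = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
print(np.abs(coef_svd - coef_ne).max())           # the two answers already disagree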

stephanel

Whatever you did here, it's beautiful. Very good explanation, and I hope to see more!

alihouadef

I really enjoyed this video, and it inspired me to write a little Python program to implement it. Thank you for sharing.

TheLuke

Just to echo the comment already made, minimizing squared deviations is NOT THE SAME as minimizing absolute deviations. Minimizing squared deviations provides an estimator for the mean of y given x. Minimizing absolute deviations provides an estimator for the median of y given x. Other properties also differ across approaches; for example, mean squared error is much more sensitive to outliers.
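
[Editor's note: a short Python sketch of the outlier sensitivity; the data and the Nelder-Mead minimization of the absolute loss are illustrative, and a dedicated quantile-regression routine would be used in practice.]

import numpy as np
from scipy.optimize import minimize

x = np.arange(6, dtype=float)
y = 2.0 * x + 1.0
y[-1] += 30.0  # one gross outlier

ls = np.polyfit(x, y, 1)  # least squares: slope dragged far above 2 by the outlier
lad = minimize(lambda p: np.abs(y - (p[0] * x + p[1])).sum(),
               x0=ls, method="Nelder-Mead").x  # least absolute deviations
print(ls)   # roughly (6.3, -4.7)
print(lad)  # close to the true (2.0, 1.0)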

matthewb

Thank you, this video helped me a lot. Also, really nice editing. Hoping to see more videos!

unraton

Thank you so much for explaining this. Cheers!

Numerically_Stable

It is a very useful video, thank you!

norbertbarna

At 3:47, the equations after setting the derivatives to 0 being linear in the coefficients m and b is a direct result of the expected function (here f(x) = mx + b) being linear in m and b. If the expected function were nonlinear in m or b, those equations would have been nonlinear too, e.g. for f(x) = m^2*x + mx + b or f(x) = e^(mx) + b.
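
[Editor's note: a minimal numpy illustration of this point; the data values are made up.]

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# f(x) = m x + b is linear in (m, b), so the normal equations (X^T X) p = X^T y
# form a linear system that can be solved directly
X = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.solve(X.T @ X, X.T @ y)
print(m, b)
# For f(x) = e^(m x) + b the derivative conditions are nonlinear in m,
# and an iterative solver would be needed instead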

MayurGarg

Where was your video 3 months ago? I needed it so much then.
I was working on fitting COVID data with the SEIR model.
I hope you do more videos on that topic.

Xphy

Everyone getting nitpicky about "least absolute deviation" versus "least squares deviation" is missing the point, I think. Sure, he might have said that there's no difference, but the conceptual (and important) part as it relates to this video is that minimizing either one will minimize the error in some sense. For the general audience this video is intended for, that is plenty of justification for accepting that least squares is valid.

Simonsays

Did the video do an abrupt cut at ~ 10:32?

ivolol

So, what are you doing with the varactors?

fletcherreder

I KNEW I HAD HEARD THIS VOICE ALREADY!! Are you quantum boy? :))

ohanabergerdesouza

So is it possible to use this for parameter estimation of constants in ODE models, where numerical integration is used instead of a functional relation?
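
[Editor's note: one common approach, sketched in Python with scipy. Everything here, the decay model and the constants, is a hypothetical example, and note this is nonlinear least squares with the integrator inside the loop rather than the linearization trick from the video.]

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
t_obs = np.linspace(0.0, 4.0, 20)
y_obs = 5.0 * np.exp(-0.7 * t_obs) + rng.normal(0.0, 0.05, t_obs.size)  # synthetic data

def residuals(params):
    k, y0 = params
    # Integrate y' = -k y numerically instead of using a closed-form solution
    sol = solve_ivp(lambda t, y: -k * y, (0.0, 4.0), [y0], t_eval=t_obs)
    return sol.y[0] - y_obs

fit = least_squares(residuals, x0=[1.0, 4.0])
print(fit.x)  # approximately (0.7, 5.0)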

jamespeter