Programming R Squared - Practical Machine Learning Tutorial with Python p.11

Now that we know what we're looking for, let's actually program the coefficient of determination in Python.
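For context, here is a minimal sketch of the two functions the video builds, reconstructed from the comments below (the names ys_orig, ys_line, and y_mean_line follow them; this is not necessarily the exact code from the video):

from statistics import mean
import numpy as np

def squared_error(ys_orig, ys_line):
    # element-wise subtraction works because ys_orig is a NumPy array (see the comments)
    return sum((ys_line - ys_orig) ** 2)

def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]   # mean line, same length as ys_orig
    squared_error_regr = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - (squared_error_regr / squared_error_y_mean)

# tiny usage example with made-up values
ys_orig = np.array([1.0, 2.0, 3.0, 4.0])
ys_line = np.array([1.1, 1.9, 3.1, 3.9])
print(coefficient_of_determination(ys_orig, ys_line))   # ~0.992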

Comments

*This is the perfect pace. Don't listen to those who say you're slow. I think you're doing great. Lots of love*

trends

from statistics import mean   # assuming the tutorial's mean import; np.mean would also work

def r_sq(regression_line, ys):
    # regression_line and ys are NumPy arrays of predicted and actual y values
    se_y_hat = sum((ys - regression_line) ** 2)   # squared error of the regression line
    ys_mean = mean(ys)
    se_y_bar = sum((ys - ys_mean) ** 2)           # squared error of the mean line
    return 1 - (se_y_hat / se_y_bar)

mavriksc

For anybody else who was initially confused, in squared_error he can use:

sum((ys_line - ys_orig) ** 2)

because ys_orig is a NumPy array. If you attempt to do this with two list objects, Python will throw an error. NumPy arrays perform vector subtraction: [4, 5, 6] - [1, 2, 3] = [3, 3, 3].
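A quick illustrative snippet of that point (the array values here are made up):

import numpy as np

# Plain lists do not support element-wise subtraction:
# [4, 5, 6] - [1, 2, 3]  ->  TypeError
ys_line = np.array([4, 5, 6])
ys_orig = np.array([1, 2, 3])
print(ys_line - ys_orig)               # [3 3 3]
print(sum((ys_line - ys_orig) ** 2))   # 27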

CarsonJamesCook

You could have written your 2 functions in 4 lines of code:

import numpy as np

def determination_coefficient(ys_line, ys):
    sse = np.sum(np.power(ys_line - ys, 2))        # squared error of the regression line
    sst = np.sum(np.power(ys - np.mean(ys), 2))    # squared error about the mean
    return 1 - sse / sst

bnjmn

I love you sentdex! I'm 13 years old and I've pretty much learned everything I can about Python from you.

oliveredholm

In the coefficient of determination function you can just use:
"y_mean_line = mean(ys_orig)"
The for loop is not needed.
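A small sketch of why that works, assuming ys_orig is a NumPy array as in the tutorial (the values are just illustrative):

import numpy as np
from statistics import mean

ys_orig = np.array([5.0, 6.0, 7.0, 8.0])

# Subtracting the scalar mean broadcasts over every element of a NumPy array,
# so an explicit mean line built with a list comprehension gives the same result.
y_mean_line = [mean(ys_orig) for y in ys_orig]
print(ys_orig - mean(ys_orig))   # [-1.5 -0.5  0.5  1.5]
print(ys_orig - y_mean_line)     # [-1.5 -0.5  0.5  1.5]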

nicobohlinger

When Edward Snowden be teaching Python analysis in Russia....

PandemicGameplay

In statistics, SE stands for standard error and has a totally different meaning than SSR (sum of squared residuals). Although the Python programming is great, I wouldn't recommend learning about linear regression here.

dumpert

Thank you for these videos. For some reason it's so much easier for me to understand the math of the statistics via the code than via the mathematical equations.

kilfoofan

These are really good explanations for a subject that is one of the more difficult topics to understand in computer programming or math.

timharris

I can't wait to see a tutorial on AI by you. You're the best sentdex!

mohammednagdy

This explains really well my original instinct that the R^2 value is not that relevant unless you actually look at the residuals too. And the value itself is not that informative on its own: in the social sciences anything above 20% is considered good, whereas when predicting credit risk an R^2 of 80% can be considered low.

levyroth

Hi, Sentdex. I just watched Ron Bekkerman's LinkedIn presentation on machine learning. Were you there at the presentation? Because somehow I heard your voice. :)

pakdhenu

Hi Sentdex, first of all thank you for all the tutorials you've made. They're just wonderful. However, I think there is a misunderstanding here. I think the bigger the R_squared is, the more accurate the regression is. Let's say the predicted_output and the real_output are not much different, so the squared error of y_regression is nearly zero. Divide it by the squared error of y_mean and you get something close to zero.

Finally, R_squared = 1 - (something_close_to_zero) ≈ 1.

Just saying.
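A tiny numeric sketch of that argument (the values are made up for illustration):

import numpy as np

ys = np.array([2.0, 4.0, 6.0, 8.0])
predictions = np.array([2.1, 3.9, 6.0, 8.1])      # nearly perfect fit

se_regression = np.sum((ys - predictions) ** 2)   # 0.03, close to zero
se_mean = np.sum((ys - np.mean(ys)) ** 2)         # 20.0
print(1 - se_regression / se_mean)                # 0.9985, close to 1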

hoanganhkhoil

Love your videos, sentdex! Just wondering if you ever get to reducing the dimensionality of the feature space. When do you start taking that into account? That would be useful for the stock market example, since some features might not be relevant.

faisalel-shabani

I get this error on sum((ys_line - ys_orig) ** 2):
unsupported operand type(s) for -: 'generator' and 'float'
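One likely cause (an assumption based on the error message rather than the commenter's actual code): ys_line was built with a generator expression instead of a list or array, so NumPy cannot subtract it element-wise. A minimal sketch of the difference, with made-up slope and intercept:

import numpy as np

m, b = 0.5, 1.0
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys_orig = np.array([1.4, 2.1, 2.4, 3.1])

ys_line_bad = (m * x + b for x in xs)          # generator: subtracting ys_orig raises the TypeError above
ys_line = np.array([m * x + b for x in xs])    # array: element-wise subtraction works

print(sum((ys_line - ys_orig) ** 2))           # 0.04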

huanyumao

+sentdex
In the first statement of the function coefficient_of_determination, I think it should be just

y_mean_line = mean(ys_orig) instead of y_mean_line = [mean(ys_orig) for y in ys_orig]


Please correct me if I am wrong

ashishkarn

Though I am 2 years late, I have some queries. Here you are calculating the coefficient of determination. Is it different from root mean squared error? And is it important to calculate both, or will just one method do?
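For what it's worth, the two metrics answer different questions: RMSE reports the typical error in the units of y, while R^2 compares that error to the spread of y around its mean. A small sketch on made-up predictions:

import numpy as np

ys = np.array([2.0, 4.0, 6.0, 8.0])
predictions = np.array([2.5, 3.5, 6.5, 7.5])

rmse = np.sqrt(np.mean((ys - predictions) ** 2))   # error in the units of y
r_squared = 1 - np.sum((ys - predictions) ** 2) / np.sum((ys - np.mean(ys)) ** 2)

print(rmse)        # 0.5
print(r_squared)   # 0.95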

suhirdsingh

Shouldn't it be like mean(y) for y in ys_orig?

priteshborad

Hello sentdex, I have a small doubt. What if X = [1, 2, 3, 4, 5] and Y = [5, 5, 5, 5, 5]? In this case the best fit line would be y = 5, and mean_y would also be 5, so the R squared would turn out to be zero. But the best fit line is a very good predictor, which is the opposite of what you say (the higher the R_squared value, the better). Please explain. Thanks in advance.
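A side note on this edge case: with constant ys both squared errors are zero, so the formula gives 0/0 (undefined) rather than zero. A small sketch using the values from the question:

import numpy as np

ys = np.array([5.0, 5.0, 5.0, 5.0, 5.0])
best_fit_line = np.array([5.0] * 5)                 # the line y = 5

se_regression = np.sum((ys - best_fit_line) ** 2)   # 0.0
se_mean = np.sum((ys - np.mean(ys)) ** 2)           # 0.0 as well

with np.errstate(invalid="ignore"):
    print(1 - se_regression / se_mean)              # nan: R^2 is undefined for constant ys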

aurobindomondal