Feature Engineering - How to Transform Data to Better Fit the Gaussian Distribution - Data Science

Some machine learning models, such as linear and logistic regression, are commonly said to assume that the variables are normally distributed. Others benefit from "Gaussian-like" distributions, because in such distributions the observations of X available to predict Y are spread across a greater range of values. Transforming variables toward a Gaussian distribution may therefore boost the performance of the machine learning algorithm.
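
As a rough illustration of the idea, the sketch below measures how skewed a variable is before and after two common transformations (logarithm and square root). The `fare` data is synthetic and the `skewness` helper is written here for illustration; which transformations the video actually applies is not confirmed by this page.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


random.seed(0)
# Hypothetical right-skewed feature (log-normal), standing in for a fare column.
fare = [random.lognormvariate(3.0, 1.0) for _ in range(5000)]

# Two common transformations toward a more Gaussian-like shape.
log_fare = [math.log1p(v) for v in fare]   # log(1 + x), safe when x = 0
sqrt_fare = [math.sqrt(v) for v in fare]   # square root

for name, col in [("raw", fare), ("log", log_fare), ("sqrt", sqrt_fare)]:
    print(f"{name:>5}: skewness = {skewness(col):.2f}")
```

A skewness closer to 0 suggests a more symmetric, Gaussian-like shape. In practice a histogram or Q-Q plot is usually checked alongside the number.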

Comments

It is great having you as a digital mentor in this QUARANTINE
Great job 👍

shantanudolui

Thank you so much Krish for such an informative lecture. God bless you dear.

wajdilas

In the exponential case, why didn't you use the exp_fare variable in diagnostic.plot? (You used sqr_fare for it instead.)

mahadevkandalkar

Really amazing video
Thank you so much sir
Explained beautifully

thedataguyfromB

Heartfelt thanks, Sir, for your efforts.

anandhuded

Sir, after transforming the data toward a Gaussian distribution, do we still need to perform feature scaling such as min-max scaling or standard scaling?
Or can we apply just one of the two (Gaussian transformation or feature scaling) to a particular feature?

phanindratangirala
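
One way to look at the question above, as a minimal sketch on synthetic data: standardization is a linear rescaling, so it changes the mean and spread of a feature but not the shape of its distribution. The feature values and the `skewness` helper below are hypothetical; the point is only that the two steps address different things and can be combined.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


random.seed(2)
# Hypothetical right-skewed feature.
feature = [random.expovariate(0.1) for _ in range(2000)]

# Step 1: a shape-changing transformation (here log(1 + x)).
log_feature = [math.log1p(v) for v in feature]

# Step 2: standard scaling, (x - mean) / std, applied to the transformed values.
mean = statistics.fmean(log_feature)
std = statistics.pstdev(log_feature)
scaled = [(v - mean) / std for v in log_feature]

print(f"skewness after log:          {skewness(log_feature):.4f}")
print(f"skewness after log + scale:  {skewness(scaled):.4f}")
print(f"mean, std after scaling:     {statistics.fmean(scaled):.4f}, {statistics.pstdev(scaled):.4f}")
```

The skewness is the same before and after scaling, while the mean and standard deviation move to 0 and 1. Whether scaling is still needed depends on the model; distance-based and gradient-based methods typically benefit from it.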

The features (regressors) do not have to be normally distributed. If there is heteroscedasticity in the residuals, it's likely that the model is underspecified, and transforming the variables is one of the techniques used to eliminate it.

swat_katz_tbone

Thanks for the nice video. I have one suggestion: please use the feature-engine library for missing value imputation instead of writing that much manual code.

shreyasb.s

The assumption of linearity implies that the regression must be linear in the coefficients, not in the independent variables. For example, we can have a regression equation like y = m1*X1 + m2*X2^2 + c. This means y is linear in m1 and m2, but not in X1, X2, etc. We cannot have equations like
y = m1^2*X1 + m2^2*X2 + ... So point no. 1 mentioned in the video is incorrect.

manishgaurav
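
The "linear in the coefficients" point can be sketched as follows: a model with a squared feature is still fitted by ordinary least squares, because X2^2 is just another column of the design matrix. The data, the true coefficients (2, 0.5, 3), and the helper functions `solve_linear_system` and `ols` are all made up for this illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


random.seed(1)
# Hypothetical data generated from y = 2*x1 + 0.5*x2^2 + 3 plus small noise.
x1 = [random.uniform(-5, 5) for _ in range(500)]
x2 = [random.uniform(-5, 5) for _ in range(500)]
y = [2.0 * a + 0.5 * b ** 2 + 3.0 + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

# Design matrix columns: x1, x2 squared, and a constant for the intercept.
design = [[a, b ** 2, 1.0] for a, b in zip(x1, x2)]
m1, m2, c = ols(design, y)
print(f"m1 = {m1:.2f}, m2 = {m2:.2f}, c = {c:.2f}")
```

The fit recovers coefficients close to the ones used to generate the data, even though y is not a linear function of x2.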

Should I always transform non-normal independent variables? Transforming the variables seems to change the interpretation of the variables themselves. In that case, how do I handle outliers in the independent variables? Can I go for outlier trimming and capping instead? Can I leave the skewness of the independent variables as it is? Kindly advise.

shobithas

Hi Krish,
How do I reverse the transformation after making predictions, so that I get the actual predicted number? I am having problems with this in a project. Please help.

chayanmehrotra
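
A minimal sketch of reversing a transformation, assuming the target was transformed with log(1 + y) before training; the values and the stand-in "model" below are hypothetical. Each transformation has its own inverse: `math.expm1` undoes `math.log1p`, squaring undoes a square root, and a reciprocal undoes itself.

```python
import math

# Hypothetical target values on the original scale (for example, prices).
y = [10.0, 25.0, 40.0, 120.0, 300.0]

# Forward transformation applied before training.
y_log = [math.log1p(v) for v in y]

# Stand-in for a model: it simply predicts the mean of the transformed target,
# so its output lives on the transformed (log) scale.
pred_log = sum(y_log) / len(y_log)

# The inverse transformation maps the prediction back to the original units.
pred = math.expm1(pred_log)

# Round-trip check: inverting the forward transform recovers the original values.
recovered = [math.expm1(v) for v in y_log]

print(f"prediction on log scale: {pred_log:.3f}")
print(f"prediction on original scale: {pred:.2f}")
```

If only the input features were transformed and the target was left alone, the predictions are already on the original scale and no inverse step is needed.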

Great job, Sir. Such a well-explained video. But I could not follow the imputation technique. If possible, please explain that part.

saswatpriyabrat

Hi sir, thanks for the great videos. Please upload an NLP playlist. First like, sir.

Trouble.drouble

This is really the best explanation you have provided. I appreciate it. By the way, there is a mistake at timestamp 20:39: you plotted the square root fare again instead of the exponential fare.

ganeshnvsnm

Does a Gaussian distribution affect the accuracy of OLS Linear regression or is it applicable only for gradient descent linear regression?

kirangeorge

Please make a separate video on Box Cox

raneshmitra

Great job, mate. Quick question: as far as I know, linear regression does not really assume feature normality. Can you point to a source in the literature?

justfun

Can we apply a transformation to a feature more than once? For example, first an exponential transformation and then a logarithmic transformation, or something along those lines?

anamitrasingha

Hi Krish. I think logistic regression does not assume the variables to be normally distributed. Can you shed some light on this?

lakshitakamboj

Sir, shouldn't the outlier treatment be done before this step?

vigneshg