Tutorial 5 - Feature Selection: Perform Feature Selection Using Chi2 Statistical Analysis

All playlists on my channel

Please donate through GPay UPI ID if you want to support the channel.

Please join as a member of my channel to get additional benefits like Data Science materials, live streaming for members, and more.

Please do subscribe to my other channel too.

Connect with me here:
Comments

Revise feature selection from the playlist below

krishnaik

Hello everyone, here's something that was done wrong in this video; I am pretty sure Krish did it by mistake. In the last part where you do the sorting, between 18:38 and 21:00, that step is not sorting the columns by their associated p-values; it is sorting them by their names, i.e. 'Sex' comes before 'Embarked' even though the p-value for 'Embarked' is greater than that of 'Sex', and that is why you get randomly ordered results when you look at the decimal-converted p-values. To fix this and sort that Series object by the respective p-values, use p_values.sort_values() instead of p_values.sort_index(ascending=False); now you are generating results ordered by p-value, which should be correct IMO. Hope that helps!
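A minimal sketch of the difference this comment describes, assuming the p-values live in a pandas Series indexed by feature name (the values below are illustrative, not the ones from the video):

```python
import pandas as pd

# Illustrative chi2 p-values keyed by feature name
p_values = pd.Series(
    {"Sex": 1e-58, "Embarked": 1e-5, "Pclass": 0.01, "Alone": 0.9}
)

# sort_index orders by feature NAME (here, descending alphabetically),
# which has nothing to do with statistical significance:
by_name = p_values.sort_index(ascending=False)

# sort_values orders by the p-values themselves; ascending puts the
# smallest (most significant) p-values first:
by_pvalue = p_values.sort_values(ascending=True)
print(by_pvalue.index.tolist())  # ['Sex', 'Embarked', 'Pclass', 'Alone']
```

With `sort_index`, 'Pclass' lands ahead of 'Embarked' purely because of alphabetical order, even though its p-value is larger.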

yashasvibhatt

Who knew that after the traumatic event of the Titanic, it would become a famous practice problem to solve in the ML industry.

anshumaanphukan

Thank you for providing us with such great content! I am glad that I found your YouTube channel!

You got another subscriber!

pedrocolangelo

Loved your passionate discussion on why certain people survived, lol.

waichingleung

Sir, at the end when we check the p-values, shouldn't sort_values have been performed instead?

sivachaitanya

Hi Krish, we are doing the train-test split and then feature selection. Suppose out of 10 features 7 are important; then we have to remove 3 from X_train and X_test. Why can't we do feature selection first and then the train-test split?
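A sketch of the usual answer to this question: the selector is fit on the training split only (fitting it on the full data would leak test-set information into the selection), and the same chosen columns are then applied to the test split. The data here is randomly generated and `k=7` mirrors the "7 out of 10 features" in the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 10))   # 10 illustrative categorical features
y = rng.integers(0, 2, size=200)         # binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the chi2 selector on the training data ONLY...
selector = SelectKBest(chi2, k=7).fit(X_train, y_train)

# ...then apply the same column choice to both splits.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
```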

ajaykushwaha

Why do we use a train-test split while performing the chi-square test? Does imbalanced data with a Boolean output have any impact?

kumarabhishek

Great video, brother, but how about this: instead of columns that already have binary values (like sex, alone, survived), suppose we have some categorical features that, after encoding, span more than one column each; this happens when a column has more than two categories. In this case, with nominal variables, instead of having one p-value per feature we might have three or more p-values for a single feature.
Question: what should I do in this situation?
I wish you the best of luck, brother.
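One common way to handle this (an approach not shown in the video) is to skip the per-dummy-column test and instead run a chi-square test on the full contingency table of the multi-category feature against the target, which yields a single p-value per feature. The data below is illustrative:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative sample: a 3-category feature vs. a binary target
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S", "C", "S", "Q", "S", "C", "S"],
    "Survived": [0, 1, 0, 0, 1, 1, 0, 0, 1, 1],
})

# Full contingency table: one row per category, one column per class
table = pd.crosstab(df["Embarked"], df["Survived"])

# One test over the whole table -> one p-value for the whole feature
chi2_stat, p_value, dof, expected = chi2_contingency(table)
```

The degrees of freedom come out as (rows − 1) × (cols − 1), so a 3-category feature against a binary target gives dof = 2.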

abdelouadoudkhouri

Sir, will you be uploading more feature selection techniques?

pushpitkumar

Hi sir,
what if we have more than 2k columns? How can we perform encoding for all of those columns?
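One way to avoid encoding thousands of columns one by one (a sketch, not from the video) is scikit-learn's OrdinalEncoder, which fits and transforms all listed columns in a single call; the two-column DataFrame here stands in for a much wider one:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "Q"],
})

# In practice this list would hold all 2k+ categorical column names
cat_cols = ["sex", "embarked"]

# One fit_transform encodes every listed column at once; categories are
# assigned integer codes in sorted order (e.g. female=0, male=1)
df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])
```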

amardeepkumar

Sir, as we are label encoding, aren't we introducing an order in those features? Is it okay? We won't use it while building models, right?
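When the implied order is a concern, one-hot encoding is the usual alternative, since each category gets its own 0/1 column and no ordering is imposed. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"embarked": ["S", "C", "Q", "S"]})

# One indicator column per category; no ordinal relationship implied
dummies = pd.get_dummies(df["embarked"], prefix="embarked")
```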

sane

The p-value of the 'alone' column is 0.9, which is greater than the significance level of 0.05.

wahidnabi

There is a minor mistake in the code. The correct code is:

ajaykushwaha-jemw

How does a smaller p-value make a feature more important? Isn't a value less than 0.05 lying somewhere at the extremes of the bell curve, so that we reject that hypothesis?

nik

How do we drop columns based on those F and p-values if we have many more columns?
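A sketch of one way to do this at scale (an assumption on my part, not shown in the video): compute all chi2 p-values in one call and keep only the columns whose p-value clears a 0.05 threshold. The data here is randomly generated, with `f0` deliberately made informative:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.integers(0, 2, size=(100, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = X["f0"]                        # f0 perfectly predicts y; the rest are noise

# One vectorized call returns scores and p-values for every column
f_scores, p_values = chi2(X, y)

# Keep only the columns that are significant at the 0.05 level
keep = X.columns[p_values <= 0.05]
X_selected = X[keep]
```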

maskman

Wonderful explanation. What happens if your dependent variable (here 'Survived', which has only two values: 0 or 1) were also categorical with more than 2 values? How do you identify the features to drop? How do you analyze the odds of the independent variables associated with that dependent variable? Logistic regression with multi-class?

Do you have a use case or example of a scenario where all your dependent and independent variables are categorical? What type of test can be done to determine the odds of the output variable given the input features, specifically when the target variable has more than 2 values?

sarvatra

Do we have any function in pandas which picks all the categorical features from a dataset and shows them to us?
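pandas does have this: `DataFrame.select_dtypes` filters columns by dtype. A small sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [22, 38],              # numeric -> excluded
    "sex": ["male", "female"],    # object  -> included
    "embarked": ["S", "C"],       # object  -> included
})

# Pick all object/category columns, i.e. the categorical features
cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
print(cat_cols)  # ['sex', 'embarked']
```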

ajaykushwaha

You sort the p-values in descending order at 20:12, and you say the feature with the smaller p-value is more important. So according to this, 'alone' must be the most important feature?

utkarshsharma

How can we do feature selection for clustering tasks?

mohamedmouhiha