198 - Feature selection using Boruta in Python

Code generated in the video can be downloaded from here:

pip install Boruta

XGBoost documentation:

Dataset:
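
A minimal sketch of the workflow the video covers, assuming a pandas DataFrame with a "Label" column (the file path and column name here are placeholders, not from the video) and a recent xgboost whose scikit-learn wrapper accepts the numpy RandomState that BorutaPy passes to it:

import pandas as pd
from xgboost import XGBClassifier
from boruta import BorutaPy

df = pd.read_csv("data.csv")                # placeholder path for the dataset
y = df["Label"].values                      # placeholder label column name
X = df.drop("Label", axis=1).values         # BorutaPy expects numpy arrays

model = XGBClassifier()                     # the estimator Boruta wraps
feat_selector = BorutaPy(model, n_estimators="auto", verbose=2,
                         random_state=1, max_iter=50)
feat_selector.fit(X, y)

print(feat_selector.support_)               # boolean mask of confirmed features
print(feat_selector.ranking_)               # rank 1 = confirmed, 2 = tentative
X_filtered = feat_selector.transform(X)     # keep only confirmed features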
Comments

I can't get enough of these videos. And he knows that.

greendsnow

I would also be interested in more traditional machine learning. Most work done by data scientists I’ve seen is just preprocessing and postprocessing anyway

channelforstream

Dear sir,
Your episodes are great!
I like to learn about new tools and libraries.
Keep teaching us!
Thanks

evyatarcoco

Really well explained, thanks from Australia

michaelmecham

Hello sir, I am following your tutorial but facing an error: "ValueError: Please check your X and y variable. The provided estimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'"
Any help regarding the issue will be highly appreciated.

bikashchandragupta

Awesome video man. It really helped me.

RadhakrishnanBL

Thank you for this video! Great stuff!

fassesweden

Thanks a lot for sharing your knowledge with us! Would you consider making a tutorial on the BraTS or LiTS challenges? We would love it :)

xcalmaf

Can this algorithm be applied for feature selection on mixed data types, i.e. data with both boolean and continuous variables? Please let me know.

anjalisetiya

Thanks for the video. May I know how Boruta is different from Random Forest's feature importance? Are the two the same?

manonathan

Why does the Boruta algorithm not work with AdaBoost?

sallahamine

Any help solving this error would be appreciated: "XGBoostError: Invalid Parameter format for seed expect int but value='RandomState(MT19937)'"

RadhakrishnanBL

Hello sir, would you cover a feature selection technique which uses hierarchical or k-means clustering, if possible? scikit-learn seems to have a function for this (sklearn.cluster.FeatureAgglomeration), but few people talk about it. Thanks in advance.

leamon

I tried testing with all the features and with the Boruta-selected features, and the accuracy doesn't change. So is the idea to use fewer features while keeping the metric the same?

MrTapan

Hi Sreeni. Thanks for the excellent videos. In many cases, once BorutaPy finishes running, the number of tentative features printed out is different from (less than) what the actual runs show. For example, in one of my use cases with 196 features, the 100 iterations ended with 46 tentative features while the summary printed only 28. Why is this different? How is this handled in Boruta?

kannansingaravelu

I'm curious to know if you could point out what the issue is. I have a dataset where the number of labels (y) is 55 and the number of independent variables (X) is 100; the combined dataframe (X and y together) would be 55x101.

I used a similar procedure to what you presented, and the only difference in datatype is that my y_train is int64 and my X_train is float64. I ran XGBoost and BorutaPy, but I am receiving an error when fitting the feature selector to X_train and y_train. The error I'm getting is:

"Please check your X and y variable. The providedestimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'"

I can't seem to find an issue opened on either the BorutaPy or the XGBoost forums with the same error I'm getting. I'd appreciate your input!

awa
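
The seed error quoted above is known to come from BorutaPy setting the estimator's random_state to a numpy RandomState object, which older xgboost versions reject because they expect a plain int. Two workarounds are commonly reported (neither is from the video): upgrade xgboost, whose newer scikit-learn wrapper accepts RandomState instances, or swap in a RandomForestClassifier, which accepts them natively:

from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# RandomForestClassifier accepts a RandomState object for random_state,
# so BorutaPy can seed it without triggering the xgboost error
rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
feat_selector = BorutaPy(rf, n_estimators="auto", verbose=2, random_state=1)
feat_selector.fit(X, y)   # X, y as numpy arrays, as in the sketch above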

There are 7 features with rank 1; how do you further rank the features among them?

aditya_baser
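
BorutaPy only distinguishes confirmed (rank 1), tentative (rank 2), and rejected features, so it does not order the rank-1 features among themselves. One way to break the tie, sketched here as a suggestion rather than anything shown in the video, is to refit the estimator on just the confirmed columns and sort by its feature_importances_:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

confirmed = np.where(feat_selector.support_)[0]    # indices of the rank-1 features
rf = RandomForestClassifier(n_estimators=500, random_state=1)
rf.fit(X[:, confirmed], y)                         # refit on confirmed columns only

# Print the confirmed features from most to least important
for rank, i in enumerate(np.argsort(rf.feature_importances_)[::-1], start=1):
    print(rank, confirmed[i], rf.feature_importances_[i])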

Hello teacher, nice video. I am doing classification using a CNN. Is there a good way to do feature selection? I am using a hybrid model, and the accuracy is low, maybe because of redundant features from the two models.

zakirshah

Professor, congratulations again on the video! I'm very grateful!

I have a doubt.
Could I use the feature selector at the end of a pre-trained CNN (flattened layer)?
I would like to reduce the dimensionality using an ML method.

carlosleandrosilvadospraze
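
Regarding the last question: the flattened (or pooled) output of a pre-trained CNN is just a feature matrix, so it can be fed to Boruta like any other tabular X. A rough sketch of the idea, with placeholder images and labels, and VGG16 chosen arbitrarily as the extractor:

import numpy as np
from tensorflow.keras.applications import VGG16
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Pre-trained CNN as a fixed feature extractor; global average pooling
# flattens the conv output into one 512-dim vector per image
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg",
                  input_shape=(224, 224, 3))

images = np.random.rand(64, 224, 224, 3)    # placeholder image batch
labels = np.random.randint(0, 2, 64)        # placeholder binary labels

X = extractor.predict(images)               # shape (64, 512)
feat_selector = BorutaPy(RandomForestClassifier(n_jobs=-1),
                         n_estimators="auto", random_state=1)
feat_selector.fit(X, labels)
X_reduced = feat_selector.transform(X)      # dimensionality-reduced features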