Feature Selection - 1 | Problems in dataset | Less is More | S2 E1 | Over-fitting | Machine Learning

Irrelevant features add little information and a lot of noise. Noisy data causes many problems when training a machine learning model. Feature selection helps us get rid of these irrelevant features in just two steps: choosing which features to remove based on some tests (the tests are covered later in this series) and deleting them. However, it is also important to understand exactly how these irrelevant features hurt our predictive power. So, in this video we discuss the problems caused by the presence of noisy features/columns.
The problems are:
1. Collinearity - When one column can be derived from another column using just a linear equation, we say that the two columns are collinear. A collinear column adds no new information, only redundancy and noise, so we delete one of the two.
2. Low Variance - Some columns have nearly the same value in every row. Such columns give no additional information that could help the model perform better.
3. Over-fitting - When columns give no additional information and only provide noise, fitting the model on these columns causes over-fitting: the model learns the noise in the training data and performs worse on new data.
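
As a rough illustration of the first two problems, here is a minimal sketch of the two-step idea (test the columns, then delete them) in plain Python. The column names, the toy data, and both thresholds are assumptions made up for this example; they are not the tests used in the series. The variance cut-off in particular depends on the scale of each column.

```python
import math

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def low_variance_columns(columns, threshold=1e-3):
    """Names of columns whose variance is at or below the (assumed) threshold."""
    return [name for name, vals in columns.items() if variance(vals) <= threshold]

def collinear_pairs(columns, threshold=0.999):
    """(kept, redundant) pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs

def drop_columns(columns, to_drop):
    """Return a copy of the table without the named columns."""
    return {name: vals for name, vals in columns.items() if name not in to_drop}

# Hypothetical toy table: height_in is a linear function of height_cm,
# and sensor_flag has the same value in every row.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
data = {
    "height_cm": heights_cm,
    "height_in": [h / 2.54 for h in heights_cm],
    "weight_kg": [50.0, 72.0, 61.0, 90.0, 70.0],
    "sensor_flag": [1.0, 1.0, 1.0, 1.0, 1.0],
}

# Step 1: test for low variance and delete (this also keeps pearson() safe).
constant = low_variance_columns(data)
filtered = drop_columns(data, constant)

# Step 2: test for collinearity and delete the second column of each pair.
pairs = collinear_pairs(filtered)
selected = drop_columns(filtered, {redundant for _, redundant in pairs})

print(constant)        # ['sensor_flag']
print(pairs)           # [('height_cm', 'height_in')]
print(list(selected))  # ['height_cm', 'weight_kg']
```

Dropping the low-variance columns first is a design choice here: correlation is undefined for a constant column, so the collinearity check only runs on columns that actually vary.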

Over-fitting is a vast topic, and I will dedicate another series to it so that you can understand the whole concept, but whatever is required for this video is already discussed in the video.
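
To get a feel for how noisy columns lead to over-fitting, here is a small sketch using only the Python standard library. Everything in it is an assumption chosen for illustration: the synthetic data, the 30 noise columns, and the 1-nearest-neighbour classifier, which is not necessarily the kind of model used in the video. Such a model memorises the training rows, so its training accuracy is perfect even when most columns are pure noise. Its accuracy on new rows is usually lower with the noise columns than without them.

```python
import random

def predict(train_X, train_y, point):
    """1-nearest-neighbour: return the label of the closest training row."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, point))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[best]

def accuracy(train_X, train_y, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    hits = sum(predict(train_X, train_y, p) == label for p, label in zip(X, y))
    return hits / len(y)

def make_dataset(rng, n_rows, n_noise):
    """One informative column plus n_noise columns of pure random noise."""
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        # Class 0 falls in [0, 0.4], class 1 in [0.6, 1.0].
        signal = rng.uniform(0.0, 0.4) + 0.6 * label
        noise = [rng.uniform(0.0, 1.0) for _ in range(n_noise)]
        X.append([signal] + noise)
        y.append(label)
    return X, y

def keep_signal(X):
    """Feature selection by hand: keep only the informative first column."""
    return [row[:1] for row in X]

rng = random.Random(0)
train_X, train_y = make_dataset(rng, 40, n_noise=30)
test_X, test_y = make_dataset(rng, 200, n_noise=30)

train_noisy = accuracy(train_X, train_y, train_X, train_y)
test_noisy = accuracy(train_X, train_y, test_X, test_y)
test_clean = accuracy(keep_signal(train_X), train_y, keep_signal(test_X), test_y)

print("train accuracy, all columns:", train_noisy)
print("test accuracy, all columns: ", test_noisy)
print("test accuracy, signal only: ", test_clean)
```

A large gap between the training accuracy and the test accuracy with all columns is the usual symptom of over-fitting. Comparing the last two numbers shows what removing the noise columns buys on unseen rows.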

Please enable notifications. WHY?

It is very common for machine learning students to leave some doubts for later whenever they encounter an advanced or difficult topic. There is nothing wrong with that, because machine learning concepts are difficult at times, but since those concepts are important, they come back to haunt you. So, if you have notifications turned on for my channel, I am pretty sure that some of the videos I post will be on topics that you have missed or that are still not clear to you. The best part is that on some random day you will have an important concept cleared up without even going to YouTube specifically to look for it.

Please Comment, WHY?

Comments help both of us connect on YouTube. If you post a doubt, I will try to answer it in a reply. If the doubt is a good one, I will make a video on it. Yes, I definitely plan to make videos specifically on complex doubts that are difficult to understand without pictorial demonstrations and a step-by-step approach. I think this will help the community in the best possible way.
Your comments will also help me learn from my mistakes and show how we can grow as machine learning enthusiasts.