What Makes a Good Feature? - Machine Learning Recipes #3

Good features are informative, independent, and simple. Learn about these concepts using a histogram to visualize a feature from a toy dataset in Python in this episode of Machine Learning Recipes. Josh Gordon explains what makes a good feature and shares best practices to follow.
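For reference, the kind of toy example shown in the episode can be reproduced with a few lines of matplotlib; the specific numbers below (breed counts, mean heights, spread) are assumptions for illustration rather than taken verbatim from the video:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy dataset: two dog breeds whose heights are drawn from normal
# distributions with different means (numbers here are assumed).
greyhounds = 500
labs = 500
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Stacked histogram: greyhound heights in red, lab heights in blue.
# The overlap between the two distributions shows how informative the feature is.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.xlabel('Height (inches)')
plt.ylabel('Number of dogs')
plt.show()
```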

Check out the playlist to watch the next episode in the series as we reinforce concepts, introduce clearer syntax, spend more time on testing, and continue building intuition for supervised learning.

Resources:

Chapters:
0:00 - Intro
0:20 - Example of a good feature in binary classification
2:48 - Types of features to use
3:12 - Example of a useless feature in binary classification
5:02 - In the next episode…

P.S.: We realize some folks had dependency bugs with Graphviz (whoops!). Moving forward, this series won't use any libraries that aren't already installed by Anaconda or TensorFlow.

Last: my code in this episode is similar to these great examples. You can use them to produce a more polished chart, if you like:

Comments

Holy god, I've been searching all over YouTube and torrents to find the best package for learning machine learning, and I've watched many videos, but believe me, I am totally in love with your course. This is awesome; you explain it so simply, it's like you're teaching me in my mother tongue. I wish you the best, my man.

hejarshahabi

I have limited knowledge of programming and computer science, yet I find this series very approachable and fun.

sephirothjc

I love your series and wish you would put out videos more often.

EmettSpeer

Josh, thank you so much to you and your team for building this series! In particular, I really like your 'tl;dr' approach and the way you keep things grounded in accessible, real-world examples -- I can't wait to see what comes next!

philipsalvo

For feature selection: is having no individual predictive power (in your example, eye color) enough to tell that a feature has no value at all? Could it not have some non-linear joint predictive power together with other features?
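The intuition behind this question is sound: features with no individual signal can still be predictive jointly. A hypothetical sketch (not from the video) using scikit-learn and an XOR-style label shows the effect:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(1000, 2))  # two binary features
y = X[:, 0] ^ X[:, 1]                  # label depends on both features jointly (XOR)

# Each feature on its own is no better than chance...
for i in (0, 1):
    score = cross_val_score(DecisionTreeClassifier(), X[:, [i]], y, cv=5).mean()
    print(f"feature {i} alone: {score:.2f}")

# ...but together they predict the label almost perfectly.
both = cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean()
print(f"both features:   {both:.2f}")
```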

arkrou

Creepy smiles at the end of each sentence: "Smile MORE, Josh!" marketing bellows!

jamiequigley

Everyone's complaining and wanting more episodes. Here I am in 2020, enjoying all 10 of them so far.

ThereIsNoSpoon

This is a really great series, please publish more often/sooner!

mitchese

Really enjoying this series. The examples are high quality, and you can tell a lot of thought has gone into them. Thanks for making the subject clear and interesting in each episode.

Technologynorth

*quick summary of the video:*

- let's say that your goal is to develop a program that can distinguish between two breeds of dogs

- what features do you want your example data to have?
- you want the features to be the "distinguishing" features between the breeds, i.e. features that are very different between the two dog breeds
- for example, if the two dog breeds tend to have very different heights, you want to use height as a feature in your training data
- if, on the other hand, the two breeds have about the same distribution of eye colors, you don't want to use eye color as a feature
- you also don't want to use features that are highly correlated (i.e. that don't bring in new information)
- you want to use simple features, as simple features will require fewer examples to get a decent classifier

- you want to be careful about adding too many features, especially if they are not truly "distinguishing" features; they may be distinguishing in your example data just by chance, and your classifier will start basing its predictions on these spurious features

*key thing to take away from the video:*
Selecting features is extremely important. Select simple, distinguishing features that bring in new information (i.e. that aren't highly correlated). The sketch below illustrates the point about features that only look distinguishing by chance.
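A rough sketch of that last point, with assumed data (not from the video): on a tiny training set, a pure-noise feature can look distinguishing by chance, but it does not generalize, while the genuinely informative height feature does:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(1)

def make_data(n):
    """Assumed toy data: breed label, an informative height feature, and pure noise."""
    y = rng.randint(0, 2, n)                               # breed label
    height = np.where(y == 1, 28, 24) + 4 * rng.randn(n)   # informative feature
    noise = rng.randn(n)                                   # uninformative stand-in for eye color
    return np.c_[height, noise], y

X_train, y_train = make_data(20)     # tiny training set
X_test, y_test = make_data(2000)     # large held-out set

for cols, name in [([0], "height only"), ([1], "noise only"), ([0, 1], "height + noise")]:
    clf = DecisionTreeClassifier(random_state=0).fit(X_train[:, cols], y_train)
    print(f"{name:15s} train acc: {clf.score(X_train[:, cols], y_train):.2f}  "
          f"test acc: {clf.score(X_test[:, cols], y_test):.2f}")
```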

Thanks so much for these videos!

Abdullah-mgzl

Loved the stacked histogram, nice way to visualize the different means of the distributions!

giantneuralnetwork

I could honestly watch Josh all day. He presents really well. Keep up the good quality content Josh! :)

gbhall

Awesome series; it dispels the myth that machine learning is difficult.

sanjayakumarsahoo

Patiently awaiting the next episode in two weeks!

net

Very cool series & I appreciate the links to the examples and especially the "article that inspired". Extra links like that really help! Thank you!

codingwithjoyk

terrific episode! those heuristics for feature selection are invaluable. also, lol @ whoever produces the graphics: the frontmost dog *head tilt* XD

MattSiegel

Great job on the video! I can tell that you have taken feedback from previous videos and made adjustments. Thanks for the effort! Will be waiting for the next episode :)

alsonyap

How do you get that pretty data visualisation from matplotlib like at 2:05?

HubertRozmarynowski

Very good explanation! Looking forward to new episodes!

EddieImada

I just want you to know that I loved the article reference in this video. Please refer to more nice articles like this.

diegolima