Intro to Machine Learning: Lesson 3

Today we'll see how to read a much larger dataset - one which may not even fit in the RAM on your machine! And we'll also learn how to create a random forest for that dataset. We also discuss the software engineering concept of "profiling", to learn how to speed up our code if it's not fast enough - especially useful for these big datasets.
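One trick covered in the lesson for datasets that barely fit in RAM is telling pandas the column types up front, so it can skip type inference and store each column compactly. A minimal sketch, using a tiny inline CSV with column names loosely modeled on the grocery competition data (the exact names and types here are illustrative assumptions):

```python
import io
import pandas as pd

# A tiny stand-in for a multi-gigabyte CSV; in practice you would pass a file path.
csv_data = io.StringIO(
    "date,store_nbr,item_nbr,unit_sales,onpromotion\n"
    "2017-01-01,1,105574,3.0,False\n"
    "2017-01-02,1,105574,2.0,True\n"
)

# Declaring narrow dtypes avoids pandas defaulting everything to int64/float64.
dtypes = {"store_nbr": "int8", "item_nbr": "int32", "unit_sales": "float32"}

df = pd.read_csv(csv_data, dtype=dtypes, parse_dates=["date"])
print(df.dtypes)
```

On the real dataset, the dtype dictionary alone can cut memory use severalfold compared with the inferred 64-bit defaults.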

Next, we take a deeper dive into validation sets, discuss what makes a good validation set, and use that discussion to pick a validation set for this new data.
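For time-dependent data like this, the key point is that a random split leaks the future into training; a good validation set instead holds out the most recent period, mimicking how the test set follows the training data in time. A minimal sketch of such a split (the toy frame and two-day cutoff are assumptions for illustration):

```python
import pandas as pd

# Toy time-ordered dataset standing in for the real sales data.
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10),
    "y": range(10),
})

# Hold out the last 2 days as validation instead of sampling rows at random.
cutoff = df["date"].max() - pd.Timedelta(days=2)
train = df[df["date"] <= cutoff]
valid = df[df["date"] > cutoff]
```

Every training date precedes every validation date, so validation performance better reflects how the model will do on genuinely unseen future data.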

In the second half of this lesson, we look at "model interpretation" - the critically important skill of using your model to better understand your data. Today's focus for interpretation is the "feature importance plot", which is perhaps the most useful model interpretation technique.
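The feature importance plot discussed here is available directly on scikit-learn's random forests. A small self-contained sketch (the synthetic data, where only the first feature carries signal, is an assumption for demonstration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5 * X[:, 0] + 0.1 * rng.normal(size=200)  # only feature 0 matters

m = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)
imp = m.feature_importances_  # importances sum to 1; plot with plt.barh(...)
```

With real data you would sort the importances, plot them, and then investigate (or drop) the features the model barely uses.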
Comments

32:29 From what I know, since scikit-learn version 0.22 there has been a new hyperparameter called max_samples, which does the job of set_rf_samples but does not conflict with the OOB score. Hope this helps future learners.

ihgnmah

1. When is a random forest a good choice? 2:42
2. Grocery competition 10:15
3. Reading a big CSV into pandas 15:20
4. Handling time dependency 22:56
5. %prun profiling 31:29
6. Time dependency - mean of target variable in different groups 35:43
7. Difficulties in ML coding 38:38
8. Testing the test set 40:30
9. Confidence intervals 54:40
10. Feature importance 1:07:19
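The profiling step in the list above uses the %prun magic inside Jupyter; outside a notebook, the same information comes from the standard-library cProfile module that %prun wraps. A minimal sketch (the slow_sum function is a made-up stand-in for whatever code is slow):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately naive loop standing in for a slow piece of code."""
    total = 0
    for i in range(n):
        total += i * i
    return total

pr = cProfile.Profile()
pr.enable()
slow_sum(100_000)
pr.disable()

# Print the 5 most expensive calls, sorted by cumulative time.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

The report shows which functions dominate the runtime, which is exactly what the lesson uses %prun for before optimizing.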

piotrmenclewicz

I tried many techniques, but Kaggle's 30 GB of RAM is not enough for this dataset... what should I do?

ilovetensor

39:45 A special class about the top mistakes made? :) It would be super helpful to learn about the most common pitfalls.

gcm

Has anyone tried to run all of this on Kaggle? It doesn't seem possible: the RAM keeps filling up, and I am not able to clone this on Kaggle!

ilovetensor

Why is no one taking this course?
Is it too old, or are there newer courses that are better than this?

ilovetensor

1:09:00 is it possible that for RF there are a group of features which are important, but using some other model (e.g. Neural Nets) other features pop up as the most relevant ones? If not, is it correct to assume I can transfer this knowledge to building other types of predictive models, ignoring the non-relevant features as suggested by the RF?

gcm

Please never wear that shirt for a recorded lecture again :)

gdoteof

19:31 The integer types could be changed to unsigned integers to make them take up even less memory...
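The commenter's suggestion is easy to verify: when a column holds only non-negative values, an unsigned type of half (or a quarter) the width stores the same range. A sketch on a synthetic column (the sizes are illustrative, not from the actual dataset):

```python
import numpy as np
import pandas as pd

# Default integer storage is 64-bit.
s = pd.Series(np.arange(100_000), dtype="int64")

# Values are non-negative and well under 2**32, so uint32 holds them exactly
# in half the memory (uint8/uint16 would work for even smaller ranges).
small = s.astype("uint32")

print(s.memory_usage(deep=True), small.memory_usage(deep=True))
```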

liptherapy

51:10 I can't find groceries on GitHub =(

yourxylitol

Integrating a time series forecast into a standard machine learning model is pretty difficult. For this problem, I believe starting with a time-series-specific model like FB Prophet would result in better performance than starting with a random forest, etc.

gardnmi

Even Jeremy doesn't take a look at all these comments.

ilovetensor

I am not able to load the grocery dataset. I have an i5 8th-gen laptop. Is it not possible to perform this task?

Akshit_Saini_