Machine Learning with R | Machine Learning with caret

Learn how R and the caret package can help you implement some of the most common tasks of the data science project lifecycle. The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s huge collection of open-source machine learning algorithms. If you are a data scientist working with R, the caret package (short for Classification And REgression Training) is a must-have tool in your tool belt. The caret package provides capabilities that are useful at every stage of the data science project lifecycle. Most important of all, caret provides a common interface for training, tuning, and evaluating more than 200 machine learning algorithms. Not surprisingly, caret is a surefire way to accelerate your velocity as a data scientist!

In this presentation, we will provide an introduction to the caret package. The focus of the presentation will be using caret to implement some of the most common tasks of the data science project lifecycle and to illustrate incorporating caret into your daily work.

Attendees will learn how to:

• Create stratified random samples of data useful for training machine learning models.
• Train machine learning models using caret’s common interface.
• Leverage caret’s powerful features for cross-validation and hyperparameter tuning.
• Scale caret via the use of multi-core, parallel training.
• Increase their knowledge of caret’s many features.

R code and accompanying dataset:

caret website:

Table of Contents:
0:00 – Intro
3:24 – Motivation
5:07 – Expectation setting
9:23 – The data
11:57 – Caret
1:18:46 – Resources

#machinelearning #rprogramming #caret
Comments

This is the single best ML video on the internet. Dave for President 2020.

brophy

This was really great Dave. I've done a bunch of your tutorials online including the intro to data science videos you did using the Titanic Kaggle competition about 4 years ago. What I enjoyed the most about this video was seeing how much more confident and impassioned you have become as a data scientist since those prior videos. You can tell that it really excites you and that is infectious in a teaching environment. I too have become somewhat hooked on data science and I was one of those students that avoided statistics at all costs at every level of education. I'm looking into coming to one of the data science bootcamps at the data science dojo and really looking forward to learning from people that are equally passionate about data science and hopefully making up some lost ground. Keep up the great work.

ghexer

Oh. My. God. THIS... This literally changes everything.

seanpitcher

This was excellent. I've learnt quite a lot and have a few new books for the reading list. Many thanks!

pipertripp

Simply excellent. I could not hold myself back from commenting even though a few minutes are still left. You are a genius at making things so interesting.

bljangir

Thanks a lot! Doing my first steps into R and Machine Learning. This talk is exactly what I needed

yanivtubul

Very nice, I just used this package for an assignment. This got me enthusiastic to learn more.

shaunoconnell

Thank you very much, Dave & team. Really enjoyed the whole presentation and learned a lot!

tamafun

By far the best video out there for ML in R

CK-vyqv

Excellent presentation, you are a great teacher. Thank you

acada

Brilliant and great advert for your bootcamps!

antzlck

Great guide, I was really struggling with a ML assignment and didn't realise what an absolute unit 'caret' is!

reubenschneider

Great video... Do you feel it is necessary to use dummyVars before doing the imputation? Isn't it sufficient to do the imputation within the call to the train function as part of the preProcess argument? That is, is the conversion to one-hot encoding outside of the call to train strictly necessary?

atlantaguitar

Great video! Only one question. When you say that set.seed(54321) is not random, what do you mean? I thought whatever we put in set.seed could be anything, e.g., set.seed(321). What is the meaning behind your 54321? You sorta glossed over that part and I'd love to dive a little deeper into that.

StockSpotlightPodcast

@dave, I understood how you imputed the age. However, if we have something like 200 missing values for the embark data, will the same method used for imputing age work? I mean, isn't it possible that in some cases both Q and S might have values close to 1 for the same row? What should we do in that case?

arindambpcsrkm

Hi Dave, very instructive video, congratulations. Please let me ask you a question:
I know caret does not impute factors. But what do you do in practice when you need to impute values for categorical/factor variables (setting aside the mode)?
In the example in your video, the dataset "imputed.data" has two columns/dummies for Sex. If you, hypothetically, impute missing values for them, how do you take them back to the original dataset, in which there is only one column for Sex?

sebastianvarela

Thanks for the video! Quick question - why do you have to split the data into a training/test set of 70/30 when you are going to do 10-fold cross-validation (90/10 split?) anyway later on? Are these two different things?

erinklark

I noticed that the other columns with a large number of NAs were removed, and that while imputing the Age variable all the other factors had no NAs. What should I do if the variables that are critical for imputing Age also have NAs? I'm a noob, so please correct me if there is a lack of logic in my question.

alisterdcruz

Thank you for sharing! Amazing video and instructions.

julianonas

I do appreciate Dave's approach. I think it's important to stress that there is a lot more to being a data scientist than simply understanding concepts of ML, AI, etc., or taking a few online courses or earning a certificate. I believe it takes graduate coursework and years as a practitioner understanding and implementing a list of techniques. Engineers typically vector into data analytics completely differently than I do, having an MS in data analytics. It is a good illustration of just how complex and broad the science of data is in these infant stages.

KarriemPerry