Model Metrics | Introduction to Text Analytics with R Part 9

This talk provides an overview of model metrics, with specific coverage of:
1. The importance of metrics beyond accuracy for building effective models.
2. Sensitivity and specificity, and their importance for building effective binary classification models.
3. The importance of feature engineering for building the most effective models.
4. How to identify if an engineered feature is likely to be effective in Production.
5. Improving our model with an engineered feature.
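A minimal base-R sketch of the sensitivity and specificity ideas above, using a made-up 2x2 confusion matrix for a ham/spam classifier (caret's confusionMatrix() reports the same statistics):

```r
# Hypothetical counts: rows = predictions, columns = actual labels,
# matching caret's confusionMatrix(data = predicted, reference = actual).
cm <- matrix(c(960,  15,    # predicted ham:  960 true ham, 15 missed spam
                10, 140),   # predicted spam: 10 false alarms, 140 true spam
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("ham", "spam"),
                             Reference  = c("ham", "spam")))

accuracy    <- sum(diag(cm)) / sum(cm)                 # overall hit rate
sensitivity <- cm["ham", "ham"]   / sum(cm[, "ham"])   # correct ham / actual ham
specificity <- cm["spam", "spam"] / sum(cm[, "spam"])  # correct spam / actual spam
```

Accuracy alone can look excellent on imbalanced data like this (mostly ham); sensitivity and specificity reveal how each class fares separately.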

The data and R code used in this series are available here:


#modelmetrics #textanalytics
Comments

Hi Dave, absolutely loved the video series! I haven't seen ANY other tutorial that goes into so much depth and walks step-by-step through such a problem. Great work! Please do keep sharing more such videos.

archowdhury

Probably the best NLP-in-R and overall ML-in-R series I've seen.

djangoworldwide

I started the series today and I'm addicted to it... Congratulations on your work! I'm already a fan!

rafaelsilva

Hi Dave, this is one of the best tutorials I have ever seen, thank you very much. I was wondering if you have any plans to cover testing the model with test data and, eventually, how to put this into production?

BhakthiLiyanage

Amazing work. Stunning lecture. So exciting!

TomerBenDavid

Hi Dave! If I do not have these two categories (ham and spam), just respondents row by row with their text, what should I do?

vaz.felipe

Another great video... waiting for the next one. How many videos are there in this series?

junaideffendi

I can't run the rf.cv.1 function. Can anyone please help me?

farhanamim

Hi Dave, your videos are great. One quick question: if we have more than two classes, and thereby a multi-dimensional confusion matrix, how are we going to deal with accuracy, sensitivity, and specificity?

WorldAroundWe
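On the multi-class question above: caret's confusionMatrix() reports per-class sensitivity and specificity in a one-vs-rest fashion (each class in turn is treated as "positive" and all others as "negative"). A minimal base-R sketch with a hypothetical 3-class matrix:

```r
# Hypothetical 3-class confusion matrix: rows = predicted, columns = actual.
cm <- matrix(c(50,  3,  2,
                4, 45,  6,
                1,  2, 37),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("A", "B", "C"),
                             Reference  = c("A", "B", "C")))

# One-vs-rest sensitivity/specificity for class k.
one_vs_rest <- function(cm, k) {
  tp <- cm[k, k]
  fn <- sum(cm[, k]) - tp   # actually k, predicted something else
  fp <- sum(cm[k, ]) - tp   # predicted k, actually something else
  tn <- sum(cm) - tp - fn - fp
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

metrics <- sapply(rownames(cm), function(k) one_vs_rest(cm, k))
```

Accuracy stays a single number (sum of the diagonal over the grand total); sensitivity and specificity become a vector, one value per class.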

Thanks a lot Dave for these, immensely helpful.

One correction, the correct confusion matrix command should have been

confusionMatrix(rf.cv.1$finalModel$predicted, train.svd$Label)

instead of

confusionMatrix(train.svd$Label, rf.cv.1$finalModel$predicted)

since data is the first parameter and reference the second. Hence, spam precision is good but spam recall is poor, rather than the other way round.

AmitYadav-zkzm
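The argument-order point above can be sketched with tiny made-up vectors using base R's table(); caret's confusionMatrix(data, reference) follows the same prediction-first convention. Swapping the arguments transposes the table, which swaps row-based statistics (precision) with column-based ones (recall/sensitivity):

```r
# Predictions first, truth second (caret's convention).
pred  <- factor(c("spam", "ham", "ham",  "spam", "ham"), levels = c("ham", "spam"))
truth <- factor(c("spam", "ham", "spam", "spam", "ham"), levels = c("ham", "spam"))

correct <- table(Prediction = pred,  Reference = truth)  # right order
swapped <- table(Prediction = truth, Reference = pred)   # arguments reversed

# The swapped table is the transpose of the correct one, so any statistic
# computed from rows vs columns trades places.
spam_recall    <- correct["spam", "spam"] / sum(correct[, "spam"])  # column total
spam_precision <- correct["spam", "spam"] / sum(correct["spam", ])  # row total
```

With these counts, spam precision is perfect while spam recall is not, which is exactly the distinction the comment makes.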

When using the confusion matrix, I believe we have to be careful about the order of the actual and predicted parameters that go into the function. What I mean is that confusionMatrix(actual, predicted) would yield different results compared to confusionMatrix(predicted, actual) - is that correct?

rajeshwaran

Hi Dave, you are talking about loading your cached results. How can I cache results myself? For my own project it seems caching results will save me a lot of time when I want to return to my own generated results.

mbeekink
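One simple answer to the caching question above is base R's saveRDS()/readRDS(): run the expensive step once, save the result to disk, and reload it on later runs. A minimal sketch (the file name and the slow_fit() placeholder are illustrative, not from the series):

```r
# Cache file location (tempdir() here just to keep the sketch self-contained;
# a real project would use a stable path inside the project folder).
cache_file <- file.path(tempdir(), "rf.cv.1.rds")

slow_fit <- function() {
  Sys.sleep(0.1)           # stand-in for a long cross-validation run
  list(model = "fitted")   # stand-in for the fitted model object
}

if (file.exists(cache_file)) {
  rf.cv.1 <- readRDS(cache_file)   # cheap: load the cached result
} else {
  rf.cv.1 <- slow_fit()            # expensive: compute once
  saveRDS(rf.cv.1, cache_file)     # cache it for next time
}
```

saveRDS() serializes a single R object, so the whole fitted model (including its resampling results) round-trips intact.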

Hey, I need help. I want to run confusionMatrix on rpart.cv.2, and when I use the code confusionMatrix(train.tokens.tfidf.df$Label, I get: Error: `data` and `reference` should be factors with the same levels.

Any suggestions for my problem?

Thank you

PCI
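The error above usually means the two inputs to confusionMatrix() are not factors sharing one level set (e.g. one side is a character vector, or has extra or missing levels). A minimal sketch of a common fix, with illustrative variable names, is to re-factor the predictions using the levels of the reference labels:

```r
# Reference labels are a factor with known levels.
truth <- factor(c("ham", "spam", "ham", "ham"), levels = c("ham", "spam"))

# Predictions came back as plain character, which triggers the error.
pred <- c("ham", "ham", "spam", "ham")

# Re-factor the predictions against the truth's levels so both sides match.
pred <- factor(pred, levels = levels(truth))

table(Prediction = pred, Reference = truth)  # now both sides share levels
```

The same trick helps when a class is absent from one side: factor(x, levels = levels(truth)) keeps the missing level with a zero count instead of dropping it.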

Hi Dave, sensitivity is the ratio of correct ham predictions over the actual total ham values, not over the total ham predictions. The column totals are the actual values, and the rows are the total predictions for the corresponding label class. Your formula is correct, but what you said @8:04 was different: you defined precision, which is correct ham predictions over total ham predictions (the row total).
Similarly, specificity is the ratio of correct spam predictions over the actual total spam values (the second column total), not the total spam predictions (the second row total).
I hope I've not confused you - I'm pointing out the difference between total predictions and total values...

shobhamourya
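The row-total vs column-total distinction above, as a minimal base-R sketch with made-up counts (rows = predictions, columns = actual labels):

```r
cm <- matrix(c(90, 20,
                5, 35),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("ham", "spam"),
                             Reference  = c("ham", "spam")))

# Sensitivity divides by the COLUMN total: all messages that truly are ham.
ham_sensitivity <- cm["ham", "ham"] / sum(cm[, "ham"])   # 90 / 95

# Precision divides by the ROW total: all messages predicted to be ham.
ham_precision   <- cm["ham", "ham"] / sum(cm["ham", ])   # 90 / 110
```

Both use the same numerator (correct ham predictions); only the denominator changes, which is exactly the difference between "total values" and "total predictions".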

Hi Dave, I am running through the videos and applying this to the Random Acts of Pizza dataset from Kaggle. I am up to the point of running random forest on train.svd and viewing the results. I have used the same stratified splits as in the videos; the data comes out at roughly 75/25. However, when I view the resulting confusion matrix, it looks like this.

I expected a similar split on the reference, but this is way off. Am I doing something wrong, or is my expectation incorrect and this is more to do with the quality of the model so far? I'd appreciate any tips.

           Reference
Prediction FALSE TRUE
     FALSE  2123   10
     TRUE    692    4

terrybrooks