Python Tutorial: Voting

---
In this lesson, you are going to learn how to use an ensemble technique known as voting.
In TV shows like "Who Wants to Be a Millionaire?", when a contestant doesn't know the right answer, they have the option of asking the audience. The contestant usually ends up choosing the answer most voted for by the audience, hoping it is the correct one.
In fact, according to stats from TV studios, the audience predicts the correct answer more than ninety percent of the time! This is thanks to the concept known as "wisdom of the crowds".
It refers to the collective intelligence of a group of individuals rather than that of a single expert. The aggregated opinion of the crowd can be as good as (and is usually superior to) the answer of any individual, even that of an expert. This is a useful technique commonly applied to problem-solving, decision-making, innovation, and prediction. We are particularly interested in prediction.
As the name implies, this majority voting technique combines the output of many classifiers using a majority voting approach. In other words, the combined prediction is the mode of the individual predictions.
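To make this concrete, here is a tiny sketch, assuming three hypothetical classifier outputs for a single sample, that computes the majority vote with Python's standard library:

from collections import Counter

# Hypothetical predictions from three individual classifiers for one sample
predictions = ["positive", "positive", "negative"]

# The combined prediction is the mode, i.e. the most common vote
combined = Counter(predictions).most_common(1)[0][0]
print(combined)  # positive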
It is recommended to use an odd number of classifiers: with an even number, say four, the votes for the positive and negative classes could be tied. We therefore need at least three classifiers, and when problem constraints allow it, five or more.
There are some characteristics you need in your "crowd" for a voting ensemble to be effective. First, the ensemble needs to be diverse: you can achieve this by using different algorithms or different datasets. Second, each prediction needs to be independent and uncorrelated with the rest. Third, each model should make its own prediction without relying on the others' predictions. Finally, the ensemble model should aggregate the individual predictions into a collective one. Keep in mind that majority voting is a technique that can only be applied to classification problems.
Writing a script of our own that builds a voting classifier, receives a list of classifiers, and returns the combined model would be cumbersome.
Luckily, scikit-learn already provides this functionality with the VotingClassifier class.
The main input, the "estimators" keyword argument, is a list of (string, estimator) tuples: each string is a label and each estimator is a scikit-learn classifier.
You do not have to fit the classifiers individually, as the voting classifier will take care of that for us.
In this example, we instantiate a 5-nearest neighbors classifier (called clf_knn), a decision tree (called clf_dt), and a logistic regression (called clf_lr).
After that, we create a VotingClassifier, passing in the estimators and their labels as a list. Here we use the labels "knn" for the 5-nearest neighbors classifier, "dt" for the decision tree, and "lr" for the logistic regression. We use 5 neighbors, an odd number, so that the nearest-neighbor vote itself cannot tie between the two classes.
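A minimal sketch of this construction step might look as follows; the variable names and labels follow the lesson, while leaving all hyperparameters other than n_neighbors=5 at scikit-learn's defaults is an assumption on my part:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# Instantiate the three individual classifiers
clf_knn = KNeighborsClassifier(n_neighbors=5)
clf_dt = DecisionTreeClassifier()
clf_lr = LogisticRegression()

# Combine them via a list of (label, estimator) tuples
clf_voting = VotingClassifier(
    estimators=[("knn", clf_knn), ("dt", clf_dt), ("lr", clf_lr)]
)

By default, VotingClassifier uses hard voting, which is exactly the majority vote described above.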
The combined model can be fitted to the training data, and then be used to make predictions.
Remember that fit is called with X_train and y_train, and predict only with X_test.
Then, we can evaluate the performance on the test set, passing y_test and y_pred to the accuracy_score function.
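Continuing the sketch above, the fit-predict-evaluate steps could look like this; the breast cancer dataset and the 80/20 split are illustrative stand-ins, not taken from the lesson:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Illustrative binary classification data and train/test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the combined model on the training data only
clf_voting.fit(X_train, y_train)

# Predict on the unseen test set
y_pred = clf_voting.predict(X_test)

# Evaluate performance by comparing true and predicted test labels
print(accuracy_score(y_test, y_pred))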
Let's now build our first ensemble models using voting!