Python Tutorial : Applying logistic regression and SVM


---

In this video, we'll see how to run logistic regression and SVM with scikit-learn.

The LogisticRegression class in scikit-learn is used just like the other models you've seen in the prerequisite course. First, we import LogisticRegression from scikit-learn. You'll notice we're importing from linear_model, because logistic regression is a linear classifier. More on this later.

Then, we create an instance of the classifier.

We fit the classifier on our training set.

And then we can predict, compute the score, etc.
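The four steps above might look like this on a tiny synthetic dataset (the data here is made up for illustration, not from the video):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A tiny, perfectly separable 1-D dataset, just to show the workflow
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression()   # instantiate the classifier
clf.fit(X, y)                # fit on the training set
preds = clf.predict(X)       # predict labels
acc = clf.score(X, y)        # compute accuracy
```

The same fit/predict/score pattern applies to any scikit-learn classifier.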

Let's try this on an example data set, in this case the wine classification data set built into scikit-learn.

We load the data set.

Then, we create and fit a LogisticRegression object.

We compute the training accuracy and see it's about 97%.
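These steps on the wine dataset might be sketched as follows (max_iter is raised here so the default solver converges cleanly, which the video may not do; the exact accuracy you see can therefore differ slightly from the video's 97%):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

# Load the wine classification dataset built into scikit-learn
X, y = load_wine(return_X_y=True)

# Create and fit a LogisticRegression object
lr = LogisticRegression(max_iter=10000)
lr.fit(X, y)

# Training accuracy (the video reports about 97%)
train_acc = lr.score(X, y)
```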

scikit-learn's LogisticRegression can also output confidence scores rather than "hard" or definite predictions.

Let's do this with the "predict_proba" function and test it out on the first training example.

Here the classifier is reporting over 99% confidence for the first class,
and very low probabilities for the other two. As a reminder, the little e means "10 to the power of", so you should interpret that first probability as 9-point-9 times 10 to the power of -1, or point-99, or 99%. We'll discuss these probabilities more in Chapter 3.
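A sketch of the predict_proba call on the first training example (continuing from a fitted LogisticRegression on the wine data):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)
lr = LogisticRegression(max_iter=10000).fit(X, y)

# Confidence scores for the first training example:
# one row per sample, one column per class
proba = lr.predict_proba(X[:1])
```

Each row of `proba` sums to 1 across the three wine classes, so the values can be read directly as class probabilities.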

In scikit-learn, the basic SVM classifier is called LinearSVC for linear support vector classifier. The LinearSVC object works exactly the same way as LogisticRegression.

Note that this data set has more than 2 classes. scikit-learn's Logistic Regression and SVM implementations handle this automatically. We'll talk about how this works in Chapter 3.
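A LinearSVC sketch on the same data (max_iter is raised here because LinearSVC can be slow to converge on unscaled features, an adjustment not shown in the video):

```python
from sklearn.datasets import load_wine
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)

# Same fit/score interface as LogisticRegression
svm = LinearSVC(max_iter=200000)
svm.fit(X, y)
train_acc = svm.score(X, y)
```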

We can repeat these steps again for the "SVC" class,
which fits a nonlinear SVM by default. As you can see, the classifier achieves 100% training accuracy. This could be the classifier overfitting, which is a risk we take when using more complex models like nonlinear SVMs. Later in this chapter, we'll discuss what it means for a classifier to be linear or not.
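The SVC version might look like this (SVC uses an RBF kernel by default, which is what makes it nonlinear; your exact training accuracy depends on the data split and scikit-learn version, so it may not match the video's 100%):

```python
from sklearn.datasets import load_wine
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Default SVC: RBF kernel, i.e. a nonlinear SVM
svc = SVC()
svc.fit(X, y)
train_acc = svc.score(X, y)  # a perfect training score can signal overfitting
```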

By the way, so far we've used the default hyperparameters for LogisticRegression, LinearSVC, and SVC. To remind you, a hyperparameter is a choice about the model you make before fitting to the data, and often controls the complexity of the model.

If the model is too simple, it may be unable to capture the patterns in the data, leading to low training accuracy; this is called underfitting. On the other hand, if the model is too complex it may learn the peculiarities of your particular training set, leading to lower test accuracy; this is called overfitting. This is a fundamental tradeoff in machine learning.
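One way to see this tradeoff in code is to compare training and test accuracy on a held-out split (the split and random_state here are illustrative choices, not from the video):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svc = SVC().fit(X_train, y_train)
train_acc = svc.score(X_train, y_train)
test_acc = svc.score(X_test, y_test)
# A large gap between train_acc and test_acc suggests overfitting;
# low train_acc suggests underfitting.
```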

In Chapters 3 and 4 we'll delve into these classifiers in more detail so that, by the end of the course, you'll understand what many of the hyperparameters represent, how they affect this fundamental tradeoff, and how to go about setting them.

Now it's your turn to apply these classifiers.

#DataCamp #PythonTutorial #LinearClassifiersinPython