Python Tutorial: Exploratory data analysis

---

Let's now jump into our first dataset. It contains data on iris flowers, and the features consist of four measurements: petal length, petal width, sepal length, and sepal width. The target variable encodes the species of flower, and there are three possibilities: 'versicolor', 'virginica', and 'setosa'.

As this is one of the datasets included in scikit-learn, we'll import it from there with from sklearn import datasets. In the exercises, you'll get practice at importing files from your local file system for supervised learning. We'll also import pandas, numpy, and pyplot under their standard aliases. In addition, we'll set the plotting style to ggplot using plt.style.use, firstly because it looks great and secondly to help all you R aficionados feel at home.
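The imports described above might look like the following. This is a minimal sketch: the aliases pd, np, and plt are assumed here as the "standard aliases" the transcript mentions, and 'ggplot' is the name of the matplotlib style sheet.

```python
# Import the datasets module from scikit-learn
from sklearn import datasets

# Standard aliases (assumed to be the ones meant in the video)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Switch matplotlib to the ggplot style sheet
plt.style.use('ggplot')
```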

We then load the dataset with datasets.load_iris() and assign the data to a variable iris. Checking out the type of iris, we see that it's a Bunch, which is similar to a dictionary in that it contains key-value pairs. Printing the keys, we see that they include the feature names; DESCR, which provides a description of the dataset; the target names; the data, which contains the feature values; and the target, which is the target data. As you see here, both the feature and target data are provided as NumPy arrays. The .shape attribute of the feature array tells us that there are 150 rows and four columns. Remember: samples are in rows, features are in columns. Thus we have 150 samples and the four features: petal length and width and sepal length and width. Moreover, note that the target variable is encoded as 0 for "setosa", 1 for "versicolor", and 2 for "virginica". We can see this by printing iris.target_names, in which "setosa" corresponds to index 0, "versicolor" to index 1, and "virginica" to index 2.
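The inspection steps just described can be sketched as follows. The exact set of keys and the printed type name vary between scikit-learn versions, so no output is shown for those lines.

```python
from sklearn import datasets

# Load the iris dataset; the result is a dictionary-like Bunch object
iris = datasets.load_iris()

print(type(iris))         # a Bunch (exact module path depends on the version)
print(iris.keys())        # includes data, target, target_names, DESCR, feature_names

# Feature and target data are both NumPy arrays
print(type(iris.data), type(iris.target))

# Samples are in rows, features are in columns
print(iris.data.shape)    # (150, 4)

# The position in target_names is the integer used to encode that species
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
```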

In order to perform some initial exploratory data analysis, or EDA for short, we'll assign the feature and target data to X and y, respectively. We'll then build a DataFrame of the feature data using pd.DataFrame, also passing the column names. Viewing the head of the DataFrame shows us the first five rows.
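A short sketch of this step, assuming the column names are taken from iris.feature_names (the transcript only says that column names are passed):

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Feature array and target array
X = iris.data
y = iris.target

# Build a DataFrame of the features; column names assumed to come from feature_names
df = pd.DataFrame(X, columns=iris.feature_names)

# Show the first five rows
print(df.head())
```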

Now, we'll do a bit of visual EDA. We use the pandas function scatter_matrix (available as pd.plotting.scatter_matrix in current pandas versions) to visualize our dataset. We pass it our DataFrame, along with our target variable as the argument to the parameter c, which stands for color, ensuring that the data points in our figure will be colored by their species. We also pass a list to figsize, which specifies the size of our figure, as well as a marker size and shape.
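One way this call might look is sketched below. The specific values for figsize, s (marker size), and marker (marker shape) are illustrative assumptions, since the transcript does not state them.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets

plt.style.use('ggplot')

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# c colors each point by its species code; figsize, s and marker
# are assumed values chosen for illustration
axes = pd.plotting.scatter_matrix(
    df,
    c=iris.target,
    figsize=[8, 8],
    s=150,
    marker='D',
)
```

In an interactive session, calling plt.show() afterwards displays the figure. The function returns a 4 x 4 array of axes, one per pair of features.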

The result is a matrix of figures. The figures on the diagonal are histograms of the feature corresponding to that row and column. The off-diagonal figures are scatter plots of the column feature versus the row feature, colored by the target variable. There is a great deal of information in this scatter matrix. See here, for example, that petal width and length are highly correlated, as you may expect, and that flowers are clustered according to species. Now it's your turn to dive into a few exercises and to do some EDA. Then we'll be back to do some machine learning. Enjoy!
