Python Tutorial: Plotting with pandas


---
Data visualization is a primary tool in a working data scientist's toolbox; let's see how to do it with pandas.

For convenience, we import pandas as pd and matplotlib dot pyplot as plt.

We load the AAPL stock data into a DataFrame using read_csv.

Notice the options parse_dates=True and index_col='date', which give the DataFrame a datetime64 index.
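Here is a minimal sketch of that loading step. The real AAPL file is not reproduced in this transcript, so the three CSV rows below are made-up placeholder values read from an in-memory string, and the column names are an assumption.

```python
import io

import pandas as pd

# Made-up placeholder rows (NOT real AAPL prices); column names are assumed.
csv_text = """date,open,high,low,close,volume
2000-01-03,10.0,12.0,9.0,11.0,20000000
2000-01-04,11.0,13.0,10.0,12.0,25000000
2000-01-05,12.0,14.0,11.0,13.0,30000000
"""

# parse_dates=True asks pandas to parse the index column as dates;
# index_col='date' makes that column the index.
aapl = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="date")

print(type(aapl.index).__name__)
print(aapl.head())
```

With a real file on disk, the first argument would simply be the file path instead of the StringIO object.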

Again, we'll use these a lot with time series shortly.

Also observe that entries in the volume column are significantly larger in magnitude than those in the other columns.

Now, we assign close_arr by indexing aapl 'close' (yielding a Series) and accessing the values attribute (yielding a NumPy array).

Remember, the command plot can plot NumPy arrays or lists and the command show must be executed to make the plot visible.
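As a rough sketch of this step, the snippet below builds a synthetic stand-in for the aapl DataFrame (smooth made-up curves, not real prices) and plots the close values as a plain NumPy array. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values, assumed column names).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame({"close": 50 + 10 * np.sin(t)}, index=dates)

close_arr = aapl["close"].values  # .values is an attribute, not a call
plt.figure()
lines = plt.plot(close_arr)       # x-axis is just 0, 1, 2, ...
# plt.show() would display the figure in an interactive session

print(type(close_arr).__name__)
```

Because a bare array carries no dates, matplotlib falls back to the positions of the elements for the horizontal axis.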

This is the resulting plot of stock close prices.

Notice the horizontal axis of the plot corresponds to the integer indices of the array, not to dates.

We can actually plot pandas Series directly.

We assign close_series from aapl as a Series and call plot with close_series as an argument.

The result is a similar plot but a bit nicer.

The plot function automatically uses the Series's datetime index labels along the horizontal axis.
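A small sketch of passing the Series straight to plt.plot, again on a synthetic stand-in for aapl (made-up values, assumed column name):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame({"close": 50 + 10 * np.sin(t)}, index=dates)

close_series = aapl["close"]      # a pandas Series with a datetime index
plt.figure()
lines = plt.plot(close_series)    # matplotlib picks up the index as x values
# plt.show() would display the figure in an interactive session
```

The only change from the array version is the argument, yet the horizontal axis now shows dates.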

An even nicer alternative is to use the pandas Series plot method; that is, apply close_series dot plot.

The result is as before but with even more formatting on the axis labels and the name of the axis (date) inferred from the Index name.
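The same figure via the Series plot method might look like the sketch below. The data is a synthetic stand-in (made-up values), and printing the x-axis label is just one way to check what pandas inferred.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame({"close": 50 + 10 * np.sin(t)}, index=dates)

close_series = aapl["close"]
ax = close_series.plot()          # returns the matplotlib Axes it drew on
print(ax.get_xlabel())            # axis label as set by pandas
# plt.show() would display the figure in an interactive session
```

Holding on to the returned Axes object is handy later when we want to customize the figure.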

In fact, pandas DataFrames have a plot method just like pandas Series.

Calling aapl dot plot plots all of the columns of DataFrame aapl on the same axes.

Pandas plots each numerical column against the index and uses the column labels in the legend.
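To see the DataFrame version, here is a sketch with a synthetic stand-in for aapl. The values are made up and the column names are assumed, but the volume column is deliberately several orders of magnitude larger than the prices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values, assumed column names).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame(
    {
        "open": 50 + 10 * np.sin(t),
        "close": 51 + 10 * np.sin(t + 0.1),
        "volume": 2e7 + 5e6 * np.cos(t),
    },
    index=dates,
)

ax = aapl.plot()  # one line per numeric column, all on the same axes
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
# plt.show() would display the figure in an interactive session
```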

However, on this scale, we can't see all five line plots because one is so much larger than all the others.

We can produce a similar plot using plt dot plot from matplotlib (using the DataFrame as an argument).

This implicitly draws all the numeric columns of aapl against the Index.

The figure resembles the one plotted using the DataFrame method plot, but there is no legend and no label on the date axis.
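For comparison, a sketch of handing the whole DataFrame to plt.plot, using a synthetic stand-in for aapl (made-up values, assumed column names):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values, assumed column names).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame(
    {
        "open": 50 + 10 * np.sin(t),
        "close": 51 + 10 * np.sin(t + 0.1),
        "volume": 2e7 + 5e6 * np.cos(t),
    },
    index=dates,
)

plt.figure()
lines = plt.plot(aapl)            # one Line2D per column, drawn against the index
ax = plt.gca()
print(len(lines), ax.get_legend())
# plt.show() would display the figure in an interactive session
```

If we wanted a legend here, we would have to add it ourselves, for example with plt.legend(aapl.columns).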

Again, the volume column dominates the other curves, which cannot be distinguished on this scale.

To remedy that problem, draw the plot again and call yscale('log').

This matplotlib function sets a logarithmic scale on the vertical axis.

The legend still appears automatically, but now we can distinguish volumes on the order of 10^7 from other price values on the order of 10^2.
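A sketch of the log-scale fix, assuming the plot is redrawn with the DataFrame plot method (which is what produces the automatic legend). The data is a synthetic stand-in with made-up, strictly positive values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up, strictly positive values).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame(
    {
        "open": 50 + 10 * np.sin(t),
        "close": 51 + 10 * np.sin(t + 0.1),
        "volume": 2e7 + 5e6 * np.cos(t),
    },
    index=dates,
)

ax = aapl.plot()
plt.yscale("log")                 # acts on the current axes, i.e. the one just drawn
print(ax.get_yscale())
# plt.show() would display the figure in an interactive session
```

On a log axis, each tick is a power of ten, so the prices and the volumes both stay readable.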

Any matplotlib options can be used to customize a Series or DataFrame figure.

For instance, we can extract the open and close Series and plot them separately specifying the colors, line styles, and the legend labels.

We zoom the axis in to the year 2001 with vertical scale from 0 to 100 and we explicitly place a legend.
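One way this customization could look is sketched below. The exact calls used in the video are not shown in this transcript, so the keyword choices (color, linestyle, marker, label), the set_xlim/set_ylim calls, and the legend location are assumptions, and the data is a synthetic stand-in with made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values, assumed column names).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame(
    {"open": 50 + 10 * np.sin(t), "close": 51 + 10 * np.sin(t + 0.1)},
    index=dates,
)

# Extra keyword arguments are passed through to matplotlib.
ax = aapl["open"].plot(color="b", linestyle="-", marker=".",
                       label="open price", legend=True)
aapl["close"].plot(ax=ax, color="r", linestyle="--",
                   label="close price", legend=True)

# Zoom in to 2001 with a vertical range of 0 to 100.
ax.set_xlim(pd.Timestamp("2001-01-01"), pd.Timestamp("2001-12-31"))
ax.set_ylim(0, 100)
ax.legend(loc="upper right")      # explicit legend placement
# plt.show() would display the figure in an interactive session
```

Passing ax=ax to the second call is what puts both Series on the same axes.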

To find out more about matplotlib customization, see our course on Data Visualization in Python.

Notice, again, the horizontal date ticks are labelled for us cleanly.

Finally, having drawn a figure, it's useful to be able to save it for future use.

To obtain the preceding plot, we slice four columns and the rows corresponding to 2001 through 2004 inclusive from the aapl DataFrame (we'll learn more about time series slicing later).

We generate a plot and apply savefig to preserve the plot.

savefig can infer the file format -- for instance, PNG, JPG, PDF, and others -- from the suffix of the filename.
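The slice-and-save steps could be sketched like this. The four column names, the file names, and the temporary output directory are all assumptions for illustration, and the data is a synthetic stand-in with made-up values.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; savefig works without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (made-up values, assumed column names).
dates = pd.date_range("2000-01-03", "2004-12-31", freq="B", name="date")
t = np.linspace(0, 20, len(dates))
aapl = pd.DataFrame(
    {
        "open": 50 + 10 * np.sin(t),
        "high": 53 + 10 * np.sin(t),
        "low": 47 + 10 * np.sin(t),
        "close": 51 + 10 * np.sin(t + 0.1),
        "volume": 2e7 + 5e6 * np.cos(t),
    },
    index=dates,
)

# Rows for 2001 through 2004 (both ends included) and four price columns.
subset = aapl.loc["2001":"2004", ["open", "close", "high", "low"]]
ax = subset.plot()

# The file suffix decides the output format.
out_dir = tempfile.mkdtemp()
saved = []
for name in ("aapl.png", "aapl.pdf"):
    path = os.path.join(out_dir, name)
    plt.savefig(path)
    saved.append(path)
print(saved)
```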

Now it's your turn to make some fancy plots using pandas in the exercises!
