Exploratory Data Analysis & Modeling with Python + R - (Part I EDA with Python)

Part I of a two-part tutorial illustrating how to use Python and R in the same Jupyter notebook within Google Colab. This first video walks through how to conduct exploratory data analysis with Python, while the next video shows how to model with R.

OBJECTIVE: Infer which explanatory variables significantly affect the size of trees within Duke Forest. Tree health can be used as a proxy for the overall health of the forest.

DATA DICTIONARY:
ID - Unique tree identifier.
yr - Year of the diameter recording.
cm - Measurement of the tree's diameter, taken at breast height at a point marked by a nail that holds a tag with the tree's identifying number. This is the response variable.
annualprec - Total precipitation within the year.
summerpdsi - Palmer Drought Severity Index for the summer. Uses readily available temperature and precipitation data to estimate relative dryness.
wintertemp - Average winter (Jan. - Mar.) temperature.
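
As a rough illustration of how these columns fit together, here is a minimal pandas sketch of the per-year tree count and the percentage change column covered in the video. The rows are made up for illustration and are not the Duke Forest measurements, and the names `trees`, `trees_per_year`, and `pct_change` are assumptions rather than the names used in the notebook.

```python
import pandas as pd

# Synthetic rows for illustration only; NOT the Duke Forest measurements.
trees = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "yr": [2000, 2001, 2002, 2000, 2001, 2001],
    "cm": [10.0, 11.0, 12.1, 20.0, 21.0, 5.0],
    "annualprec": [1100.0, 1200.0, 1000.0, 1100.0, 1200.0, 1200.0],
    "summerpdsi": [-1.0, 0.5, -2.0, -1.0, 0.5, 0.5],
    "wintertemp": [6.0, 7.0, 5.5, 6.0, 7.0, 7.0],
})

# Number of distinct trees measured in each year.
trees_per_year = trees.groupby("yr")["ID"].nunique()

# Year-over-year percentage change in diameter, computed within each tree.
# The first recorded year of every tree has no previous value, so it is NaN.
trees = trees.sort_values(["ID", "yr"])
trees["pct_change"] = trees.groupby("ID")["cm"].pct_change() * 100

print(trees_per_year)
print(trees[["ID", "yr", "cm", "pct_change"]])
```

Grouping by ID before calling pct_change keeps one tree's growth from being compared against a different tree's previous row.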

------Video Chapters------

0:00 - Intro
0:50 - Background on Data Set
2:19 - Loading Excel File onto Google Colab
2:47 - Reading Excel File into a pandas DataFrame
5:10 - Calculating Number of Trees Measured per Year
8:16 - Creating a Percentage Change Column
11:46 - Graphing a Correlation Heat Map with Seaborn
16:37 - Graphing a Pair Plot with Seaborn
19:05 - Graphing a Histogram with Matplotlib
24:59 - Labeling Trees by Size
27:30 - Visualizing clustering with Plotly
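
The notebook code itself is not included in this description, so the following is only a sketch of the numeric steps behind two of the chapters above: the correlation matrix that a heat map displays, and labeling trees by size. The data is randomly generated stand-in data, and the 10 cm and 30 cm cut points and the `size` column name are hypothetical choices, not the ones used in the video.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data; NOT the Duke Forest measurements.
rng = np.random.default_rng(0)
n = 200
trees = pd.DataFrame({
    "cm": rng.uniform(2.0, 60.0, n),
    "annualprec": rng.normal(1100.0, 150.0, n),
    "summerpdsi": rng.normal(0.0, 2.0, n),
    "wintertemp": rng.normal(6.0, 1.5, n),
})

# Pairwise Pearson correlations: the matrix a correlation heat map would show.
corr = trees.corr()

# Label each tree by diameter. The cut points below are hypothetical.
trees["size"] = pd.cut(
    trees["cm"],
    bins=[0, 10, 30, np.inf],
    labels=["small", "medium", "large"],
)
size_counts = trees["size"].value_counts()

print(corr.round(2))
print(size_counts)
```

With Seaborn installed, `seaborn.heatmap(corr, annot=True)` would draw the matrix as a heat map, and `seaborn.pairplot(trees, hue="size")` would color a pair plot by the size labels.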