R Tutorial: Introduction to Data Visualization with ggplot2
---
Hi and welcome to the first course in DataCamp's data visualization with ggplot2 series!
My name is Rick Scavetta and I'll be the instructor for this series.
I've been training scientists on how to better understand and visualize their data since 2012. I'm very excited to bring my experience to DataCamp.
So what is data viz?
Data visualization is an essential skill for data scientists. It combines statistics and design in meaningful and appropriate ways.
On the one hand, data viz is a form of graphical data analysis, emphasizing accurate representation and interpretation of data.
On the other hand, data viz relies on good design choices, not only to make our plots attractive, but also to aid both the understanding and communication of results.
On top of that, there is an element of creativity, since at its heart, data viz is a form of visual communication.
It's important to understand the distinction between exploratory and explanatory visualizations.
Exploratory visualizations are easily generated, data-heavy and intended for a small specialist audience, for example yourself and your colleagues. Their primary purpose is graphical data analysis.
Explanatory visualizations are labor-intensive, data-specific and intended for a broader audience, e.g. in publications or presentations. They are part of the communication process.
As a data scientist, it's essential that you can quickly explore data, but you'll also be tasked with explaining your results to stakeholders.
Good design begins with thinking about the audience - and sometimes that just means ourselves.
This data set contains the average brain and body weights of 62 land mammals. To understand the relationship here, the most obvious first step is to make a scatter plot, like this one.
Two mammals, the African and Asian elephants, both have very large brain and body weights, leading to a positive skew on both axes.
Here, applying a linear model is a poor choice since a few extreme values have a large influence.
A log transformation of both variables allows for a better fit.
So, although we began with a rough exploratory plot, it informed us about our data and led us to a meaningful result.
In the end, we'd probably want a cleaned-up explanatory plot.
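The mammal example can be sketched in R roughly as follows. This is a minimal sketch, not the code from the video: it assumes the data set is `mammals` from the MASS package (the transcript doesn't name it), and the object names `fit_raw`, `fit_log` and `p` are made up for illustration.

```r
library(ggplot2)

# Assumption: the 62-mammal data set is MASS::mammals
# (columns: body in kg, brain in g). MASS ships with standard R installs.
mammals <- MASS::mammals

# Linear model on the raw values: the two elephants pull the fit strongly
fit_raw <- lm(brain ~ body, data = mammals)

# Same model after a log10 transformation of both variables
fit_log <- lm(log10(brain) ~ log10(body), data = mammals)

# Rough exploratory scatter plot; the log10 scales are applied before
# geom_smooth() fits its line, so the line corresponds to fit_log
p <- ggplot(mammals, aes(x = body, y = brain)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

Dropping the two `scale_*_log10()` lines should give the raw-scale version of the plot, where the elephants sit far from every other point.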
Here's a classic example from Francis Anscombe, first published in 1973.
When we imagine a linear model, as presented on this anonymous plot, we imagine that we are describing data that looks something like this. But this same model could be describing a very different set of data, such as a parabolic relationship, which calls for a different model.
Or data in which an extreme value has a large effect, which becomes clear when the outlier is removed.
And sometimes the model may be describing a relationship where in fact there is none at all, because some extreme values may be incorrect.
If we relied solely on the numerical output without plotting our data, we'd have missed distinct and interesting underlying trends.
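To see this for yourself, here is a small sketch using the `anscombe` data frame that comes with base R (columns `x1` to `x4` and `y1` to `y4`). The helper names `fits`, `coefs` and `long` are illustrative only and don't come from the video.

```r
library(ggplot2)

# Fit the same simple linear model to each of the four x/y pairs
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# Collect intercept and slope per data set; they come out almost identical
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# Reshape to long form (one row per point) so each set gets its own panel
long <- data.frame(
  set = factor(rep(1:4, each = nrow(anscombe))),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# Plotting shows how different the four data sets really are
p <- ggplot(long, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)
print(p)
```

The printed coefficients look interchangeable across the four sets, while the faceted plot makes the differences obvious at a glance.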
We can see that data viz is rooted in statistics and graphical data analysis, but it's also a creative process that involves some amount of trial and error.
Alright, enough examples, let's get our fingers moving with some exercises.
#DataCamp #RTutorial #DataVisualizationwithggplot2