R Tutorial: Data Visualization Best Practices in R

---

Welcome to "Data Visualization Best Practices in R". Hopefully, by the end of this course, data visualization will have transitioned from just another step in your data science pipeline to an integral part of your data exploration and question-answering toolkit. Let's get started.

The main objective of this course is to help you draw on experience and general concepts to think deeply about the data you're working with and make the best visualization for the problem at hand.

We will do this by looking at a variety of different data types: point data, proportions, and distributions, and discussing the pros and cons of the standard visualization types you are probably familiar with. In addition, we will cover alternative visualizations that solve issues that often arise with the standard methods.

Over the four chapters of this course, we will first discuss proportion data and its common visualizations, plus ways to improve them.

We will then move on to point data in chapter two, single distributions in chapter three, and finally, comparing multiple distributions to each other in chapter four.

Before we get started, a warning is necessary.

Many topics we will discuss will not be cut and dried, as most of programming mercifully is.

The rules and principles we will go over are usually the best path, but there are lots of instances where you have to break one of the 'rules' that we cover for whatever reason.

By the end of the course, you will have all the tools you need to weigh the pros and cons of different techniques or chart types and choose the best for your problem.

The overall lessons in this course can be applied in any tool you may use to make visualizations, be it Excel, Tableau, or Microsoft Paint.

However, this course itself will be taught using R and specifically using the tidyverse set of data-manipulation packages with ggplot2 for plots. For this reason, I highly recommend completing the DataCamp courses on these packages before starting here. That way you can focus on the concepts instead of the nuances of implementation.

For the first two chapters, we will use a dataset obtained from the World Health Organization, or WHO, that details counts of seven different diseases (measles, mumps, diphtheria, pertussis, polio, rubella, and yellow fever) across different countries from 1980 to 2016.

It contains columns for the region in which the country resides, such as AMR for the Americas; the three-letter countryCode and the full country name; the disease; the year of the observation; and finally, the number of cases observed of the given disease.

These data are very rich and offer us a lot of opportunities to employ different visualization techniques. For instance, to get a feel for the dataset, we may want to make a simple scatter plot of the number of cases observed for a given region over time.

To do this, we will filter the data to the American region and then feed the filtered data to ggplot with the point geometry, with x mapped to the year, y mapped to the case counts, and color mapped to the disease.
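
As a rough sketch of that filter-then-plot step, something like the following should work. The real WHO dataset is not reproduced here, so the code builds a tiny stand-in data frame: the name `who_disease`, every column name other than `countryCode`, and all of the case counts are assumptions made purely for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the WHO dataset. Only countryCode is named in
# the description above; the other column names are assumed, and the case
# counts are invented for illustration only.
who_disease <- data.frame(
  region      = c("AMR", "AMR", "AMR", "AMR", "EUR", "EUR"),
  countryCode = c("USA", "USA", "BRA", "BRA", "FRA", "FRA"),
  country     = c("United States", "United States", "Brazil", "Brazil",
                  "France", "France"),
  disease     = c("measles", "rubella", "measles", "rubella",
                  "measles", "rubella"),
  year        = c(1980, 1980, 2000, 2000, 1980, 2000),
  cases       = c(120, 30, 15, 80, 200, 10)
)

# Keep only the rows for the American region
amr_region <- who_disease %>%
  filter(region == "AMR")

# Scatter plot: year on x, case counts on y, color by disease
amr_plot <- ggplot(amr_region, aes(x = year, y = cases, color = disease)) +
  geom_point()

amr_plot
```

With the actual data, only the data frame construction would change; the `filter()` call and the `aes()` mappings would stay the same as long as the column names match.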

From this simple plot we can see that there was an interesting spike of rubella in the early 2000s and measles used to be a bigger deal than it is currently. We will investigate these patterns in better visualizations and greater depth in the chapters to come.

Enough of me talking at you, let's get our feet wet by getting to know this dataset a little bit more with some exploratory plots.
