Introduction to R: Descriptive Statistics

Summarizing data with basic descriptive statistics is an important part of both data exploration and reporting. In this lesson we cover how to generate statistics that measure the center and spread of variables including the mean, median, mode, variance and standard deviation.
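As a rough illustration of the statistics named above, here is a minimal sketch. It is not taken from the lesson notebook: it uses Python's standard `statistics` module rather than R, and the sample values are made up. In R, the built-in `mean()`, `median()`, `var()` and `sd()` play the same roles.

```python
import statistics

# Hypothetical sample; the values are invented purely for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of center.
center = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
}

# Measures of spread (sample versions, dividing by n - 1).
spread = {
    "variance": statistics.variance(values),
    "stdev": statistics.stdev(values),
}

print(center)
print(spread)
```

The mean and median differ here (5 versus 4.5) because the larger values 7 and 9 pull the mean upward, which is the kind of contrast these summaries are meant to expose.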

** Note: I made a mistake in this video. The discussion of kurtosis in the video is not accurate. Kurtosis is actually a measure of how much data is in the tails of a distribution vs. the center, which often overlaps with the idea of "peakedness", but it is not the same thing! YouTube doesn't allow annotations anymore, so I cannot post a correction within the video itself; see Peter Westfall's comment below for more info! **

This is lesson 21 of a 30-part introduction to the R programming language for data analysis and predictive modeling. Link to the code notebook below:

Intro to R: Descriptive Statistics

This guide does not assume any prior exposure to R, programming or data science. It is intended for beginners with an interest in data science and those who might know other programming languages and would like to learn R.

I will create the videos for this guide so that you can learn a lot just by watching on YouTube. To get the most out of the guide, though, it is recommended that you create a Kaggle account so you can fork and edit each lesson, follow along and run the code yourself.

Follow DataDaft on social media for news and updates:

Join the DataDaft Discord to discuss all things data science:

Introduction to R Playlist:
Comments

It is not true that kurtosis tells you about the pointiness or flatness of a distribution. Yes, the uniform distribution is flat and has low kurtosis, but to generalize from that example is like saying "well, I know all bears are mammals, so it must be true that all mammals are bears." In particular, the beta(.5, 1) distribution is infinitely pointy but has negative excess kurtosis.

And yes, the Laplace distribution is pointy and has high kurtosis, but to generalize from that example to say that high kurtosis distributions are pointy is similarly wrong (the bear/mammal analogy again). In particular, the .9999U(0, 1) + .0001Cauchy mixture appears perfectly flat over 99.99% of the observable data, but has infinite kurtosis.

The correct interpretation of kurtosis is that it is a measure of tail heaviness (or outlier potential). It has nothing to do with the shape of the peak.
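A small numerical sketch can make the tail-heaviness point concrete. It is an illustration only, written in Python rather than R. The helper `excess_kurtosis` and both samples are hypothetical, and the formula assumed is the population-moment version, m4 / m2^2 - 3.

```python
def excess_kurtosis(xs):
    """Excess kurtosis from population moments: m4 / m2**2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

# A perfectly flat sample: 101 evenly spaced points on [0, 1].
flat = [i / 100 for i in range(101)]

# The same flat sample plus one far-away outlier (a made-up value).
flat_with_outlier = flat + [50.0]

k_flat = excess_kurtosis(flat)
k_outlier = excess_kurtosis(flat_with_outlier)

print(k_flat)      # negative: light tails
print(k_outlier)   # large and positive: one extreme point dominates
```

The bulk of the second sample is exactly as flat as the first, yet its excess kurtosis jumps from negative to strongly positive, driven entirely by the single point in the tail.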

Mathematical logic is given in the following references:

peterwestfall

The "skewness" and "kurtosis" functions are not found in my version of R. Did that change? Do I need a different library? I have R 2021.09.2 Build 382. sd, mad, var, etc. work fine.

wrdrennan