Statistics using R programming - Descriptive Statistics #statistics

Statistics using R programming. 1. Descriptive statistics with R

Once your data is properly loaded into RStudio and you’ve begun to explore it visually, the next step is typically to describe the distribution of each variable numerically. In this tutorial we’ll look at descriptive statistics, which focuses on the measures of central tendency, variability, and distribution shape for continuous variables.

R offers many functions for computing descriptive statistics, both in the base installation and in user-contributed packages. In the base installation, you can use the summary() function to obtain descriptive statistics.
myvars <- c("mpg", "hp", "wt")  # columns to summarize (an illustrative choice)
summary(mtcars[myvars])
The summary() function provides the minimum, maximum, quartiles (including the median), and mean for numerical variables, and frequencies for factors and logical vectors.
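As a rough illustration of how summary() treats different column types, the sketch below uses a small made-up data frame; the column names and values are hypothetical and only there to show the three cases.

```r
# Small made-up data frame with one numeric, one factor, and one logical column
df <- data.frame(
  score  = c(10, 20, 30, 40),
  group  = factor(c("a", "b", "a", "a")),
  passed = c(FALSE, TRUE, TRUE, TRUE)
)

num_summary <- summary(df$score)   # minimum, quartiles, median, mean, maximum
fac_summary <- summary(df$group)   # counts per factor level
log_summary <- summary(df$passed)  # counts of FALSE and TRUE values

print(num_summary)
print(fac_summary)
print(log_summary)
```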
You can also use the apply() and sapply() functions to compute any descriptive statistics you choose. The apply() function is used with matrices, and the sapply() function is used with data frames. The format for the sapply() function is
sapply(x, FUN, options)
where x is the data frame, FUN is an arbitrary function, and options, if present, are passed on to FUN.
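A minimal sketch of this pattern, assuming the built-in mtcars data and a hypothetical helper called mystats() (neither the helper nor its drop_na argument comes from the text above):

```r
# Hypothetical helper: number of values, mean, and standard deviation of a vector
mystats <- function(x, drop_na = TRUE) {
  if (drop_na) x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x))
}

vars <- c("mpg", "hp", "wt")  # illustrative choice of columns

# sapply() calls mystats() on each column of the data frame;
# the extra argument drop_na = TRUE is passed through to mystats()
col_stats <- sapply(mtcars[vars], mystats, drop_na = TRUE)
print(col_stats)

# apply() does the same kind of job for a matrix: MARGIN = 2 means "per column"
col_means <- apply(as.matrix(mtcars[vars]), 2, mean)
print(col_means)
```

Here sapply() returns a matrix with one column per variable and one row per statistic, because every call to mystats() returns a named vector of the same length.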

The psych package also has a function called describe() that provides the number of nonmissing observations, mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skew, kurtosis, and standard error of the mean.
library(psych)  # describe() comes from the psych package
describe(mtcars[myvars])
Descriptive statistics by group
When comparing groups of individuals or observations, the focus is usually on the descriptive statistics of each group rather than the total sample. Group statistics can be generated using base R’s by() function, whose format is given after the dplyr overview, or with the dplyr package.

Summarizing data with dplyr
The dplyr package provides tools to summarize data quickly and flexibly. The summarize() and summarize_all() functions can be used to calculate any statistic, and the group_by() function can be used to specify the groups on which to calculate those statistics. In current dplyr releases summarize_all() still works but is superseded by across() used inside summarize().
Results of grouped summaries are returned as tibbles (a kind of data frame).
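A short sketch of grouped summaries with dplyr, assuming the package is installed and using the built-in mtcars data; the grouping variable am and the output column names are illustrative choices, not taken from the text above.

```r
library(dplyr)

# Named statistics for one variable, computed within each transmission type (am)
by_am <- mtcars %>%
  group_by(am) %>%
  summarize(n = n(), mean_mpg = mean(mpg), sd_mpg = sd(mpg))
print(by_am)

# The same statistic for every remaining column, per group
all_means <- mtcars %>%
  select(am, mpg, hp, wt) %>%
  group_by(am) %>%
  summarize_all(mean)
print(all_means)
```

Both results are tibbles with one row per group and the grouping variable as the first column.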

The format for base R’s by() function is
by(data, INDICES, FUN)
where data is a data frame or matrix, INDICES is a factor or list of factors that defines the groups, and FUN is an arbitrary function that operates on all the columns of a data frame.
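One possible use of by(), sketched with the built-in mtcars data; the choice of columns and of am as the grouping factor is an assumption made for illustration.

```r
vars <- c("mpg", "hp", "wt")  # illustrative choice of columns

# For each level of am, by() passes the matching rows of mtcars[vars]
# to colMeans(), which returns the mean of every column
group_means <- by(mtcars[vars], mtcars$am, colMeans)
print(group_means)

# Results are stored per group, in the order of the factor levels (0, then 1)
means_am0 <- group_means[[1]]
print(means_am0)
```

Any function that accepts a data frame can take the place of colMeans() here, for example a wrapper that calls sapply() with a custom statistic.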

Summary statistics can also be generated for groups defined by more than one variable.
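As a sketch of grouping by two variables at once, the example below passes a list of factors to by() and, as an alternative that returns a flat data frame, uses base R's aggregate(). The variables am and vs from mtcars are an illustrative choice.

```r
vars <- c("mpg", "hp", "wt")  # illustrative choice of columns

# A list of two factors defines one group per combination of am and vs
two_way <- by(mtcars[vars], list(am = mtcars$am, vs = mtcars$vs), colMeans)
print(two_way)

# aggregate() gives the same group means as an ordinary data frame,
# one row per combination of am and vs
flat <- aggregate(cbind(mpg, hp, wt) ~ am + vs, data = mtcars, FUN = mean)
print(flat)
```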

#statistics
#rprogramming
#descriptive
#rstudio
#rdatacode