How To Make a Histogram in R

Histograms are single-variable plots that give you a sense of the distribution of a numeric variable. Histograms are easy to make in both base R and ggplot2.

Code used in this clip:

# Histogram in base R
library(tidyverse)  # loads ggplot2, which provides the diamonds data set

data <- diamonds

hist(diamonds$price)

# Change the number of bins with the "breaks" argument:
hist(diamonds$price, breaks = 100)

# Histogram in ggplot2

data %>% ggplot(aes(x = price)) +
  geom_histogram(bins = 100, color = "black", fill = "gray90")
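When `breaks` is a single number, base R treats it only as a suggestion: `hist()` passes it to `pretty()`, so the actual number of bins can differ from the value you asked for. The minimal sketch below (assuming the `diamonds` data from ggplot2, as in the code above) computes the bins without drawing them via `plot = FALSE`, then forces exact 500-wide bins by passing a vector of bin edges. The edge values are illustrative choices, not taken from the video.

```r
library(ggplot2)  # provides the diamonds data set

# Compute the histogram without drawing it; h$counts holds the bin counts
h <- hist(diamonds$price, breaks = 100, plot = FALSE)
length(h$counts)  # actual number of bins, which may differ from 100

# Force exact bin edges by passing a vector of breakpoints
# (illustrative: 500-wide bins from 0 to 20000, covering every price)
edges <- seq(0, 20000, by = 500)
h_exact <- hist(diamonds$price, breaks = edges, plot = FALSE)

# ggplot2 version with the same 500-wide bins starting at 0
ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500, boundary = 0, color = "black", fill = "gray90")
```

In ggplot2, `binwidth` plus `boundary` pins the bins down exactly, whereas `bins` only sets how many there are.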

Code Clips are basic code explanations in 3 minutes or less. They are intended to be short reference guides that provide quick breakdowns and copy/paste access to code needed to accomplish common data science tasks. Think Stack Overflow with a video explanation.

* Note: YouTube does not allow greater-than or less-than symbols in the text description, so the code above may not be exactly the same as the code shown in the video! For R, that means I may use = for assignment and special large Unicode < and > symbols in place of the standard-sized ones for dplyr pipes and comparisons. These special symbols should work as expected in R on Windows, but may need to be replaced with standard greater-than and less-than symbols on other operating systems.
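If pasted code contains look-alike angle brackets, you can swap them back to ASCII before running it. This sketch assumes the look-alikes are the fullwidth forms U+FF1C and U+FF1E. The note above does not say which Unicode characters are used, so check your pasted text first. `fix_brackets` is a hypothetical helper name.

```r
# Hypothetical helper: replace the (assumed) fullwidth look-alikes
# U+FF1C and U+FF1E with standard ASCII < and >
fix_brackets <- function(code) {
  code <- gsub("\uFF1C", "<", code, fixed = TRUE)
  gsub("\uFF1E", ">", code, fixed = TRUE)
}

# Example: a pasted line that uses the look-alike characters
pasted <- "x \uFF1C- 5; x \uFF1E 3"
fixed <- fix_brackets(pasted)
eval(parse(text = fixed))  # runs the repaired code
```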
Comments

Thank you so much. I kept trying a far more complicated method without success. I followed your clear instructions and voilà.

wolveriness

Solid video and visuals -- even I understood this one. Also liked the bigger images in the thumbnail.

JapaneseQuest

You are great, man! Excellent teaching, keep going.

dennylsonmachado

Great video! Can you tell me what software you use to make this clip? I really enjoy learning this way; it is very clear to follow. Thanks!!

ProfNwin

I'm getting a blank box when I make a histogram...?

ImranKhan

Hello, thanks for your nice videos. I have the following R script, which works for only one .tsv file. I want to tweak it so that it can plot (histogram + line) two similar but separate .tsv files in different colours, overlaid on each other. Could you please guide me?

# read in data
df = read.csv("your_distribution.tsv", sep="\t")

# filter Ks distribution (0.001 < Ks < 5)
lower_bound = 0.001
upper_bound = 5
df = df[df$Ks < upper_bound, ]
df = df[df$Ks > lower_bound, ]

# perform node-averaging (redo when applying other filters)
dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

# reflect the data around the lower Ks bound (x -> 2*lower_bound - x) to account for boundary effects
ks = c(dff$x, 2 * lower_bound - dff$x)

# plot a histogram and KDE on top
hist(ks, freq = FALSE, breaks = 50, xlim = c(0, upper_bound))
lines(density(ks))  # lines() has no xlim; the x-range is set by hist() above

ardykharabian
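One possible way to overlay two such distributions in base R is sketched below. This is not an answer from the video's author. The idea: compute both histograms on shared bin edges, draw the first, add the second with `add = TRUE` in a semi-transparent colour, then draw one density line per data set. The input vectors are simulated placeholders. In practice they would be the `ks` vectors the script above computes from each .tsv file. `plot_two_ks` is a hypothetical helper name, and the colours and labels are arbitrary.

```r
# Hypothetical helper: overlay histograms + KDE lines for two numeric vectors
plot_two_ks <- function(ks1, ks2, upper_bound = 5, n_breaks = 50,
                        cols = c("steelblue", "firebrick"),
                        labels = c("file 1", "file 2")) {
  # Shared bin edges covering both data sets so the histograms line up
  edges <- pretty(range(c(ks1, ks2)), n = n_breaks)
  h1 <- hist(ks1, breaks = edges, plot = FALSE)
  h2 <- hist(ks2, breaks = edges, plot = FALSE)
  d1 <- density(ks1)
  d2 <- density(ks2)

  # Make the y-axis tall enough for both histograms and both curves
  ymax <- max(h1$density, h2$density, d1$y, d2$y)
  fills <- adjustcolor(cols, alpha.f = 0.4)  # semi-transparent fills

  plot(h1, freq = FALSE, xlim = c(0, upper_bound), ylim = c(0, ymax),
       col = fills[1], border = NA, main = "Ks distributions", xlab = "Ks")
  plot(h2, freq = FALSE, add = TRUE, col = fills[2], border = NA)
  lines(d1, col = cols[1], lwd = 2)
  lines(d2, col = cols[2], lwd = 2)
  legend("topright", legend = labels, fill = fills, border = NA)

  invisible(list(h1 = h1, h2 = h2))
}

# Usage with simulated placeholder data standing in for the two .tsv files
set.seed(1)
ks_a <- rexp(500, rate = 1)
ks_b <- rexp(500, rate = 0.5)
res <- plot_two_ks(ks_a, ks_b, labels = c("file A", "file B"))
```

Both histograms are computed before anything is drawn. This lets the shared `ylim` fit the taller of the two, so neither histogram nor either density curve is clipped.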