Using the vegan R package to generate ecological distances (CC188)

The vegan R package has a powerful set of functions for calculating the ecological distance between communities. In this episode, Pat shares how to get your data in the right format to use vegdist and avgdist prior to analyzing the distances using NMDS. He discusses using rarefaction with avgdist to control for uneven sampling effort, to which the Bray-Curtis dissimilarity index is sensitive. We'll use the metaMDS function from vegan and tools from the ggplot2 and tidyverse packages.
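
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the episode's actual code: the `shared` data frame, its `Group` column of sample names, and the simulated counts are all assumptions standing in for the real data.

```r
library(vegan)

set.seed(1)
# Hypothetical stand-in data: 12 samples x 20 taxa of simulated counts,
# plus a `Group` column holding the sample names.
counts <- matrix(rpois(12 * 20, lambda = 30), nrow = 12,
                 dimnames = list(NULL, paste0("Otu", 1:20)))
shared <- data.frame(Group = paste0("S", 1:12), counts)

# Move the sample names into row names and drop the metadata column
shared_mat <- as.matrix(shared[, -1])
rownames(shared_mat) <- shared$Group

# Bray-Curtis dissimilarity on the raw counts
bray_raw <- vegdist(shared_mat, method = "bray")

# Bray-Curtis averaged over repeated subsampling to a common depth
depth <- min(rowSums(shared_mat))
bray_rare <- avgdist(shared_mat, sample = depth, dmethod = "bray",
                     iterations = 10)

# Ordinate the rarefied distances with NMDS
nmds <- metaMDS(bray_rare, trace = 0)
```

Here the depth is simply the smallest sample total, so no sample is dropped; with real data the choice of depth is a judgment call.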

#vegdist #avgdist #vegan #ggplot2 #R #Rstudio #Rstats

You can also find complete tutorials for learning R with the tidyverse using...

0:00 Calculating ecological distances with the vegan R package
2:08 Preparing matrix of sample by taxa counts
10:23 Calculating distances using vegdist
12:27 Using community matrix directly in metaMDS
13:18 Rarefying distance calculations using avgdist
Comments

Great channel! I've been trying for months to learn some of these techniques from scattered sources and you're really helping me make sense of the mess of lessons I've tried to wrap my head around.

overcup

Thank you so much for the great channel! 💙💙 Your videos are super helpful... they are simply awesome 😃

hebaahmed-tqqf

thanks for showing vegan <3 I love this channel
Learning every second day with you 100% guaranteed

I learned this trick yesterday:
df %>%
  mutate(day = str_replace(Group, ".*D", ""), .before = 2)

which puts the mutated column at a designated position, in the example above at position 2, just in front of the "old" column 2,
so you don't need these select(1, 2, everything()) lines anymore

svenr
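
A small, self-contained sketch of the `.before` argument to `mutate()` mentioned in the comment above. The sample names in the style "F3D0" and the `Otu1`/`Otu2` columns are assumptions made up for illustration, and `.before` requires a reasonably recent dplyr (1.0.0 or later).

```r
library(dplyr)
library(stringr)

# Hypothetical data: sample names assumed to end in "D" followed by the day
df <- data.frame(Group = c("F3D0", "F3D1", "F3D141"),
                 Otu1 = c(5, 0, 2),
                 Otu2 = c(1, 3, 0))

# Create `day` and place it in front of the current second column
with_day <- df %>%
  mutate(day = str_replace(Group, ".*D", ""), .before = 2)

names(with_day)
```

`.after` works the same way, and both also accept a column name instead of a position.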

Very informative!! Thank you!! I usually assign the object's name at the end of the dplyr pipeline, as in " %>% as.data.frame(.) -> new_object", but I know it is a little weird :)

igordemetriusalencar

I am so glad I found this video.. <3

unavaliableavaliable

For an alternative to the usual rarefaction method, take a look at the SRS function in the SRS package. 1. Beule L, Karlovsky P. Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities. PeerJ. 2020;8:e9593.

johnquensen

This is a great overview of using vegan for calculating distances and plotting them. Some nice additions (if you don't already have them planned) would be to show how to pull out which variables (or species) are driving the spread on the plot and how to add that data to the plot. You mentioned that the different clouds pertained to different days, so I'm assuming you're going to discuss that in another video.

samprice

Just a note: we handle data frames of abundance data just fine in vegan's community ecology functions, including `vegdist()`. The only restriction is that you have to get rid of meta data (the `Group` column in Pat's data) from the data frame just like Pat showed in the video. You just don't need to do the last step of converting to a matrix.

ftboth
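
The point in the comment above can be checked with a short sketch. It uses `dune`, an example data frame that ships with vegan, rather than the data from the video, and compares the distances computed from the data frame with those computed from the same data converted to a matrix.

```r
library(vegan)

data(dune)  # example data shipped with vegan: 20 sites x 30 species
class(dune) # a data frame, with no metadata columns

# vegdist() accepts the data frame directly ...
d_df <- vegdist(dune, method = "bray")

# ... and gives the same distances as the matrix version
d_mat <- vegdist(as.matrix(dune), method = "bray")
all.equal(as.vector(d_df), as.vector(d_mat))
```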

Hi Pat, thank you so much for your videos! They are always very complete and didactic.

I would like to ask a question, is it possible to calculate the Bray-Curtis similarity and then build a dendrogram using ggplot2? Could you make a video on how to build a BC similarity dendrogram?

viniciusestrella
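
One possible starting point for the dendrogram question above, as a hedged sketch rather than anything shown in the video: cluster the Bray-Curtis dissimilarities with base R's `hclust()` and plot the result. The `dune` example data from vegan and the average-linkage method are assumptions chosen for illustration.

```r
library(vegan)

data(dune)  # example data shipped with vegan

# Bray-Curtis dissimilarities between sites
bray <- vegdist(dune, method = "bray")

# Hierarchical clustering; "average" (UPGMA) is one common choice of linkage
hc <- hclust(bray, method = "average")

# Base R dendrogram; branch heights are Bray-Curtis dissimilarities
plot(as.dendrogram(hc), ylab = "Bray-Curtis dissimilarity")
```

For a ggplot2 version, packages such as ggdendro can convert the `hclust` object into data frames of segments and labels that ggplot2 can draw.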

Great explanation!! It would be awesome if you could slow down your speaking a bit, though...

samadhigunathunga

Hi Pat! This was super helpful. I've performed rarefaction on my data using rrarefy in vegan and looked at alpha diversity of particular samples, but I still want to calculate the distance between some samples. Should I run avgdist on my original data to calculate the distance between ALL samples, then run metaMDS on just the samples I'm interested in? Or should I run avgdist on just the samples I'm interested in? Also, is it improper to rarefy using rrarefy to look at alpha diversity and then rarefy again to look at beta diversity? Should I be using the same rarefied data for both analyses?! Sorry for all the questions! I'm new to microbiome analysis

bridget

Hi Pat, thank you for sharing! When analyzing for group differences in distances, do you always test for dispersion effects afterwards? Will there be a video about this in the future?

Rydaholic
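
For readers wondering what a dispersion check might look like, here is a minimal sketch using functions that exist in vegan (`adonis2()`, `betadisper()`, `permutest()`). It is not taken from the video; the `dune` and `dune.env` example data and the `Management` grouping variable are stand-ins for real samples and metadata.

```r
library(vegan)

data(dune)      # example community data shipped with vegan
data(dune.env)  # matching site metadata, including a `Management` factor

set.seed(1)
bray <- vegdist(dune, method = "bray")

# Test for differences between groups (PERMANOVA)
perm <- adonis2(bray ~ Management, data = dune.env, permutations = 999)

# Test whether the groups differ in their multivariate dispersion
disp <- betadisper(bray, dune.env$Management)
disp_test <- permutest(disp, permutations = 999)

perm
disp_test
```

A significant dispersion test suggests that a significant `adonis2()` result may reflect differences in spread rather than, or in addition to, differences in location.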

Very helpful demo.. just wanted to clarify something. Why did you take sample=1800 at 14:36??

vikashiremath
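
This page does not record why 1800 was used in the video. As a general, hedged illustration of how a subsampling depth for `avgdist()` might be chosen, one can inspect the per-sample totals and pick a depth that keeps the samples of interest. The sketch uses `BCI`, a count data set shipped with vegan, as a stand-in.

```r
library(vegan)

data(BCI)  # example count data shipped with vegan: 50 plots x 225 species

# Number of individuals counted in each sample, smallest first
totals <- sort(rowSums(BCI))
head(totals)

# Using the smallest total keeps every sample; a larger depth would drop
# the samples that fall below it
depth <- min(totals)

set.seed(1)
bray_rare <- avgdist(as.matrix(BCI), sample = depth, dmethod = "bray",
                     iterations = 10)
```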

Hey Pat great video and thanks for all your work on this channel. I am having an issue once I arrive at the `scores( nmds )` line. I get an error that states the following: "Error in x$species[, choices, drop = FALSE] :
incorrect number of dimensions". Have you or anybody else encountered this?

chrismaino

Hi Pat! Thanks so much for the videos, I've just recently discovered your channel and it's been incredibly helpful for my learning process.

I'm wondering if you could clarify the need to calculate a distance matrix before running NMDS? I have a species assemblage dataset from an underwater visual census (UVC). My data has a ton of zeroes and, just like yours, a lot of columns (species). I've run NMDS both without calculating the vegdist (+ automatic transformations) and with vegdist. They look similar but not the same, so I'm not sure which one to use for my publication. Why would you advise me against using the plot without prior calculation of the distance matrix?

Also, seems like my data has a high stress (>0.2) when run with k=2. If I run it with k=3, should I be presenting the figure in 3D?

Thanks in advance!

Rinaldigotama

You really need to put `+ coord_equal()` or `+ coord_fixed()` on your ordination diagrams created by hand. The Euclidean distance on the plot is some approximation to some other distance (in NMDS the rank order of the Euclidean distances on the plot is intended to be a close approximation of the rank order of the original distances between samples), and if you don't keep a fixed aspect ratio this visual distance interpretation is broken.

ftboth
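
A brief sketch of the advice above, using the `dune` example data from vegan as a stand-in for the data in the video. The only part that matters here is the final `coord_fixed()`, which keeps one unit on the x-axis the same length as one unit on the y-axis.

```r
library(vegan)
library(ggplot2)

data(dune)  # example data shipped with vegan

set.seed(1)
nmds <- metaMDS(vegdist(dune, method = "bray"), trace = 0)

# Site (sample) coordinates as a data frame with columns NMDS1 and NMDS2
site_scores <- as.data.frame(scores(nmds, display = "sites"))

p <- ggplot(site_scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point() +
  coord_fixed()  # fixed 1:1 aspect ratio so plotted distances are comparable

p
```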

Dear professor Pat, I was just wondering if I can use a presence/absence data set for avgdist(). Wouldn't that be inappropriate as rarefaction is based on abundance data, not presence/absence?

wenyizhou

Hi Pat, thanks for the nice video! When I use nmds <- metaMDS(shared, autotransform = FALSE) and then scores(nmds), the output has both $sites (which is the Group here) and $species (OTUs). I cannot directly pipe it to ggplot. I wonder how you deal with it? Thanks!

guani
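
One way to handle the situation described above, sketched with vegan's `dune` example data in place of `shared`: ask `scores()` for the site scores only with `display = "sites"`, then turn the result into a data frame that can be passed to ggplot. The `Group` column name is an assumption chosen to mirror the video's data.

```r
library(vegan)

data(dune)  # example data shipped with vegan, standing in for `shared`

set.seed(1)
nmds <- metaMDS(dune, autotransform = FALSE, trace = 0)

# Request only the site (sample) scores rather than sites and species
site_scores <- scores(nmds, display = "sites")

# Move the row names into a `Group` column so the result is plot-ready
site_df <- data.frame(Group = rownames(site_scores), site_scores,
                      row.names = NULL)
head(site_df)
```

Requesting only the site scores also avoids touching the species scores, which are not available when `metaMDS()` is given a distance matrix instead of a community table.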

How can I build a dendrogram with bray curtis dissimilarity in R?

dr.ozgekahramanilkkan