Using base R and testthat to calculate probabilities (CC271)

Watch and code along with Pat as he uses test-driven development with testthat and base R to count kmers and calculate probabilities for a naive Bayesian sequence classifier. Pat generates all possible kmers for a single sequence and then for every sequence in a collection. These kmer counts are then used to generate the word-specific priors and genus-specific conditional probabilities needed to train the naive Bayesian classifier for 16S rRNA gene sequences. Along the way, Pat keeps writing tests first with the testthat R package and leans on a number of tools from base R. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
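
As a rough illustration of the kmer step described above, here is a minimal base R sketch. The function name get_all_kmers, its kmer_size argument, and the toy sequences are hypothetical and are not taken from the episode or the package repository.

```r
# Hypothetical helper (not from the episode): return every overlapping
# kmer of length kmer_size found in a single sequence.
get_all_kmers <- function(sequence, kmer_size = 8) {
  n_kmers <- nchar(sequence) - kmer_size + 1
  if (n_kmers < 1) {
    return(character(0))  # sequence is shorter than one kmer
  }
  starts <- seq_len(n_kmers)
  # substring() is vectorized over the start and stop positions
  substring(sequence, starts, starts + kmer_size - 1)
}

# One sequence
kmers <- get_all_kmers("ATGCGCTA", kmer_size = 3)

# A collection of sequences: one character vector of kmers per sequence
sequences <- c("ATGCGCTA", "GGCATGCA")
kmers_by_seq <- lapply(sequences, get_all_kmers, kmer_size = 3)
```

In a test-driven workflow, the expected kmers for a short sequence like this one would be written down first (for example with testthat's expect_equal) and the function filled in until the test passes.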

Check out the GitHub repository at the:

#rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member!

You can also find complete tutorials for learning R with the tidyverse using...

0:00 Introduction
6:00 Generate all kmers for a sequence
14:01 Generate kmers across all sequences
21:27 Calculate word-specific priors
27:42 Calculate genus-specific conditional probabilities
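
The last two chapters can be sketched in base R as follows. This assumes the formulas from the published description of the RDP classifier (Wang et al. 2007), which the description above does not spell out: the word-specific prior is (n + 0.5) / (N + 1), where n is the number of sequences containing the word and N is the total number of sequences, and the genus-specific conditional probability is (m + prior) / (M + 1), where m is the number of sequences in the genus containing the word and M is the number of sequences in that genus. The toy data and all variable names are hypothetical.

```r
# Toy data (hypothetical): the unique kmers detected in each sequence,
# plus the genus each sequence belongs to.
kmers_by_seq <- list(
  c("ATG", "TGC"),
  c("ATG", "GCG"),
  c("TGC", "GCG"),
  c("ATG")
)
genera <- c("A", "A", "B", "B")

words <- sort(unique(unlist(kmers_by_seq)))

# presence[i, j] is TRUE when sequence i contains word j
presence <- t(vapply(kmers_by_seq,
                     function(x) words %in% x,
                     logical(length(words))))
colnames(presence) <- words

# Word-specific priors: (n + 0.5) / (N + 1)
n_seqs <- nrow(presence)
priors <- (colSums(presence) + 0.5) / (n_seqs + 1)

# Per-genus counts of sequences containing each word (genera x words)
genus_counts <- rowsum(presence * 1, group = genera)
genus_sizes <- as.vector(table(genera)[rownames(genus_counts)])

# Genus-specific conditional probabilities: (m + prior) / (M + 1)
# sweep() adds each word's prior to its column; the division recycles
# genus_sizes down the rows, so each row is divided by its own M + 1.
cond_prob <- sweep(genus_counts, 2, priors, "+") / (genus_sizes + 1)
```

Adding the prior in the numerator keeps every conditional probability above zero, so a word that never appears in a genus does not zero out the whole product during classification.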
Comments

Awesome introduction to R package development from a researcher's/R user's perspective rather than a software developer's! I have also developed an R package for my team, and I've made tons of mistakes similar to the ones you showed here. I'm sure this test-driven approach is super insightful for those who're interested in building R packages. Looking forward to the next episode!

ThroughEyes

Just wanted to say Thank You for these videos. I have thought about making an R package for some time, but it's always intimidated me too much. Maybe I'll give it a try.

rubenbehnke

I must admit, I feel stupid going back and forth to understand what you are cooking there.

All the engineering skills you show off here! Well done :D

svenr

You can parallelize your code if you need more performance out of your functions, or when processing large data sets.
I have done this a few times with the future + future.apply packages.

piezu.
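
A minimal sketch of what the suggestion above might look like, assuming the future and future.apply packages are installed. The count_kmers helper, the toy sequences, and the choice of two workers are hypothetical; the code falls back to base R's lapply() when future.apply is not available.

```r
sequences <- c("ATGCGCTA", "GGCATGCA", "TTACGGAT")

# Hypothetical helper: tabulate the overlapping kmers in one sequence
count_kmers <- function(sequence, kmer_size = 3) {
  starts <- seq_len(nchar(sequence) - kmer_size + 1)
  table(substring(sequence, starts, starts + kmer_size - 1))
}

if (requireNamespace("future.apply", quietly = TRUE)) {
  # Run the per-sequence work in two background R sessions
  future::plan(future::multisession, workers = 2)
  kmer_counts <- future.apply::future_lapply(sequences, count_kmers)
  future::plan(future::sequential)  # return to sequential processing
} else {
  # Same result, computed sequentially
  kmer_counts <- lapply(sequences, count_kmers)
}
```

For a handful of short sequences, the overhead of starting workers will likely outweigh any gain; this kind of parallelism tends to pay off only with large collections.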