Conversation with Alexis Stamatakis

Alexis and I talk about how his love of aviation got him into computing, walk through the RAxML source code, discuss the tension between generality and optimization when developing software, and explore the interface between software and engineering.

We focus on RAxML, but the team is working on new inference tools.

This is a long conversation. Here are time points in case you want to jump to a specific topic.
0:38 How aviation led to an interest in computer science
2:06 How spatial proximity led to an interest in phylogenetics
4:58 Walk-through of RAxML source code starts
6:24 We get into the parallel world - why the location of memory matters
8:37 Timing of floating point operations can depend on their value
12:25 Code for reading sequence alignments is more important and interesting than you thought
18:30 Error checking routines and verbose feedback reduce traffic on support forums
19:50 Awful increase in complexity for rarely used model - RNA secondary structure
20:35 Parallelization used to be harder to implement well
22:16 Allocating memory for conditional likelihood vectors (about 60% of memory footprint)
22:42 Main switch/case over all modes and options
24:40 Names of colleagues pop up in code for many collaborations
25:38 BIG_RAPID_MODE, a common search case
25:24 We jump into doInference(), which does phylogenetic inference
29:48 Getting the starting tree
33:12 computeBIGRAPID(), where the actual maximum likelihood search happens
34:31 Stopping (search convergence) criteria
36:01 The famous Thorough variable - how much to optimize branch lengths?
37:44 Tree proposal - Determining how local to be with subtree pruning and grafting
38:55 The general importance of reducing the number of preset analysis parameters
42:40 This one weird thing about phylogenetics...
44:00 Good to see people don't remember their own code
44:17 Main loop of tree search routine
48:47 Parallel strategy
54:51 General maximum likelihood profiling stats - 5% calculating likelihood at root, 20-20% branch length optimization, all the rest computing conditional likelihoods
55:43 Exact vs approximate methods
56:55 Numerical optimization is a difficult topic, can be 80% of development time
57:35 Calculating the likelihood
58:33 Tree representation
1:01:41 Optimizing with different functions depending on the descendants of each node
1:03:07 Missing data
1:05:04 Loop over sites
1:09:32 Tradeoffs between code complexity, generality, and optimization.
1:10:48 RAxML-NG and libpll - refactors that build on lessons learned from original code base
1:11:44 Code modularity
1:14:23 The hypervolume of tools
1:15:20 The importance of software engineering
1:16:31 softWipe - rating bioinformatics tools by code quality
1:21:26 The engineering-science interface
1:26:26 Where will the next gains in phylogenetic speed come from?
1:33:55 SARS-CoV-2 phylogenetics
1:37:14 Machine learning in phylogenetics
1:42:11 Wrap up