Conversation with Alexis Stamatakis

Alexis and I talk about how his love of aviation got him into computing, walk through the RAxML source code, discuss the tension between generality and optimization when developing software, and explore the interface between software and engineering.

We focus on RAxML, but the team is also working on new inference tools.

This is a long conversation. Here are time points in case you want to jump to a specific topic.
0:38 How aviation led to an interest in computer science
2:06 How spatial proximity led to an interest in phylogenetics
4:58 Walk-through of RAxML source code starts
6:24 We get into the parallel world - why the location of memory matters
8:37 Timing of floating point operations can depend on their value
12:25 Code for reading sequence alignments is more important and interesting than you thought
18:30 Error checking routines and verbose feedback reduce traffic on support forums
19:50 Awful increase in complexity for rarely used model - RNA secondary structure
20:35 Parallelization used to be harder to implement well
22:16 Allocating memory for conditional likelihood vectors (about 60% of memory footprint)
22:42 Main switch/case over all modes and options
24:40 Names of colleagues pop up in code for many collaborations
25:38 BIG_RAPID_MODE, a common search case
25:24 We jump into doInference(), which does phylogenetic inference
29:48 Getting the starting tree
33:12 computeBIGRAPID(), where the actual maximum likelihood search happens
34:31 Stopping (search convergence) criteria
36:01 The famous Thorough variable - how much to optimize branch lengths?
37:44 Tree proposal - Determining how local to be with subtree pruning and grafting
38:55 The general importance of reducing the number of preset analysis parameters
42:40 This one weird thing about phylogenetics...
44:00 Good to see people don't remember their own code
44:17 Main loop of tree search routine
48:47 Parallel strategy
54:51 General maximum likelihood profiling stats - 5% calculating likelihood at root, 20-20% branch length optimization, all the rest computing conditional likelihoods
55:43 Exact vs approximate methods
56:55 Numerical optimization is a difficult topic, can be 80% of development time
57:35 Calculating the likelihood
58:33 Tree representation
1:01:41 Optimizing with different functions depending on the descendants of each node
1:03:07 Missing data
1:05:04 Loop over sites
1:09:32 Tradeoffs between code complexity, generality, and optimization
1:10:48 RAxML NG and libpll - refactors that build on lessons learned from original code base
1:11:44 Code modularity
1:14:23 The hypervolume of tools
1:15:20 The importance of software engineering
1:16:31 softWipe - rating bioinformatics tools by code quality
1:21:26 The engineering-science interface
1:26:26 Where will the next gains in phylogenetic speed come from?
1:33:55 SARS-CoV-2 phylogenetics
1:37:14 Machine learning in phylogenetics
1:42:11 Wrap up
