MQSS 2019 | L13: Plenary Lecture II | Sean Hackett

The peptide-spectrum matching problem, in which experimental fragmentation spectra are matched to the theoretical spectra of candidate peptides, is a crucial step in computational proteomics and is essential for making proteomics results interpretable. Considerable innovation continues to advance the state of the art on this problem, yet in a typical experiment a substantial fraction of fragmentation spectra still cannot be matched to a peptide. These unidentified "dark peptides" account for around 50% of spectra from a yeast experiment but over 90% of spectra from plasma. Making headway will require us both to search an expanded set of possible modified peptides (e.g., phospho, non-tryptic) and to improve search algorithms so that they can handle a larger search space without being overwhelmed by false-positive identifications. During my talk I will discuss the connection between these two problems, focusing more extensively on recent improvements we have made in applying deep learning to the peptide-spectrum matching problem. Machine learning methods are very well suited to prediction tasks where ground truth labels are available (e.g., this is a picture of a cat and this is a picture of a squirrel). For bioinformatics problems where "ground truth" is inferred using conventional algorithms, however, considerable care must be taken to avoid overfitting. I will discuss some of the pitfalls we ran into when fitting very powerful algorithms to noisy labels, and how we were ultimately able to use conventional algorithms to develop a model that significantly outperforms them.
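For readers new to the topic, the basic matching step can be illustrated with a minimal sketch. It is not the method presented in the talk: it uses a plain shared-peak count as a stand-in score, considers only singly charged b- and y-ions of unmodified peptides, and all function names (`theoretical_fragments`, `shared_peak_count`, `best_match`) and the 0.02 m/z tolerance are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O, added to y-ions
PROTON = 1.00728   # mass of a proton, added for charge 1+


def theoretical_fragments(peptide):
    """Return sorted m/z values of singly charged b- and y-ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    fragments = []
    for i in range(1, len(masses)):
        fragments.append(sum(masses[:i]) + PROTON)          # b-ion (prefix)
        fragments.append(sum(masses[i:]) + WATER + PROTON)  # y-ion (suffix)
    return sorted(fragments)


def shared_peak_count(observed_mz, peptide, tol=0.02):
    """Count theoretical fragments with an observed peak within tol."""
    theory = theoretical_fragments(peptide)
    return sum(any(abs(t - o) <= tol for o in observed_mz) for t in theory)


def best_match(observed_mz, candidates, tol=0.02):
    """Pick the candidate peptide with the highest shared-peak count."""
    return max(candidates, key=lambda p: shared_peak_count(observed_mz, p, tol))


if __name__ == "__main__":
    # Simulate an observed spectrum: PEPTK fragments with a small mass error.
    observed = [mz + 0.005 for mz in theoretical_fragments("PEPTK")]
    print(best_match(observed, ["GASVK", "TPEPK", "PEPTK"]))
```

Each additional modification or non-tryptic cleavage rule enlarges the candidate list passed to `best_match`, which is why a larger search space makes chance matches, and therefore false positives, more likely.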