JOBIM2021 - Keynote Jean-Philippe Vert
Deep learning for biological sequences
In recent years, deep learning has revolutionized natural language processing (NLP), and it is increasingly used to analyze biological sequences, including DNA, RNA and proteins. While many deep learning architectures and techniques that succeed in NLP can be applied directly to biological sequences, biological sequences also have specific properties that should be taken into account when adapting NLP techniques to this context. In this talk I will discuss several of these properties, including the facts that 1) biological sequences have no natural segmentation into words, 2) a double-stranded DNA sequence can be represented by either of two mutually reverse-complementary sequences, and 3) a natural way to compare homologous biological sequences is to align them. In each case, I will show how these biological constraints can lead to specific models, and empirically illustrate the benefits of incorporating such prior knowledge on several tasks, such as metagenomic read binning, protein-DNA binding prediction and protein annotation.
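To make the first two properties concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speaker's method. Overlapping k-mers are used as one common way to cut DNA into "words", and "canonical" k-mers (the lexicographically smaller of a k-mer and its reverse complement) yield a feature profile that is identical for both strands. The function names (`reverse_complement`, `kmers`, `canonical`, `canonical_kmer_counts`) are hypothetical helpers introduced here, and the code assumes uppercase sequences over the alphabet ACGT only.

```python
from collections import Counter

# Watson-Crick base pairing: A<->T, C<->G (assumes uppercase ACGT only)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse: the sequence read off the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list:
    """Hypothetical 'word' tokenization: all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def canonical(kmer: str) -> str:
    """Map a k-mer and its reverse complement to the same representative."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmer_counts(seq: str, k: int) -> Counter:
    """k-mer profile that is the same whichever strand the sequence was read from."""
    return Counter(canonical(km) for km in kmers(seq, k))


if __name__ == "__main__":
    read = "AACGTG"
    rc = reverse_complement(read)
    print(rc)  # CACGTT
    # Both strands produce the same strand-invariant features.
    print(canonical_kmer_counts(read, 3) == canonical_kmer_counts(rc, 3))  # True
```

Canonical k-mer counting is only one simple way to build reverse-complement invariance into input features; the talk abstract does not specify which models are used, and invariance can also be built into the network architecture itself.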