SAFARI Live Seminar: Accelerating Genome Sequence Analysis via Efficient HW/Algorithm Co-Design

Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design
Damla Senol Cali, Bionano Genomics

SAFARI Live Seminar Series Talk #11

Abstract:
Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, the understanding of evolution, and forensics. Modern genome sequencing machines can rapidly generate massive amounts of genomics data at low cost. However, the analysis of genome sequencing data is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. Moreover, as sequencing technologies advance, the rate at which sequencing devices generate genomics data is growing far faster than computational power, placing even greater pressure on these bottlenecks.

In this seminar, we provide an overview of our four works, where we characterize the real-system behavior of the genome sequence analysis pipeline and its associated tools, expose the bottlenecks and tradeoffs of the pipeline and tools, and co-design fast and efficient algorithms along with scalable and energy-efficient customized hardware accelerators for the key pipeline bottlenecks to enable faster genome sequence analysis.

First, we comprehensively analyze the tools in the genome assembly pipeline for long reads in multiple dimensions (i.e., accuracy, performance, memory usage, and scalability), uncovering the bottlenecks and tradeoffs that arise from different combinations of tools and different underlying systems. We show that we need high-performance, memory-efficient, low-power, and scalable designs for genome sequence analysis in order to exploit the advantages that genome sequencing provides.

Second, we propose GenASM, an acceleration framework that builds upon bitvector-based approximate string matching (ASM) to accelerate multiple steps of the genome sequence analysis pipeline. We co-design our highly parallel, scalable, and memory-efficient algorithms with low-power and area-efficient hardware accelerators. We evaluate GenASM for three different use cases of ASM in genome sequence analysis and show that GenASM is significantly faster and more power- and area-efficient than state-of-the-art software and hardware tools for each of these use cases.

Third, we implement an FPGA-based prototype for GenASM, where state-of-the-art 3D-stacked memory (HBM2) offers high memory bandwidth and FPGA resources offer high parallelism by instantiating multiple copies of the GenASM accelerators.

Fourth, we propose SeGraM, the first hardware acceleration framework for sequence-to-graph mapping and alignment. Instead of representing the reference genome as a single linear DNA sequence, genome graphs provide a better representation of the diversity among populations by encoding variations across individuals in a graph data structure, avoiding a bias towards any one reference. SeGraM enables the efficient mapping of a sequenced genome to a graph-based reference, providing more comprehensive and accurate genome sequence analysis. For SeGraM, we co-design algorithms and accelerators for memory-efficient minimizer-based seeding and bitvector-based, highly parallel sequence-to-graph alignment. Compared to state-of-the-art software tools for sequence-to-graph mapping and alignment, we show that SeGraM significantly increases the throughput and reduces the power consumption for both short and long reads.
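
As a rough illustration of what bitvector-based approximate string matching looks like in software, the sketch below uses a textbook Bitap-style (Wu-Manber) formulation. It is not the GenASM or SeGraM algorithm or hardware design described in the talk. The function name `bitap_search` and its parameters are hypothetical, chosen only for this example. One bitvector is kept per allowed edit count, and each text character updates every bitvector with a handful of shifts, ANDs, and ORs, which is why this style of matching tends to map well onto parallel hardware.

```python
def bitap_search(text, pattern, k):
    """Return 0-based end positions in `text` where `pattern` matches
    a substring with at most `k` edits (illustrative sketch only)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1          # keeps every bitvector m bits wide
    accept = 1 << (m - 1)        # bit set when the whole pattern has matched

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # state[d], bit i set: pattern[:i+1] matches a suffix of the text read
    # so far with at most d edits. Initially the first d pattern characters
    # can be "deleted", so the low d bits start set.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_lower = state[0]                     # old state[d-1]
        state[0] = ((state[0] << 1) | 1) & mask   # exact-match row
        for d in range(1, k + 1):
            prev_same = state[d]                  # old state[d]
            match = ((prev_same << 1) | 1) & mask
            insertion = prev_lower                    # extra character in text
            substitution = (prev_lower << 1) | 1      # mismatched character
            deletion = (state[d - 1] << 1) | 1        # pattern character skipped
            state[d] = (match | insertion | substitution | deletion) & full
            prev_lower = prev_same
        if state[k] & accept:
            ends.append(pos)
    return ends
```

For example, `bitap_search("GGACTTGG", "ACGT", 1)` reports end positions 4 and 5 (the substrings "ACT" and "ACTT" are each one edit away from "ACGT"), while the same call with `k = 0` reports no matches.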

Overall, we demonstrate that genome sequence analysis can be accelerated by co-designing scalable and energy-efficient customized accelerators along with efficient algorithms for the key steps of genome sequence analysis. We hope that this seminar inspires future work in co-designing algorithms and hardware together to create powerful frameworks that accelerate other genomics workloads and emerging applications.

Speaker Bio:
Damla Senol Cali is a “Staff Software Engineer, Hardware Acceleration” at Bionano Genomics. She received her Ph.D. degree in Computer Engineering from Carnegie Mellon University, where she was a member of the SAFARI Research Group and was advised by Prof. Onur Mutlu and Prof. Saugata Ghose. Her research focuses on hardware/software co-design for accelerating bioinformatics applications and genomic data analysis. She is also interested in memory systems and processing-in-memory. During her Ph.D., she interned at Intel Labs in 2018 and 2020. She obtained her M.S. in Computer Engineering from Carnegie Mellon University in 2019, and her B.S. in Computer Engineering from Bilkent University in 2015.

Onur Mutlu Lectures

Related videos

SAFARI Live Seminar - GenPIP: In-Memory Acceleration of Genome Analysis

SAFARI Live Seminar - Software/Hardware Co-design and Dataflow acceleration for Short Read Alignment

SAFARI Live Seminar: DAMOV: A New Methodology & Benchmark Suite for Data Movement Bottlenecks

SAFARI Live Seminar - Thinking Outside the Die: Architecting the ML Accelerator of the Future

SAFARI Live Sem. - Accelerating Irregular Applications via Efficient Synch. & Data Access Techni...

SAFARI-EFCL Seminar - Landscape of Genomics for Systems Research

Seminar in Computer Architecture - Lecture 2: Accelerating Genome Analysis (Spring 2023)

SAFARI Live Seminar - Introduction to the UPMEM DPU Architecture

SAFARI Live Seminar - From C/C++ code to high‐performance dataflow circuits

SAFARI Live Seminar - MetaSys: Cross-Layer Optimization with a Practical Metadata Management System

SAFARI Live Seminar - Modern trends in accelerator design with high-level synthesis

SAFARI Live Seminar - An Ecosystem for Scalable & Computationally Efficient Nanopore Data Proces...

SAFARI Live Seminar - HBM3 RAS: The Journey to Enhancing Die-Stacked DRAM Resilience at Scale

Seminar in Computer Arch. - Lecture 5: Accelerating Genome Analysis (Spring 2022)

SAFARI Live Seminar: Understanding a Modern Processing-in-Memory Architecture

EFCL & SAFARI Live Seminar - Enabling Practical Processing Using Emerging Memories

Accelerating Genomics Course - Meeting 1: Course Introduction & Project Proposals (Spring 2022)

SAFARI Live Seminar: Efficient DNN Training at Scale: from Algorithms to Hardware - Gena Pekhimenko

P&S Accelerating Genomics - Lecture 1a: Course Introduction (Spring 2023)

P&S Mobile and Accelerating Genomics - Lecture 6c: SneakySnake (Spring 2023)

Mobile Genomics Course - Meeting 11: Accelerating Genome Sequence Analysis (Spring 2022)

Accelerating Genomics Course - Meeting 2: Introduction to Sequencing (Spring 2022)

Accelerating Genomics Course - Lecture 2: Course Introduction & Logistics (Fall 2022)