Practical Genomics with Apache Spark - Tom White

"Discussions about the role of technology in genomics invariably focus on the massive growth in DNA sequencing since the beginning of the century, growth that has outpaced Moore's law and led to the $1000 genome. However, future growth is projected to be even more spectacular, and for that to become a reality we need more powerful tools for genome analysis. Apache Spark is providing the foundation for these new tools, including two that I will cover in this talk: GATK and Hail, both open source projects from the Broad Institute. GATK and Hail are complementary: GATK provides pipelines for transforming DNA sequence data into the raw material (variant call data) that Hail needs to run genetic analyses across thousands of individuals. GATK started out as a single-process program, but has now been ported to run on Spark at scale. Hail was written from the outset to run on Spark. In this talk I will look at how these frameworks take advantage of Spark to scale, some of the challenges in getting existing data formats to work with Spark, and some of the plans for the future.
Session hashtag: #EUres9"
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.