Apache Druid Deep Dive

Talk abstract: Apache Druid is an open source analytics database powering fresh, fast analytics in companies from Airbnb to Zeotap on clickstream, telemetry, financial transactions, applications and more. In this talk, we open the box on the three distributed processes in Druid led by the coordinator, overlord, and broker, and the ways that these come together to deliver reliable, performant query, ingestion, and management services.

Bio: Jon King is a Sr. Field Engineer at Imply. Jon has been in big data for 13+ years and is fluent in Hadoop, Spark, Hive, Presto and Druid. Previously, he’s built and managed data teams at SolidFire, NetApp and Ibotta. He’s a 2x O’Reilly author (Operationalizing Data Lakes in the Cloud, 2019) and contributor (Programming Hive, 2012). Outside of work, he enjoys traveling and spending time with his family in the Colorado mountains.
Comments
Author

Wondering if there is any detailed use case study on fintech, particularly for wealth management firms.

luckyfarru
Author

37:15 If we group by time, how can we count distinct users per dimension?
I.e., how many unique visitors did I have in the most recent 2 weeks, filtered by dimension "treatment-1"?

56:00 How will it handle revenue (sales dollars), where the data does not seem to fit dictionary bitmaps? How fast is it to sum per dimension?

programminginterviewsprepa