filmov
tv
Dynamic DDL: Adding Structure to Streaming Data on the Fly - David Winters and Hao Zou
![preview_player](https://i.ytimg.com/vi/yoKadO1GQzQ/maxresdefault.jpg)
Показать описание
"At the end of day, the only thing that data scientists want is tabular data for their analysis. They do not want to spend hours or days preparing data. How does a data engineer handle the massive amount of data that is being streamed at them from IoT devices and apps, and at the same time add structure to it so that data scientists can focus on finding insights and not preparing data? By the way, you need to do this within minutes (sometimes seconds). Oh... and there are a lot of other data sources that you need to ingest, and the current providers of data are changing their structure.
GoPro has massive amounts of heterogeneous data being streamed from their consumer devices and applications, and they have developed the concept of ""dynamic DDL"" to structure their streamed data on the fly using Spark Streaming, Kafka, HBase, Hive and S3. The idea is simple: Add structure (schema) to the data as soon as possible; allow the providers of the data to dictate the structure; and automatically create event-based and state-based tables (DDL) for all data sources to allow data scientists to access the data via their lingua franca, SQL, within minutes.
Session hashtag: #SFdev6"
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Connect with us:
GoPro has massive amounts of heterogeneous data being streamed from their consumer devices and applications, and they have developed the concept of ""dynamic DDL"" to structure their streamed data on the fly using Spark Streaming, Kafka, HBase, Hive and S3. The idea is simple: Add structure (schema) to the data as soon as possible; allow the providers of the data to dictate the structure; and automatically create event-based and state-based tables (DDL) for all data sources to allow data scientists to access the data via their lingua franca, SQL, within minutes.
Session hashtag: #SFdev6"
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Connect with us: