How to Rapidly Process & Query Large Data Sets

This EGG NYC Conference talk is titled "Architecting Massively Scalable Self-Service Analytics Platforms" by Shailesh Doshi, Senior Data Engineer at Pivotal Software.

Talk Abstract:

How do you create a data environment for self-service analytics, data science and machine learning that can handle petabytes of data and hundreds of concurrent users? A massively scalable data analytics platform such as Pivotal Greenplum makes cleansed, collated data at scale available to your Dataiku users. In this talk, we provide an overview and demo of how you can rapidly process and query large data sets in Dataiku by taking advantage of Pivotal Greenplum and in-database analytics functions. We’ll show how to query across diverse datasets, prepare data, train machine learning algorithms, work with geospatial data, and conduct text analysis, all executing within the Pivotal Greenplum data warehouse. We’ll also touch on how to enforce data governance and access control.

Speaker Bio:

Shailesh Doshi is a data engineer at Pivotal Software who helps customers succeed by drawing on his background and experience in all things data and data science. He finds job satisfaction in helping customers transform their businesses into modern, data-driven organizations, particularly around cloud and data strategy and data-driven, cloud-native application transformation.

This talk was part of Dataiku's EGG NYC 2018 Conference.

Twitter: @dataiku
Instagram: @dataiku

Turn on our channel notifications for the latest data science and AI updates!