Data Security at Scale through Spark and Parquet Encryption

Big data presents new challenges for protecting the privacy and integrity of sensitive information. Straightforward application of traditional file encryption and MAC techniques cannot cope with the staggering volumes of data flowing through modern analytic pipelines.
Apple addresses these challenges by leveraging new capabilities in the Apache Parquet format. We work with the Apache Parquet community on a modular data security mechanism that provides privacy and integrity guarantees for sensitive information at scale; the encryption specification has been approved and released by the Apache Parquet Format project. Today, there are two open source implementations of this specification: one in the Apache Arrow (C++) repository and one in the Apache Parquet-MR (Java) repository. The latter has just been released in the parquet-mr-1.12 version, which means that Apache Spark and other Java/Scala-based analytic frameworks can start working with Apache Parquet encryption.
In this talk, Gidon Gershinsky and Tim Perelmutov will outline the challenges of protecting the privacy of data at scale and describe the security approach of the Apache Parquet encryption technology. We will give a quick intro to using the Apache Parquet encryption API in pure Java and in Apache Spark applications. We will also discuss the roadmap of the community work on new encryption features and on deeper integration with Apache Spark and other analytic frameworks. Finally, we will show a demo of Apache Parquet modular encryption in action and share what we have learned from using it at scale.
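As a rough illustration of what enabling Parquet encryption from a Spark application can look like, the sketch below assembles the Hadoop properties that parquet-mr 1.12 reads through its properties-driven crypto factory. This is not taken from the talk: the helper name `parquet_encryption_conf`, the key IDs, the column names, and the KMS client class `com.example.MyKmsClient` are hypothetical placeholders, and a real deployment would supply its own KMS client implementation.

```python
# Class shipped with parquet-mr 1.12 that turns Hadoop properties into
# encryption/decryption settings.
CRYPTO_FACTORY = "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory"


def parquet_encryption_conf(footer_key, column_keys, kms_client_class):
    """Build the Hadoop properties for Parquet modular encryption.

    footer_key       -- master key ID used for the file footer
    column_keys      -- dict mapping a master key ID to the list of columns it protects
    kms_client_class -- fully qualified name of a KMS client class (assumed to exist)
    """
    if not column_keys:
        raise ValueError("at least one encrypted column is required")
    # Expected format: "keyId1:colA,colB;keyId2:colC"
    column_spec = ";".join(
        f"{key_id}:{','.join(columns)}" for key_id, columns in column_keys.items()
    )
    return {
        "parquet.crypto.factory.class": CRYPTO_FACTORY,
        "parquet.encryption.kms.client.class": kms_client_class,
        "parquet.encryption.footer.key": footer_key,
        "parquet.encryption.column.keys": column_spec,
    }


def apply_to_spark(spark, conf):
    """Copy the properties into the Hadoop configuration of a PySpark session."""
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for name, value in conf.items():
        hadoop_conf.set(name, value)


# Hypothetical key IDs, column names, and KMS client class.
conf = parquet_encryption_conf(
    footer_key="footer_key_id",
    column_keys={"pii_key_id": ["ssn", "email"]},
    kms_client_class="com.example.MyKmsClient",
)
```

With a running session, `apply_to_spark(spark, conf)` followed by an ordinary `df.write.parquet(...)` would be expected to produce files whose `ssn` and `email` columns are encrypted, provided the named KMS client class is on the classpath; columns not listed stay in plaintext.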