Scale By The Bay 2019: David Winters, GDPR Data Cleaner: Mutating Immutable Data

Remember when data engineers and data scientists used to say things like:

* “Log everything”
* “Never throw away data”
* “All data is important”
* “What is useless data today is tomorrow’s data of gold”

And then that four-letter acronym came into our vernacular: *G-D-P-R*. Now you hear statements like this:

* “Do we really need this data?”
* “Is this data used at all?”
* “What does the GDPR say about this type of data?”

Another change that came with the GDPR is the right of a user to request the deletion of their personal data. This is a tricky proposition for those dealing with big data, since big data technologies were built on the concept of immutable data. Big data systems such as Hadoop and Spark scaled so well because data was never updated, only appended, and was written out in large blocks that are not conducive to small updates or deletes.

In this talk, we discuss how personal data can be cleansed from existing big data storage systems, such as column-oriented Hive tables and key-value stores, and we introduce a new open source project that implements these ideas.

David Winters
GoPro
Big Data Architect
San Francisco Bay Area
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big data interests include streaming data as fast as possible from devices to near real-time dashboards and switching his primary programming language to Scala from Java after nearly 20 years. He holds a B.Sc. in Computer Science from The Ohio State University.

FunctionalTV

Related recommendations

Scale By The Bay 2019: Justin Heyes-Jones, A Gentle Introduction to Comonads

Scale By The Bay 2019: Bill Venners, In Types We Trust

Scale By The Bay 2019: Tikhon Jelvis, What is Functional Reactive Programming?

Scale By The Bay 2019: Evan Chan, Rust and Scala, Sitting in a Tree….

Scale By The Bay 2019 Highlights

Scale By The Bay 2019: Thursday Keynote, Heather Miller, The Times Are A-Changin'

Scale By The Bay 2019: Jason Swartz, High Performance Serverless Functions in Scala

Scale By The Bay 2019: David Andrzejewski, Reliable Machine Learning

Scale By The Bay 2019: Thomas Gerber, From datasets to tables in a multitenant data lake

Scale By The Bay 2019: James Earl Douglas, Functional Electromagnetism

Scale By The Bay 2019: Alexander Ioffe, Quill + Doobie = Better Together

Scale By The Bay 2019: Jeremy Smith & Jonathan Indig, Solving the Scala Notebook Experience

Scale By The Bay 2019: Oli Makhasoeva, The Art of Asking Questions

Scale By The Bay 2019: Alexander Ioffe Interview

Scale By The Bay 2019: Kavita Laddad, Taming complex webapps with Scala and React

Scale By The Bay 2019: Ahir Reddy & Li Haoyi, Speedy Scala Builds at Databricks

Scale By The Bay 2019: Panel Discussion, Who Needs Serverless?

Scale By The Bay 2019: Paul Cleary, Re-programming the programmer, from Actors to FP

Scale By The Bay 2019: Bryan Cantrill, Was He Wright All Along? Software After Moore's Law

Scale By The Bay 2019: Petr Zapletal, Change Data Capture in Distributed Systems

Scale By The Bay 2019: Themba Fletcher, Runtime Types at Crunchbase

Scale By The Bay 2019: Heather Miller Interview

Scale By The Bay 2019: Colin Breck, Maximizing Throughput and Scalability for Akka Streams