Oleksandr Pryymak - Probabilistic Data Structures & Approximate Solutions

preview_player
Показать описание
IPython Notebook:

View slides here:

Will your decisions change if you'll know that the audience of your website isn't 5M users, but rather 5'042'394'953? Unlikely, so why should we always calculate the exact solution at any cost? An approximate solution for this and many similar problems would take only a fraction of memory and runtime in comparison to calculating the exact solution.

This tutorial is a practical survey of useful probabilistic data structures and algorithmic tricks for obtaining approximate solutions. When should we use them, and when we should not trade accuracy for scalability. In particular, we start with hashing and sampling; address the problems of comparing and filtering sets, counting the number of unique values and their occurrences; touch basic hashing tricks used in machine learning algorithms. Finally, we analyse some examples of their usage show the full power: how to organise an online analytics, or how to decode a DNA sequence by squeezing a large graph into a bloom filter. 00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Рекомендации по теме