Cardinality part 1: the hat problem

We introduce the problem of finding the cardinality of a dataset (the number of distinct items it contains) and review simple but memory-intensive strategies for calculating it exactly. We then present the "hat problem" formulation and show how the cardinality can be estimated from the minimum value alone. Finally, we show how hash functions turn real-world cardinality problems into the "hat problem."
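The idea in this summary can be illustrated with a minimal Python sketch. It is not taken from the lecture. It assumes that hashed values behave like uniform draws from [0, 1), and it uses the standard fact that the minimum of n such draws has expected value 1/(n+1), which suggests the estimate 1/min − 1. The function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib


def exact_cardinality(items):
    """Exact but memory-intensive: store every distinct item in a set."""
    return len(set(items))


def hash_to_unit_interval(item: str) -> float:
    """Map an item to a pseudo-random value in [0, 1).

    SHA-256 is an illustrative choice; any well-mixing hash would do.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a 64-bit integer, then scale to [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(items) -> float:
    """Estimate the distinct count from the minimum hash value only.

    Assumes hashes are uniform on [0, 1), so E[min] = 1 / (n + 1)
    and n is roughly 1 / min - 1. Only one number is kept in memory.
    """
    minimum = min(hash_to_unit_interval(x) for x in items)
    if minimum == 0.0:
        return float("inf")  # degenerate case, vanishingly unlikely
    return 1.0 / minimum - 1.0


if __name__ == "__main__":
    stream = ["cat", "dog", "cat", "bird", "dog", "cat"]
    print(exact_cardinality(stream))
    print(estimate_cardinality(stream))
    # Repeats do not matter: the same item always hashes to the same value.
    print(estimate_cardinality(stream) == estimate_cardinality(set(stream)))
```

Because a repeated item always hashes to the same value, duplicates never change the minimum, so the estimate depends only on the set of distinct items. A single minimum gives a very noisy estimate; this sketch shows the principle rather than a practical estimator.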

These materials are also openly available on figshare. Please cite this work; this ensures that funding agencies see the impact and importance of these open learning materials.

Channel: @BenLangmead
Comments

I don't understand how hashing solves the problem. If all of the elements in the data set are distinct, I can see how hashing will distribute them uniformly, but what if there are repeated elements?

benjaminmalley