filmov
tv
Graph Clustering Algorithms: Theoretical Insights For Practice

Показать описание
IGAFIT Algorithmic Colloquium #9
March 11, 2021, Vincent Cohen-Addad, Google Zürich
A classic problem in machine learning and data analysis is to partition the vertices of a graph into very dense subgraphs with low expansion. In practice, the most popular approaches often rely on local search or greedy algorithms; not only for the ease of implementation, their scalability and their efficiency, but also because of the accuracy of these methods on many real world graphs. Even though we can’t prove that these algorithms have good performance guarantees in the worst-case, can we understand what make them so successful in practice?
We review two popular objective functions — modularity and correlation clustering — and shed some light on the inner-workings of a popular heuristic (the so-called Louvain algorithm) by providing a “beyond worst-case” analysis. We then describe some key ideas to design graph clustering algorithms in massively parallel environments so as to optimize the above objective in only a constant number of rounds.
March 11, 2021, Vincent Cohen-Addad, Google Zürich
A classic problem in machine learning and data analysis is to partition the vertices of a graph into very dense subgraphs with low expansion. In practice, the most popular approaches often rely on local search or greedy algorithms; not only for the ease of implementation, their scalability and their efficiency, but also because of the accuracy of these methods on many real world graphs. Even though we can’t prove that these algorithms have good performance guarantees in the worst-case, can we understand what make them so successful in practice?
We review two popular objective functions — modularity and correlation clustering — and shed some light on the inner-workings of a popular heuristic (the so-called Louvain algorithm) by providing a “beyond worst-case” analysis. We then describe some key ideas to design graph clustering algorithms in massively parallel environments so as to optimize the above objective in only a constant number of rounds.