[ICML 2024] A New Partial p-Wasserstein-Based Metric For Comparing Distributions

The p-Wasserstein distance, for p ≥ 2, is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the p-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical p-Wasserstein distance on n samples in 2 dimensions to converge to the true distance at a rate of n^(−1/(2p)), which is significantly slower than the rate of n^(−1/2) for the 1-Wasserstein distance. We introduce a new family of distances parameterized by k ≥ 0, called (p, k)-RPW, that is based on computing the partial p-Wasserstein distance. We show that (1) (p, k)-RPW satisfies the metric properties, (2) (p, k)-RPW is robust to small outlier mass while retaining the sensitivity of the p-Wasserstein distance to minor geometric differences, and (3) when k is a constant, the (p, k)-RPW distance between empirical distributions on n samples in 2 dimensions converges to the true distance at a rate of n^(−p/(4p−2)), which is faster than the convergence rate of n^(−1/(2p)) for the p-Wasserstein distance. By setting the parameters k or p appropriately, we can reduce our distance to the total variation, p-Wasserstein, and Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy than the 1-Wasserstein, 2-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
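
As a concrete illustration of the partial p-Wasserstein primitive the abstract refers to, below is a minimal sketch of an empirical partial p-Wasserstein computation between two point clouds, assuming the POT (Python Optimal Transport) library is available. It is not the paper's (p, k)-RPW implementation: the way the partial distance is combined with the parameter k, and the normalization convention for partial transport, are defined in the paper and not reproduced here.

```python
# Minimal sketch (not the paper's code): empirical partial p-Wasserstein
# distance between two point clouds using the POT library. Transporting only
# a mass m < 1 lets a small outlier mass be left unmatched, which is the
# primitive the (p, k)-RPW distance builds on.
import numpy as np
import ot  # POT: Python Optimal Transport


def partial_p_wasserstein(X, Y, p=2, m=0.95):
    """p-th root of the optimal cost of transporting mass m between the
    uniform empirical distributions on the rows of X and Y."""
    a = np.full(len(X), 1.0 / len(X))            # uniform weights on X samples
    b = np.full(len(Y), 1.0 / len(Y))            # uniform weights on Y samples
    M = ot.dist(X, Y, metric="euclidean") ** p   # cost matrix ||x - y||^p
    plan = ot.partial.partial_wasserstein(a, b, M, m=m)  # partial OT plan
    return float(np.sum(plan * M)) ** (1.0 / p)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Y is a similar sample plus 5% outlier mass placed far away.
Y = np.vstack([rng.normal(size=(190, 2)), rng.normal(10.0, 1.0, size=(10, 2))])

a = np.full(len(X), 1.0 / len(X))
b = np.full(len(Y), 1.0 / len(Y))
M2 = ot.dist(X, Y, metric="euclidean") ** 2
print("full 2-Wasserstein:", ot.emd2(a, b, M2) ** 0.5)      # inflated by the outliers
print("partial, m = 0.95 :", partial_p_wasserstein(X, Y))   # outliers left untransported
```

On this toy example the full 2-Wasserstein value is dominated by the 5% of far-away mass, while the partial version with m = 0.95 simply leaves that mass untransported, matching the robustness behavior described in the abstract.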