Distance geometry for word embeddings and applications

preview_player
Показать описание
Sammy Khalife, LIX, Ecole polytechnique

Workshop on Distance Geometry, Semidefinite Programming and Applications

Date and Time: Friday, May 14, 2021 - 11:30am to 12:00pm

Abstract: Many methods used for the treatment of sequential data often rely on the construction of intermediate representations of unitary entities (e.g. words in natural language, or k-mers in bioinformatics). Traditionally, these representations are constructed with optimization formulations arising either from co-occurrence based models, or encoders combined with attention mechanisms. In this work, we propose a new formulation to embed these entities based on the Distance Geometry Problem (DGP): find object positions based on a subset of their pairwise distances. Considering the empirical Pointwise Mutual Information (PMI) as an inner product approximation, we discuss two algorithms to obtain approximate solutions of the underlying Euclidean DGP on large instances. The main advantage of our approach is its significantly lower computational complexity, which allows us to train representations much faster with a negligible quality loss. Furthermore, our experiments demonstrate these properties are valuable when pre-trained models are not available, for example in the case of low resource natural language processing, or for protein sequence embeddings in bioinformatics.
Рекомендации по теме