Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

#nlproc #transformers #attention #longcontext

Hello everyone! This is my first video explaining a research paper in the field of Natural Language Processing. I was inspired by @YannicKilcher and @aicoffeebreak to start explaining research papers.

Prerequisites:

Reference:
Dai, Z., Yang, Z., Yang, Y., Carbonell, J. G., Le, Q., & Salakhutdinov, R. (2019, July). Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2978-2988).

Check out my profile on other social media platforms -
Comments

Nice explanation. I am sure it will be helpful for a lot of folks out here!

avinabsaha

In the standard Transformer we have Q * transpose(K); could you explain in more detail why the authors of Transformer-XL wrote the score as transpose(x_i) * transpose(W_q) * W_k * x_j (i.e., transposing the query instead of the key)?

huskyhusky
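
A quick sketch of the algebra behind the question above, in the paper's notation (E_{x_i} is the token embedding, U_i the absolute position embedding): the two forms are the same dot product, with the transpose pushed onto the query side via (A B)^T = B^T A^T:

q_i^T k_j = (W_q E_{x_i})^T (W_k E_{x_j}) = E_{x_i}^T W_q^T W_k E_{x_j}

Writing the score in this expanded form is what lets the authors split the absolute-position score

(E_{x_i} + U_i)^T W_q^T W_k (E_{x_j} + U_j) = E_{x_i}^T W_q^T W_k E_{x_j} + E_{x_i}^T W_q^T W_k U_j + U_i^T W_q^T W_k E_{x_j} + U_i^T W_q^T W_k U_j

into four terms, which are then reparameterized with relative position embeddings R_{i-j}.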

Do you have any open research paper reading groups?

formerkid