Wikimedia Research Showcase - July 2022

The Monthly Wikimedia Research Showcase is a public showcase of recent research by the Wikimedia Foundation's Research Team and guest presenters from the academic community. The showcase is hosted at the Wikimedia Foundation every third Wednesday of the month at 9:30 a.m. Pacific Time / 18:30 CET.

Theme
2022 Wikimedia Foundation Research of the Year Award Winners!

Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
By Krishna Srinivasan (Google)
The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large, high-quality visio-linguistic datasets for learning complementary information across image and text modalities. In this talk, I introduce the Wikipedia-based Image Text (WIT) Dataset to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.5 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages.
WIT’s unique advantages include the following. It is the largest multimodal dataset by number of image-text examples, by a factor of 3 (at the time of writing). It is massively multilingual, the first of its kind, covering more than 100 languages. It represents a more diverse set of concepts and real-world entities than previous datasets cover.

Assessing the Quality of Sources in Wikidata Across Languages
By Gabriel Amaral (King's College London)