#36 - 2021.02 - Speech Processing with JAX/Trax and how to fight Misinformation
SLOT #1: Speech Processing with Deep Learning and JAX/Trax - M. Yusuf Sarıgöz, co-founder and AI researcher @ AI Labs
With advances in natural language processing, conversational
AI technologies such as personal assistants are becoming increasingly
popular in everyday life. This trend calls for efficient speech models
that can understand and generate human-like speech. In this talk, we
will review technologies such as TensorFlow, JAX, Trax, and others that
can help boost our research in speech processing. Trax is a new
end-to-end deep learning library by Google Brain focusing on clear
code and speed, while JAX is "NumPy on steroids." Trax relies on JAX
for hardware acceleration and offers greater productivity for NLP
and speech researchers. Finally, we will look at the open and
free pre-trained German Tacotron2 and Multi-band MelGAN models, their
limitations and required future work, and how they were trained.
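
As a rough illustration of the "NumPy on steroids" remark, the sketch below writes a loss in NumPy-style code and then lets JAX differentiate and compile it. The linear model, the function name `mse_loss`, and the toy data are assumptions made for this example; they are not taken from the talk.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    # Linear model written with the NumPy-like jax.numpy API.
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


# jax.grad differentiates with respect to the first argument (w);
# jax.jit compiles the resulting function with XLA for the available
# backend (CPU, GPU, or TPU).
grad_fn = jax.jit(jax.grad(mse_loss))

# Toy data, chosen only for illustration.
x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

g = grad_fn(w, x, y)
print(g)
```

The same function runs unchanged on an accelerator when one is available, which is the kind of hardware acceleration a library built on JAX can rely on.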
SLOT #2: Catching Out-of-Context Misinformation using Self-Supervised Learning - Shivangi Aneja
Despite the recent attention to DeepFakes and other forms of image manipulation, one of the most prevalent ways to mislead audiences is the use of unaltered images in a new but false context, commonly referred to as out-of-context image use. Gathering a large-scale supervised dataset is challenging for this particular task due to limited data availability. To address these challenges and support fact-checkers, we propose a new technique (and dataset) that automatically detects conflicting image-text pairs. Our core idea is a self-supervised training strategy where we only need images with matching (and non-matching) captions from different sources to identify such conflicting image-text pairs, which can then be used to identify out-of-context image use.
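
One common way to train on matching and non-matching captions is a margin (hinge) loss that pushes the matching caption's score above the non-matching one. The sketch below shows that general idea only; it is not the speaker's actual model or objective. The scorer, the `proj` parameter, the feature vectors, and the `margin` value are all hypothetical.

```python
import jax
import jax.numpy as jnp


def margin_loss(params, image_feats, match_feats, mismatch_feats, margin=1.0):
    # Hypothetical scorer: project caption features with a learned matrix
    # and take the dot product with the image features.
    proj = params["proj"]
    pos = jnp.sum(image_feats * (match_feats @ proj), axis=-1)
    neg = jnp.sum(image_feats * (mismatch_feats @ proj), axis=-1)
    # Hinge: penalize pairs where the matching caption does not beat the
    # non-matching caption by at least `margin`.
    return jnp.mean(jnp.maximum(0.0, margin - pos + neg))


# Placeholder features standing in for real image and caption embeddings.
params = {"proj": jnp.eye(2)}
image = jnp.array([[1.0, 0.0]])
caption_a = jnp.array([[1.0, 0.0]])
caption_b = jnp.array([[0.0, 1.0]])

loss_good = margin_loss(params, image, caption_a, caption_b)
loss_bad = margin_loss(params, image, caption_b, caption_a)

# Gradients with respect to the parameter dictionary, as a training step would use.
grads = jax.grad(margin_loss)(params, image, caption_b, caption_a)
print(loss_good, loss_bad, grads["proj"].shape)
```

No labels are needed beyond knowing which caption was published with which image, which is what makes this kind of objective self-supervised.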