[ICASSP 2022] Text2Video: Text-driven Talking-head Video Synthesis with Phoneme-Pose Dictionary
Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
With advances in deep learning, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesizing video from text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video generation algorithms, our approach has several advantages: 1) it needs only a fraction of the training data used by an audio-driven approach; 2) it is more flexible and is not vulnerable to speaker variation; 3) it significantly reduces preprocessing, training, and inference time. We perform extensive experiments comparing the proposed method with state-of-the-art talking-face generation methods on a benchmark dataset and on datasets of our own. The results demonstrate the effectiveness and superiority of our approach.
Keywords: Computer Vision, Deep Learning, Object Detection, Action Recognition, Safety Monitor, Activity Analysis
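To make the idea of "interpolated phoneme poses" more concrete, here is a minimal Python sketch. It is not the authors' implementation: the dictionary contents, the two-number pose layout, the function names, and the choice of plain linear interpolation are all assumptions made for illustration, since the description above does not specify them. The GAN that would turn poses into video frames is omitted entirely.

```python
from typing import Dict, List, Tuple

# A pose is a flat list of keypoint coordinates (hypothetical layout).
Pose = List[float]

# Toy phoneme-pose dictionary; the values are invented for illustration only.
PHONEME_POSES: Dict[str, Pose] = {
    "sil": [0.0, 0.0],
    "AA": [1.0, 0.8],
    "M": [0.1, 0.0],
}


def lerp(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly blend pose a toward pose b by a factor t in [0, 1]."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def interpolate_poses(
    timed_phonemes: List[Tuple[str, float]], fps: float
) -> List[Pose]:
    """Return one pose per video frame.

    timed_phonemes holds (phoneme, key-pose time in seconds) pairs,
    sorted by time, with at least two entries.
    """
    if len(timed_phonemes) < 2:
        raise ValueError("need at least two timed phonemes")
    keys = [(t, PHONEME_POSES[p]) for p, t in timed_phonemes]
    n_frames = int(round(keys[-1][0] * fps)) + 1
    frames: List[Pose] = []
    seg = 0  # index of the key pose that starts the current segment
    for i in range(n_frames):
        t = i / fps
        # Advance to the segment that contains time t.
        while seg + 1 < len(keys) - 1 and t > keys[seg + 1][0]:
            seg += 1
        (t0, p0), (t1, p1) = keys[seg], keys[seg + 1]
        # Clamp so frames before the first key pose hold that pose.
        alpha = 0.0 if t1 == t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        frames.append(lerp(p0, p1, alpha))
    return frames


if __name__ == "__main__":
    timeline = [("sil", 0.0), ("AA", 0.5), ("M", 1.0)]
    for frame in interpolate_poses(timeline, fps=4):
        print(frame)
```

In a pipeline of the kind the description outlines, a per-frame pose sequence like this would be the input to the trained generator. A real system would presumably use far higher-dimensional poses and may use a different interpolation scheme.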
[ICASSP 2022] Text2Video: Text-driven Talking-head Video Synthesis with Phoneme-Pose Dictionary
[ICASSP 2022] Text2Video: Text-driven Talking-head Video Synthesis with Phoneme-Pose Dictionary Demo
ICASSP 2022 promotion video
txt2video
Text2Video Demo
Talking Head before and after 1
Text2Video Demo 2
A Talking Head Video - Produced by Clear Point Video
What is a talking head video?
[ICASSP 2022] FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR
Talking Face Generation with Multilingual TTS | CVPR 2022 Demo
Iterative Text-based Editing of Talking-heads Using Neural Retargeting
ICASSP 2022 A Data-Driven Cognitive Salience Model for Objective Perceptual Audio Quality Assessment
Company Culture and Values | Free Talking-Head AI Video Template
Text2Video Demo
TTS-Pruning (ICASSP 2022)
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion (IJCAI 2021)
RemixIT: Continual self-training with bootstrapped remixing for speech enhancement [ICASSP 2022]
ICASSP-2022 Podium presentation | Gajecki and Nogueira
Text-Driven Mouth Animation
Talking Face Generation with Multilingual TTS (CVPR 2022 Demo Track)
Improving Wav2Lip results with DeepFaceLab
[2021 Fall] Team 18: Super Resolution Multi-shot Neural Talking Head Synthesis
Talking head videos: What you need to know