ICCV19: Oral Session 3.1B - Vision, Language, & Text
1. VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
2. A Graph-Based Framework to Bridge Movies and Synopses
Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin
3. From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason
Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, Anirban Chakraborty
4. Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu, Shih-Fu Chang
5. Robust Change Captioning
Dong Huk Park, Trevor Darrell, Anna Rohrbach
6. Attention on Attention for Image Captioning
Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
7. Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang, Guanbin Li, Yizhou Yu
8. Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
9. Phrase Localization Without Paired Training Examples
Josiah Wang, Lucia Specia
10. Learning to Assemble Neural Module Tree Networks for Visual Grounding
Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha
11. A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
12. Zero-Shot Grounding of Objects From Natural Language Queries
Arka Sadhu, Kan Chen, Ram Nevatia
13. Towards Unconstrained End-to-End Text Spotting
Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao
14. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee