Stanford CS25: V4 I From Large Language Models to Large Multimodal Models

May 9, 2024
Speaker: Ming Ding, Zhipu AI
As large language models (LLMs) have made significant advancements over the past five years, there is growing anticipation for seamlessly integrating other modalities of perception (primarily visual) with the capabilities of large language models. This talk will start with the basics of large language models and then discuss the academic community's attempts at multimodal models and architectural updates over the past year. We will focus on introducing CogVLM, a powerful open-source multimodal model with 17B parameters (equivalent to a 7B dense model), and CogAgent, a model designed for scenarios involving GUIs and OCR. Finally, we will discuss the applications of multimodal models and viable research directions in academia.
About the speaker:
Ming Ding is a research scientist at Zhipu AI based in Beijing. He obtained his bachelor's and doctoral degrees at Tsinghua University, advised by Prof. Jie Tang. His research interests include multimodal models, generative models, and pre-training technologies. He has led or participated in research on multimodal generative models such as CogView and CogVideo; multimodal understanding models CogVLM and CogAgent; and language models such as GLM and GLM-130B.