Stanford CS25: V4 I From Large Language Models to Large Multimodal Models

May 9, 2024
Speaker: Ming Ding, Zhipu AI
As large language models (LLMs) have made significant advancements over the past five years, there is growing anticipation for seamlessly integrating other modalities of perception (primarily visual) with the capabilities of large language models. This talk will start with the basics of large language models and then discuss the academic community's attempts at multimodal models and architectural updates over the past year. We will focus on introducing CogVLM, a powerful open-source multimodal model with 17B parameters (equivalent to a 7B dense model), and CogAgent, a model designed for scenarios involving GUIs and OCR. Finally, we will discuss the applications of multimodal models and viable research directions in academia.
About the speaker:
Ming Ding is a research scientist at Zhipu AI based in Beijing. He obtained his bachelor's and doctoral degrees at Tsinghua University, advised by Prof. Jie Tang. His research interests include multimodal models, generative models, and pre-training technologies. He has led or participated in research on multimodal generative models such as CogView and CogVideo; multimodal understanding models CogVLM and CogAgent; and language models such as GLM and GLM-130B.