Stanford CS25: V4 I Transformers that Transform Well Enough to Support Near-Shallow Architectures

May 2, 2024
Speaker: Jake Williams, Drexel University

Transformers that Transform Well Enough to Support Near-Shallow Architectures
The talk will discuss various effectiveness-enhancing and cost-cutting augmentations to the language model (LM) learning process, including the derivation and application of non-random parameter initializations for specialized self-attention-based architectures. These are referred to as precision LMs (PLMs), in part for their capability to train both large and small LMs effectively and efficiently. Highlighting their hallmark capability of training with only very limited resources, an introduction to PLMs will be followed by the presentation of a developing application that localizes untrained PLMs on microprocessors to act as hardware-based controllers for small electronic devices. The talk will also cover their utility for training in air-gapped environments and for training progressively bigger models on CPUs, and will detail a fully developed control system and its user interface, including recent experiments on Le Potato, in which effective inference of user directives occurred after only 20 minutes of lay interaction over a microphone and a light switch.
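The abstract's central technical ingredient is non-random parameter initialization. The sketch below is a rough toy construction of the general idea only, not the derivation from the talk: it seeds a single self-attention layer's embeddings from corpus co-occurrence statistics and its projections with identity matrices, so that an "untrained" layer already computes similarity-driven attention. The toy corpus, the identity-projection choice, and all names are illustrative assumptions.

```python
# Hypothetical sketch of a non-random ("precision") initialization for a
# single self-attention layer. This is NOT the speaker's derivation; it
# only illustrates seeding parameters from corpus statistics instead of
# random draws. All names and choices here are illustrative.
import numpy as np

corpus = [
    "the light switch turns the light on",
    "the light switch turns the light off",
    "say on to turn the light on",
]
tokens = [t for line in corpus for t in line.split()]
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric co-occurrence counts within a +/-1 token window.
C = np.zeros((V, V))
for line in corpus:
    ws = line.split()
    for i, w in enumerate(ws):
        for j in (i - 1, i + 1):
            if 0 <= j < len(ws):
                C[idx[w], idx[ws[j]]] += 1.0

# Deterministic embeddings: smoothed log co-occurrence rows.
E = np.log1p(C)

# Deterministic attention projections: identity maps, so attention
# scores reduce to embedding similarity before any training occurs.
d = V
W_q = np.eye(d)
W_k = np.eye(d)
W_v = np.eye(d)

def attend(x):
    """Single-head self-attention with the deterministic parameters."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Untrained inference: contextualize a short command.
x = E[[idx[w] for w in "turn the light on".split()]]
print(attend(x).shape)  # (4, 9): contextualized vectors, no training step
```

Deterministic seeds of this kind are cheap to compute on a CPU and need no network access, which is consistent with the abstract's emphasis on air-gapped and microprocessor-scale training.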

About the speaker:
Jake Ryland Williams is an Associate Professor of Information Science at Drexel University's College of Computing and Informatics in Philadelphia, Pennsylvania. Dr. Williams has a background in physics and mathematics, with degrees from the University of Vermont, and his research takes a quantitative-linguistics perspective, applying mathematical and statistical methodology to analyze and improve linguistic learning systems, alongside others that use shared neural methodology. Following a one-year postdoctoral appointment studying large-scale machine learning at the University of California, Berkeley (Cal) in 2015, Dr. Williams joined Drexel as data science (DS) faculty, where he drove the foundation of a DS MS program and develops and teaches DS coursework, including on natural language processing with deep learning.

Comments

Dr. Williams is a legend; his teaching style and attention to students noticeably set him apart from other faculty at Drexel. Great job, Professor!
— flacko

Stanford Online, this is so fun! I'm happy I found your channel!
— IOSALive

Watched this 3 times; it really helps with understanding when you walk through the math as well. Going to try to build this myself! 🎉
— chrisavila

Hello everyone. Is there a talk on the use of transformers in automatic speech recognition (ASR)? I need one that dives into ASR in detail.
— hamidmojarrad