Mixture-of-Depths

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Comments

10:09 I think you mentioned earlier that you didn't make this channel to explain the basics, since there are many videos out there that already do. I wouldn't force you to make any, but I think your presentation style, even without the visuals, would catapult your explanations of the basics to something much better than most other videos (I've watched a lot of them). You have a knack for giving a well-rounded, complete explanation for everything, including the big picture. That shit has a LOT of value even in the sea of existing content. That said, it'd be for a different audience than your usual one, and if you don't feel this is something you want to do, then you 100% shouldn't.

Elikatie

I had a similar idea, but instead of layers stacked on top of each other I had in mind a bag of layers on the same level. Before this huge bag is a router that decides which layers to route each token to, and at the end of the bag is another router that decides whether the result is the final output or whether we need another iteration.
This architecture would automatically decide how many layers it needs for each level of abstraction (the architecture in the paper seems like it can also do that, though). My motivation was that sometimes we ask LLMs questions that don't need much thinking, like "Hello. Who are you?". To answer these simple questions one layer would be enough, and for more complicated questions we could add layers to the bag and train them along with the routers.

KennethFeur
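
A minimal sketch of the "bag of layers" idea described in the comment above, assuming PyTorch. The class and parameter names (BagOfLayers, max_iters, halt_threshold) are made up for illustration and are not from the Mixture-of-Depths paper; the hard argmax routing is shown only to make the data flow visible, since a real version would need a differentiable or otherwise trainable routing scheme.

import torch
import torch.nn as nn

class BagOfLayers(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=4,
                 max_iters=3, halt_threshold=0.5):
        super().__init__()
        # The "bag": layers sit side by side instead of being stacked.
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.layer_router = nn.Linear(d_model, n_layers)  # which layer gets each token
        self.halt_router = nn.Linear(d_model, 1)          # stop, or do another pass?
        self.max_iters = max_iters
        self.halt_threshold = halt_threshold

    def forward(self, x):  # x: (batch, seq, d_model)
        for _ in range(self.max_iters):
            # Route every token to its highest-scoring layer (hard argmax,
            # non-differentiable; shown only to illustrate the idea).
            choice = self.layer_router(x).argmax(dim=-1)   # (batch, seq)
            out = torch.zeros_like(x)
            for i, layer in enumerate(self.layers):
                mask = (choice == i).unsqueeze(-1)         # (batch, seq, 1)
                out = torch.where(mask, layer(x), out)
            x = out
            # Second router: stop iterating once the mean halt score is high enough.
            if torch.sigmoid(self.halt_router(x)).mean() > self.halt_threshold:
                break
        return x

A toy forward pass would be BagOfLayers()(torch.randn(2, 16, 256)). The part the sketch glosses over is exactly what the comment raises: making the routing and halting decisions trainable, so that easy prompts really do exit after a single pass.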

I think FLOPs refers to the total number of floating-point operations performed by a network, while FLOPS refers to the computational throughput of a GPU (floating-point operations per second).

ShaohuaDong
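
A back-of-the-envelope illustration of that distinction: FLOPs measure total work, FLOPS measure a rate, so dividing one by the other gives time. The numbers below are made up for the example and not taken from the paper or the video.

# Hypothetical numbers, only to show FLOPs (work) vs FLOPS (throughput).
forward_pass_flops = 2 * 7e9 * 1024    # ~2 * params * tokens for one forward pass
gpu_flops_per_second = 300e12          # a GPU sustaining 300 TFLOPS
seconds = forward_pass_flops / gpu_flops_per_second
print(f"{seconds * 1000:.1f} ms")      # work / rate = time (~47.8 ms here)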

I wonder if propagating data back using a similar mechanism could result in better models that can loop and reason deeply about stuff using loops between internal layers. Nowadays models can do a crude version of loops, like iterating through a math problem step by step, but that way they are constrained to "quantized" representations of the world in the form of tokens. Maybe letting models iterate in the hidden space could improve their quality; it also introduces a whole new set of problems, like "how to deal with infinite loops", but I'm sure it's worth a try.

BHBalast
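
A minimal sketch of "iterating in the hidden space" with a guard against infinite loops, assuming PyTorch. The names (LoopedBlock, max_loops, stop_threshold) are hypothetical; the halting rule (a learned stop probability plus a hard cap on iterations) is just one simple way to handle the infinite-loop problem the comment mentions, in the spirit of adaptive-computation-time approaches.

import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, max_loops=8, stop_threshold=0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.stop_head = nn.Linear(d_model, 1)   # learned "am I done thinking?" signal
        self.max_loops = max_loops
        self.stop_threshold = stop_threshold

    def forward(self, x):  # x: (batch, seq, d_model)
        # Re-apply the same block to the hidden states instead of emitting tokens,
        # stopping when the stop signal fires or when the hard cap is reached.
        for _ in range(self.max_loops):
            x = self.block(x)
            if torch.sigmoid(self.stop_head(x)).mean() > self.stop_threshold:
                break
        return x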