All publications

xLAM: A Family of Large Action Models to Empower AI Agent Systems

In Defense of RAG in the Era of Long-Context Language Models

[QA] Building Math Agents with Multi-Turn Iterative Preference Learning

Building Math Agents with Multi-Turn Iterative Preference Learning

Attention Heads of Large Language Models: A Survey

[QA] Attention Heads of Large Language Models: A Survey

[QA] The AdEMAMix Optimizer: Better, Faster, Older

The AdEMAMix Optimizer: Better, Faster, Older

[QA] Planning In Natural Language Improves LLM Search For Code Generation

Planning In Natural Language Improves LLM Search For Code Generation

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

[QA] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sample what you can't compress

[QA] Sample what you can't compress

[QA] CONTEXTCITE: Attributing Model Generation to Context

CONTEXTCITE: Attributing Model Generation to Context

FLUX that Plays Music

[QA] FLUX that Plays Music

Modularity in Transformers: Investigating Neuron Separability & Specialization

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

[QA] Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

[QA] Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

CycleGAN with Better Cycles