All publications

xLAM: A Family of Large Action Models to Empower AI Agent Systems

In Defense of RAG in the Era of Long-Context Language Models

[QA] Building Math Agents with Multi-Turn Iterative Preference Learning

Building Math Agents with Multi-Turn Iterative Preference Learning

Attention Heads of Large Language Models: A Survey

[QA] Attention Heads of Large Language Models: A Survey

[QA] The AdEMAMix Optimizer: Better, Faster, Older

The AdEMAMix Optimizer: Better, Faster, Older

[QA] Planning In Natural Language Improves LLM Search For Code Generation

Planning In Natural Language Improves LLM Search For Code Generation

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

[QA] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sample what you can't compress

[QA] Sample what you can't compress

[QA] CONTEXTCITE: Attributing Model Generation to Context

CONTEXTCITE: Attributing Model Generation to Context

FLUX that Plays Music

[QA] FLUX that Plays Music

Modularity in Transformers: Investigating Neuron Separability & Specialization

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

[QA] Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

[QA] Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

CycleGAN with Better Cycles