[QA] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

LazyLLM accelerates long-context inference for transformer-based language models by dynamically selecting, at each generation step, only the tokens that are important for the next-token prediction and computing the KV cache just for those. It requires no fine-tuning, speeds up generation (especially the prefilling stage for long prompts), and maintains accuracy across a range of tasks.
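To make the idea concrete, here is a minimal sketch of attention-score-based token pruning, not the paper's exact algorithm: prompt tokens are ranked by the attention the last query position pays them, and only the top fraction is kept before the hidden states move on. The function and parameter names (`prune_tokens`, `keep_ratio`) are illustrative and not from LazyLLM.

```python
import torch

def prune_tokens(hidden: torch.Tensor, attn: torch.Tensor, keep_ratio: float = 0.5):
    """hidden: (batch, seq, dim); attn: (batch, heads, seq, seq) attention probabilities.
    Returns the pruned hidden states and the indices of the kept tokens."""
    batch, seq, _ = hidden.shape
    # Importance of each token = attention it receives from the last query
    # position, averaged over heads (a common proxy signal for pruning).
    importance = attn[:, :, -1, :].mean(dim=1)                          # (batch, seq)
    k = max(1, int(seq * keep_ratio))
    keep_idx = importance.topk(k, dim=-1).indices.sort(dim=-1).values   # keep original order
    pruned = torch.gather(
        hidden, 1, keep_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    )
    return pruned, keep_idx

# Toy usage with random tensors standing in for a transformer layer's outputs.
if __name__ == "__main__":
    B, H, S, D = 1, 8, 16, 64
    hidden = torch.randn(B, S, D)
    attn = torch.softmax(torch.randn(B, H, S, S), dim=-1)
    pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.25)
    print(pruned.shape, kept)   # torch.Size([1, 4, 64]) plus the kept token indices
```

In the paper's setting this kind of selection is applied dynamically, so tokens pruned at one step can still be revived later if they become relevant; the sketch above only shows the ranking-and-gathering step.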
