[QA] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
LazyLLM accelerates transformer-based language model inference by dynamically selecting only the tokens essential for KV cache computation at each generation step, improving generation speed without any fine-tuning while maintaining accuracy across a range of tasks.
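The core idea of importance-based token selection can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a per-token importance score (e.g. attention weight from the current token, averaged over heads) is already available, and simply keeps the top fraction of prompt tokens for KV computation. The function name and `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def select_tokens(attn_scores, keep_ratio=0.5):
    """Pick the highest-scoring fraction of prompt tokens to keep.

    attn_scores : 1-D array of per-token importance scores
                  (hypothetical input; shape [seq_len]).
    keep_ratio  : fraction of tokens whose KV entries are computed;
                  the rest are deferred/pruned.
    Returns the indices of kept tokens, in original sequence order.
    """
    k = max(1, int(len(attn_scores) * keep_ratio))
    kept = np.argsort(attn_scores)[-k:]  # top-k tokens by score
    return np.sort(kept)                 # preserve sequence order

scores = np.array([0.05, 0.30, 0.02, 0.40, 0.10, 0.13])
print(select_tokens(scores, keep_ratio=0.5))  # → [1 3 5]
```

Because pruning is dynamic, a token dropped at one step can re-enter later if its score rises, which is how the method avoids permanently discarding context.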