All publications
0:56:04
vLLM Office Hours - vLLM’s 2024 Wrapped and 2025 Vision - December 19, 2024
0:58:50
[vLLM Office Hours] 2024 Highlights and 2025 Roadmap
0:44:31
vLLM Office Hours - Exploring Machete, a Mixed-Input GEMM Kernel for Hopper GPUs - December 5, 2024
0:48:06
vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM - November 14, 2024
0:59:55
vLLM Office Hours - SOTA Tool-Calling Implementation in vLLM - November 7, 2024
0:49:38
vLLM Office Hours - Deep Dive into Mistral on vLLM - October 17, 2024
1:04:28
vLLM Office Hours - Speculative Decoding in vLLM - October 3, 2024
0:52:35
vLLM Office Hours - Advanced Techniques for Maximizing vLLM Performance - September 19, 2024
1:13:14
vLLM Office Hours - Using NVIDIA CUTLASS for High-Performance Inference - September 05, 2024
0:48:13
vLLM Office Hours - vLLM on AMD GPUs and Google TPUs - August 21, 2024
0:50:03
vLLM Office Hours - Multimodal Models in vLLM with Roblox - August 8, 2024
0:50:38
vLLM Office Hours - Model Quantization for Efficient vLLM Inference - July 25, 2024
0:33:21
Deploy LLMs More Efficiently with vLLM and Neural Magic
0:56:09
vLLM Office Hours - FP8 Quantization Deep Dive - July 9, 2024
0:53:19
vLLM Office Hours - June 20, 2024
0:44:47
vLLM and Neural Magic Office Hours - June 5, 2024
0:06:31
Are MLOps disappearing?
0:01:06
5x Faster YOLOv8 on CPUs
0:47:52
Deploy Fast and Accurate YOLOv8 Object Detection Models on CPUs You Already Have
0:42:27
Unlock Faster and More Efficient LLMs with SparseGPT
0:52:31
Pruning and Quantizing ML Models With One Shot Without Retraining
0:08:15
Sparse Transferring Hugging Face Models With SparseML
0:41:42
Apply Second-Order Pruning Algorithms for SOTA Model Compression
0:06:53
Use Sparse Transfer Learning to Create Sparse Models Fine-Tuned to Your Datasets