Efficiently Scaling and Deploying LLMs // Hanlin Tang // LLMs in Production Conference
// Abstract
Hanlin discusses the evolution of Large Language Models and the importance of efficient scaling and deployment. He emphasizes the benefits of a decentralized ecosystem of many small, specialized models over one giant AGI model controlled by a few companies. Hanlin explains why companies may want to train their own custom models, for example to address data privacy concerns, and offers insights into when it makes sense to build your own models and what tooling is available for training and deployment.
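To make the tooling discussion concrete, below is a minimal, illustrative sketch of fine-tuning a small open-source causal language model with MosaicML's open-source Composer trainer. This is not code from the talk: the `gpt2` checkpoint, the toy in-memory dataset, and the hyperparameters are placeholder assumptions chosen so the example runs end to end on a CPU.

```python
# Illustrative sketch only: fine-tune a small Hugging Face causal LM with
# MosaicML Composer. Model name, toy data, and hyperparameters are placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from composer import Trainer
from composer.models import HuggingFaceModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy in-memory dataset: each item is a dict the HF model accepts directly.
# For causal LM fine-tuning, labels are simply the input ids.
texts = ["Large language models can be specialized per domain."] * 64
enc = tokenizer(texts, padding="max_length", truncation=True,
                max_length=32, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": enc["input_ids"][i]}
    for i in range(len(texts))
]
train_loader = DataLoader(dataset, batch_size=8)

# Wrap the HF model so Composer's Trainer can drive the training loop.
model = HuggingFaceModel(hf_model, tokenizer=tokenizer)
trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration="1ep",  # one pass over the toy data
    optimizers=torch.optim.AdamW(model.parameters(), lr=5e-5),
    device="cpu",        # switch to "gpu" when accelerators are available
)
trainer.fit()
```

The point of the sketch is that wrapping an off-the-shelf Hugging Face model is enough to get a managed training loop; scaling the same `Trainer` call to multi-GPU clusters is a configuration change rather than a rewrite.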
// Bio
Hanlin is the CTO & Co-founder of MosaicML, an ML infrastructure startup that enables enterprises to easily train large-scale AI models in their secure environments. Hanlin was previously the Director of the Intel AI Lab, responsible for the research and deployment of deep learning models. He joined Intel through its acquisition of Nervana Systems. Hanlin holds a Ph.D. from Harvard University and has published in leading journals and conferences such as NeurIPS, ICLR, ICML, Neuron, and PNAS.
// Related Videos
How Large Language Models Work
Enabling Cost-Efficient LLM Serving with Ray Serve
Fast LLM Serving with vLLM and PagedAttention
Run Your Own LLM Locally: LLaMa, Mistral & More
What’s so hard about training and deploying LLMs?
[LLM 101 Series] EFFICIENTLY SCALING TRANSFORMER INFERENCE
LLMOps: Deploying LLMs and Scaling using Modal, LangChain and Huggingface
WHY AND HOW OF SCALING LARGE LANGUAGE MODELS | NICHOLAS JOSEPH
A Survey of Techniques for Maximizing LLM Performance
Strategies for Efficient LLM Deployments in Any Cluster - Angel M De Miguel Meana & Francisco Cab...
OpenLLM: Fine-tune, Serve, Deploy, ANY LLMs with ease.
Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Easily Scale LLM-Based Copilots with NVIDIA and Anyscale
Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mist...
Deploy LLMs More Efficiently with vLLM and Neural Magic
Training Billions of Parameter LLMs with MosaicML
How ChatGPT Works Technically | ChatGPT Architecture
Scaling Up “Vibe Checks” for LLMs - Shreya Shankar | Stanford MLSys #97
LLMs in Production: Fine-Tuning, Scaling, and Evaluation
How to deploy LLMs (Large Language Models) as APIs using Hugging Face + AWS
Go Production: ⚡️ Super FAST LLM (API) Serving with vLLM !!!
QLORA: Efficient Finetuning of Quantized LLMs | Paper summary
Five Challenges of Deploying LLM Systems