Self-Hosted LLM Agent on Your Own Laptop or Edge Device - Michael Yuan, Second State

As LLM applications evolve from chatbots to copilots to AI agents, the need for privacy, customization, cost control, and value alignment keeps growing. Running open-source LLMs and agents on personal or private devices is a practical way to meet those needs. With the release of a new generation of open-source LLMs, such as Llama 3, the gap between open-source and proprietary LLMs is narrowing fast, and in many cases open-source LLMs already outperform SaaS-based proprietary ones. For AI agents, open-source LLMs are not just cheaper and more private: they also allow customization through fine-tuning and RAG-based prompt engineering with private data. This talk shows how to build a complete AI agent service using an open-source LLM and a personal knowledge base. We will use the open-source WasmEdge + Rust stack for LLM inference, which is fast and lightweight and avoids complex Python dependencies. It is cross-platform and achieves native performance across operating systems, CPUs, and GPUs.
