Making Open Models 10x faster and better for Modern Application Innovation: Dmytro (Dima) Dzhulgakov

Generative AI powers the next generation of real-time applications. The key to success in modern application development in the Gen AI era is a secure, low-latency, and low-cost LLM serving solution, which Fireworks' enterprise-grade deployment provides. Fireworks AI accelerates innovation through its SaaS platform for low-latency inference and high-quality fine-tuning of 100+ models, spanning state-of-the-art LLMs, image/video/audio generation, embedding, and multimodal models. These advantages are delivered through Fireworks' proprietary FireAttention technology, which is 4x-15x faster than OSS alternatives. To bring this all together, Fireworks tuned its own FireFunction model to integrate hundreds of models and API calling. Fireworks' adoption is the fastest in the industry, and its software stack extracts the most performance across different hardware and deployment options.
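As a rough illustration of the serving workflow described above, here is a minimal sketch of querying a hosted open model through an OpenAI-compatible chat-completions client. The base URL, model id, and environment variable are illustrative assumptions, not details taken from the talk.

```python
# Minimal sketch: calling a hosted open model via an OpenAI-compatible
# chat-completions endpoint. Base URL, model id, and env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["FIREWORKS_API_KEY"],           # hypothetical env var holding your key
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # illustrative model id
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize why low-latency LLM serving matters."},
    ],
    max_tokens=128,
    temperature=0.2,
)

print(response.choices[0].message.content)
```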

About Dmytro

Dmytro is one of PyTorch's core maintainers. Previously, he helped bring PyTorch from a research framework to numerous production applications across Meta's AI use cases and the broader industry.
Comments

If the inference is 10x faster and requires far less GPU compute, then why is the base model API pricing more expensive than the competition's?
You should be able to undercut everyone if your inference really is that much more efficient.

IvarDaigon