Speedrun deploying LLM Embedding models into Production
In this video we speedrun deploying an LLM embedding model from Hugging Face into production on a GPU instance.
We spin up a Runpod GPU instance, expose the required ports, install the Infinity inference server, and make API requests to the server to obtain text embeddings.
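As a rough illustration of the client side of this workflow, here is a minimal Python sketch for querying a running Infinity server via its OpenAI-compatible /embeddings endpoint. The host address and model ID are placeholders for your own Runpod deployment, and the install/launch commands in the comments are one plausible way to start the server, not a verbatim transcript of the video.

    # Minimal sketch: request text embeddings from a running Infinity server.
    # Assumes the server was started on the Runpod instance with something like:
    #   pip install "infinity-emb[all]"
    #   infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997
    # <runpod-instance-ip> and the model ID below are placeholders.
    import requests

    INFINITY_URL = "http://<runpod-instance-ip>:7997"  # the exposed Runpod port

    payload = {
        "model": "BAAI/bge-small-en-v1.5",  # must match the model the server loaded
        "input": ["Speedrun deploying embedding models into production."],
    }
    resp = requests.post(f"{INFINITY_URL}/embeddings", json=payload, timeout=30)
    resp.raise_for_status()

    embedding = resp.json()["data"][0]["embedding"]  # list of floats
    print(len(embedding))  # embedding dimensionality of the loaded model

Because the endpoint follows the OpenAI embeddings schema, the same request shape works with most OpenAI-compatible client libraries pointed at the instance's base URL.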