Using the Chat Endpoint in the Ollama API

Comments

Thanks for this awesome tutorial. I took it as a reference and built a user_id-based map to keep history in an in-memory database.
This let me keep history for each user.
{
"user_1" : [{}],
"user_2": [{}]
}
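
For illustration, a minimal Python sketch of that per-user history map against Ollama's /api/chat endpoint; the plain dict standing in for the in-memory database and the llama3 model name are assumptions, not from the comment:

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

# A plain dict stands in for the in-memory database: one message list per user_id.
histories: dict[str, list[dict]] = {}

def chat(user_id: str, text: str, model: str = "llama3") -> str:
    history = histories.setdefault(user_id, [])
    history.append({"role": "user", "content": text})

    # Send the full history so the model keeps per-user context.
    resp = requests.post(OLLAMA_URL, json={"model": model, "messages": history, "stream": False})
    resp.raise_for_status()
    reply = resp.json()["message"]

    history.append(reply)  # remember the assistant's turn as well
    return reply["content"]

# Each user gets an independent conversation.
print(chat("user_1", "Hi, my name is Ada."))
print(chat("user_2", "Hi, my name is Bob."))
print(chat("user_1", "What is my name?"))  # should recall "Ada" for user_1 only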

nofoobar

Thank you sir for adding this QoL. Super helpful indeed!

Psychopatz

Thanks for building this terrific server, the models, and the tools!

dr.mikeybee

That awkward silence at the end though :D

RanaMuhammadWaqas

For some reason, I'm unable to hit the endpoint from another computer on the same network.
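
A common cause is that the Ollama server only listens on 127.0.0.1 unless it is started with OLLAMA_HOST=0.0.0.0 (and port 11434 is open in the firewall). A hedged Python sketch of the client side, with 192.168.1.50 as a placeholder for the serving machine's LAN address:

import requests

# Assumption: the machine running Ollama was started with OLLAMA_HOST=0.0.0.0,
# since by default the server only binds to 127.0.0.1 and is unreachable from the LAN.
OLLAMA_URL = "http://192.168.1.50:11434/api/chat"  # placeholder LAN address

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello from another machine"}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])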

gears

I actually had no idea you were part of the Ollama team too. That's super cool!

HistoryIsAbsurd

Hello 👋 Can we connect our local Ollama with a self-hosted n8n server?

GrecoFPV

Very interesting, I had no idea. What possible roles are there besides "user"? Do we just make them up, or is there a predefined set?
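
For context, the roles are a predefined set rather than free-form: the chat endpoint understands "system", "user", and "assistant" (newer releases also accept "tool" for tool calls). A short Python sketch, with the llama3 model name assumed:

import requests

# A conversation mixing the predefined roles: a system prompt, prior turns
# supplied as history, and the new user question.
messages = [
    {"role": "system", "content": "You are a terse assistant that answers in one sentence."},
    {"role": "user", "content": "What is the capital of Chile?"},
    {"role": "assistant", "content": "Santiago."},  # earlier assistant turn, passed back as history
    {"role": "user", "content": "And roughly how many people live there?"},
]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3", "messages": messages, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])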

chrisBruner

Is there some way that you can help me with my project? Thanks from Chile, Claudio

claudioguendelman

Hi Matt! Thanks for the awesome work! Is there a way to include vLLM in Ollama?

dextersantamaria

Is there an example of a chat UI application that uses the Ollama inference endpoint and is then deployed in the cloud (AWS, GCP, etc.)? I have managed to create the app and it runs locally, but I'm struggling to deploy it to the cloud. Specifically, I'm stuck on creating an appropriate Dockerfile, as it seems there need to be two deployments: one for the Ollama inference endpoint and one for the UI. An example showing how it's done would be awesome!
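
Not a full deployment recipe, but one common pattern is to run Ollama and the UI as two services and let the UI discover the inference endpoint through configuration rather than hard-coding localhost. A Python sketch in which OLLAMA_BASE_URL is a made-up environment variable name:

import os
import requests

# Assumption: the UI and the Ollama server are two separate deployments
# (e.g. two containers), and the UI reads the endpoint from the environment.
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

def ask(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/chat",
        json={"model": model, "messages": [{"role": "user", "content": prompt}], "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    # Locally this defaults to the machine's own Ollama; in the cloud, point
    # OLLAMA_BASE_URL at whatever service or container runs `ollama serve`.
    print(ask("Say hello"))

In the cloud, the second deployment can be whatever runs `ollama serve`; the official ollama/ollama Docker image is one way to provide it.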

PrashantSaikia

Separate question: what's that browser? It has a very nice interface. TIA

ilteris

When I use Ollama from the terminal with the llama3 model, it works very fast, almost instantly. However, when I try to make a request to localhost from the same machine using curl, it is incredibly slow. Why could this be?

Ramirola

If Ollama itself is not multithreaded and async, how much can a wrapper like this help?

I have 2 laptops, A and B. 'A' has a database with more than 10k records in a table and code written in Scala to fetch data from it. 'B' has Ollama running with the llama3:latest model loaded. I fetch data from the database on 'A' and send it to Ollama's chat and generate endpoints on 'B'. At first, Ollama responds to a chat or generate request within a few hundred milliseconds (100, 125, 200, 300 ms), but that's only in the beginning. Later the response time grows to around 10 minutes per request by the time 2k requests have been sent, with 8k still to go.

From this behaviour, it doesn't look like Ollama supports concurrency. I created a Python script with Flask to achieve async behaviour, but Ollama behaved the same. Do you know how to solve this problem? Or is there no problem and hence no solution? OLLAMA_NUM_PARALLEL was set to 4.
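
For what it's worth, a sketch of bounding client-side concurrency with asyncio and httpx (the httpx dependency, the placeholder address, and the batch of prompts are assumptions); server-side parallelism still depends on OLLAMA_NUM_PARALLEL and on the model fitting in memory, and requests beyond that limit simply queue up, which is one way the later requests end up waiting minutes:

import asyncio
import httpx

OLLAMA_URL = "http://192.168.1.60:11434/api/chat"  # placeholder for laptop B
CONCURRENCY = 4  # keep in line with OLLAMA_NUM_PARALLEL on the server

async def ask(client: httpx.AsyncClient, sem: asyncio.Semaphore, record: str) -> str:
    # The semaphore caps in-flight requests so the server-side queue
    # does not grow without bound.
    async with sem:
        resp = await client.post(
            OLLAMA_URL,
            json={"model": "llama3",
                  "messages": [{"role": "user", "content": record}],
                  "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

async def main(records: list[str]) -> list[str]:
    sem = asyncio.Semaphore(CONCURRENCY)
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(ask(client, sem, r) for r in records))

if __name__ == "__main__":
    sample = [f"Summarise record {i}" for i in range(10)]
    for answer in asyncio.run(main(sample)):
        print(answer)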

niteshapte

Why does the chat API respond all at once, even with streaming turned on?
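
One thing worth checking is how the response body is read: with "stream": true the chat endpoint sends newline-delimited JSON chunks, and a client that buffers the whole body only sees the text once generation has finished. A Python sketch using requests (model name assumed):

import json
import requests

with requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3",
          "messages": [{"role": "user", "content": "Tell me a short story."}],
          "stream": True},
    stream=True,  # tell requests not to buffer the whole body
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # one JSON object per streamed line
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
    print()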

MavVRX

Do you have a GitHub repo for this video?

briannezhad

Hey, can you post a GitHub link to the code?

sampriti