btw the llama3 request times out most of the time and is too slow. It only worked once or twice. When I run it in the terminal it's fast and works every time.
It worked again when I checked "Stream response". I had unchecked it and it stopped working. But it's still slow compared to the terminal.
Are you using Ollama, or how are you running the model?
Yes, I'm using Ollama with llama3 8b. The speed feels slow compared to running it directly from the terminal.
I see. BoltAI currently uses the OpenAI-compatible server from Ollama. Maybe that's why it's slower than querying the model directly.
I'll do more benchmarking and may switch to a direct connection in the future.
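For context, Ollama serves both its native API and an OpenAI-compatible endpoint from the same local server. A minimal sketch of the two request shapes, assuming Ollama's defaults (port 11434, llama3 pulled) — these values are illustrative, not taken from this thread:

```python
import json

# Both endpoints are served by the same local Ollama server (default port 11434).

# Native Ollama chat API -- roughly what the terminal `ollama run llama3` uses.
NATIVE_URL = "http://localhost:11434/api/chat"

# OpenAI-compatible endpoint -- what a client like BoltAI connects to.
OPENAI_COMPAT_URL = "http://localhost:11434/v1/chat/completions"

# The request body is nearly identical for both. "stream": True asks the server
# to send tokens as they are generated instead of one final blob -- this is the
# "Stream response" toggle mentioned above.
native_payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}
openai_payload = dict(native_payload)  # same shape for the compat endpoint

print(json.dumps(native_payload, indent=2))
```

The extra hop through the OpenAI-compatibility layer is one plausible source of overhead versus the native API, which is what the benchmarking would need to confirm.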