If you’re looking for a web UI and a simple way to host one yourself, nothing beats the “llama.cpp” project. It includes a “llama-server” program that runs a simple web server with a chat web app and an OpenAI-compatible API endpoint. It now also supports multimodality (for models that support it), meaning you can, for example, upload an image and ask the assistant to describe it. An example command to start such a web server would be:
$ llama-server --threads 6 -m /path/to/model.gguf
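By default the server listens on http://localhost:8080, which is where the chat web app lives. Because the API is OpenAI-compatible, you can also query it from scripts or existing OpenAI clients. A minimal sketch with curl (assuming the default listen address):
$ curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello! What can you do?"}]}'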
To launch with multimodality support instead (like asking the AI to describe an image), also pass the model’s matching multimodal projector file with --mmproj:
$ llama-server --threads 6 --mmproj /path/to/model/mmproj-F16.gguf -m /path/to/model/model.gguf
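With the mmproj loaded, images can be sent through the same OpenAI-compatible endpoint as base64-encoded data URLs. A hedged sketch (again assuming the default http://localhost:8080; <BASE64_DATA> stands in for your actual base64-encoded image):
$ curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_DATA>"}}
          ]}]}'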