Small language models are to be hosted on the two separate GPUs via Docker, allowing different models to run behind the reverse proxy. A user on LUG's SillyTavern instance (or any front end that can call an API) can then select which model to prompt.
KoboldCpp is currently used to run .GGUF models on a mix of CPU and GPU, supporting large language models (LLMs), embedding models (used for data vectorization), text-to-speech (TTS), speech-to-text, and image generation under one application.
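As a rough sketch of this layout (the image name, model paths, ports, and service names below are illustrative assumptions, not the actual configuration), a Docker Compose file could pin one KoboldCpp container to each GPU, with the reverse proxy routing to the two ports:

```yaml
# Hypothetical sketch only: two KoboldCpp containers, one per GPU,
# each serving a different .GGUF model behind the reverse proxy.
services:
  kobold-gpu0:
    image: koboldai/koboldcpp          # assumed image name
    command: ["--model", "/models/model-a.gguf", "--port", "5001",
              "--usecublas", "--gpulayers", "99"]
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]        # first GPU
              capabilities: [gpu]
  kobold-gpu1:
    image: koboldai/koboldcpp
    command: ["--model", "/models/model-b.gguf", "--port", "5002",
              "--usecublas", "--gpulayers", "99"]
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]        # second GPU
              capabilities: [gpu]
```

The reverse proxy would then expose each container under its own path or hostname, and a front end like SillyTavern points its API URL at whichever one serves the desired model.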