Small language models are to be hosted on the two separate GPUs via Docker, allowing different models to run behind the reverse proxy. A user on LUG's SillyTavern instance (or any front end that can call an API) can then select which model to prompt.
KoboldCpp is currently used to run .GGUF models on a mix of CPU and GPU, supporting large language models (LLMs), embedding models (used for data vectorization), text-to-speech (TTS), speech-to-text, and image generation under one application.
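As a rough sketch of this layout (the image name, model paths, ports, and service names below are illustrative assumptions, not the actual configuration), a Docker Compose file could pin one KoboldCpp container to each GPU, with the reverse proxy routing to the two ports:

```yaml
# Hypothetical sketch only: two KoboldCpp containers, one per GPU,
# each serving a different .GGUF model behind the reverse proxy.
services:
  kobold-gpu0:
    image: koboldai/koboldcpp          # assumed image name
    command: ["--model", "/models/model-a.gguf", "--port", "5001",
              "--usecublas", "--gpulayers", "99"]
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]        # first GPU
              capabilities: [gpu]
  kobold-gpu1:
    image: koboldai/koboldcpp
    command: ["--model", "/models/model-b.gguf", "--port", "5002",
              "--usecublas", "--gpulayers", "99"]
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]        # second GPU
              capabilities: [gpu]
```

The reverse proxy would then expose each container under its own path or hostname, and a front end like SillyTavern points its API URL at whichever one serves the desired model.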