What it is
Companion is a locally-hosted AI assistant with voice input/output, image/vision support, and persistent long-term memory, presented through an avatar-based web UI. Instead of relying on a cloud provider, it wires a local llama.cpp model together with self-hosted speech and memory services so the whole conversation loop stays on the author's own hardware. The goal is a stateful companion with a persona rather than a stateless chatbot.
How it works
- Runs as a set of Docker Compose microservices on a private bridge network, fronted by an nginx service that does TLS termination with a self-signed SAN certificate — needed so browsers will grant microphone and webcam access over a LAN IP.
- A Node.js/TypeScript middleware (Express + Socket.IO) is the orchestrator: it streams chat over WebSocket, injects a system prompt, and calls a local llama.cpp server through the OpenAI SDK (OpenAI-compatible
/v1, vision-capable). - Speech is handled by separate Python FastAPI GPU services:
faster-whisper(medium, float16/CUDA) for transcription, OmniVoice for English TTS with zero-shot voice cloning. - A memory service combines a ChromaDB vector store (embeddings via
sentence-transformers/all-MiniLM-L6-v2) with a SQLite conversation log; the middleware retrieves relevant memories before each turn and stores summaries afterward. - The middleware also exposes a tool registry — Brave web search, weather, time, and avatar/emotion control — and the model returns structured JSON carrying both speech and an emotion state, with a periodic "heartbeat" for passive observation.
Why it's interesting
The entire inference and voice stack is private — no cloud LLM or TTS, with the only outbound call being an optional Brave web search. The design leans into a persistent persona: a memory-retrieval loop, emotion-tagged structured responses, and a heartbeat that lets the assistant act without being prompted. The dedicated nginx TLS layer is a neat, pragmatic detail — it exists purely to satisfy browser secure-context rules for mic and webcam on a bare LAN IP.
Status
Active WIP — a private, self-hosted hobby project that runs on the author's home GPU box; not deployed or exposed publicly.