What it is
A voice-first AI Japanese teacher you run yourself. You speak into the browser and an AI "sensei" replies out loud in Japanese, adapting to your JLPT level. On top of free conversation it has structured JLPT lessons, spaced-repetition vocabulary review, and "content lessons" built from any pasted Japanese text.
How it works
- A single WebSocket (/ws/session) carries the whole loop: the browser streams 16 kHz Int16 PCM from an AudioWorklet, server-side RMS-based voice-activity detection decides when an utterance ends, and the audio is sent to a faster-whisper (large-v3) service for transcription.
- The transcript is fed to an OpenAI-compatible LLM: by default a local llama.cpp server running a Qwen3.5 Japanese fine-tune, swappable to the MiniMax cloud API via an env var. The reply streams back token-by-token and is spoken via AivisSpeech (a VOICEVOX-compatible TTS engine).
- The model emits a hidden <meta> JSON block each turn; the backend parses it to auto-persist new vocabulary and correction patterns to Postgres, while furigana (via pykakasi), an English translation, reply suggestions, and coaching feedback are generated in parallel so they don't add to perceived latency.
- Vocabulary review uses a hand-written SM-2 spaced-repetition scheduler; grammar progress and lesson sessions (full transcripts) are also stored in Postgres.
- Everything ships as a docker-compose stack (React/Vite/TypeScript frontend, FastAPI backend, the Whisper ASR service, AivisSpeech, and Postgres), with the LLM left external so it can reuse an existing GPU server.
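As a rough illustration of the RMS-based end-of-utterance detection mentioned above, here is a minimal sketch. Only the 16 kHz Int16 PCM format comes from the description; the class name, the threshold, and the 700 ms silence window are assumptions chosen for the example, not the project's actual values.

```python
import math
import struct
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # from the description: 16 kHz Int16 PCM


def rms(frame: bytes) -> float:
    """RMS amplitude of little-endian Int16 PCM, scaled to the range 0..1."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768.0


@dataclass
class EndOfUtteranceDetector:
    """Hypothetical detector: speech followed by enough silence ends an utterance."""

    threshold: float = 0.02  # assumed speech/silence RMS cutoff
    silence_ms: float = 700  # assumed trailing silence that closes an utterance
    _speaking: bool = False
    _silent_ms: float = 0.0

    def feed(self, frame: bytes) -> bool:
        """Feed one PCM frame; return True when the utterance has just ended."""
        duration_ms = (len(frame) // 2) * 1000 / SAMPLE_RATE
        if rms(frame) >= self.threshold:
            # Loud frame: we are (still) inside speech, reset the silence timer.
            self._speaking = True
            self._silent_ms = 0.0
            return False
        if not self._speaking:
            # Silence before any speech is ignored.
            return False
        self._silent_ms += duration_ms
        if self._silent_ms >= self.silence_ms:
            self._speaking = False
            self._silent_ms = 0.0
            return True
        return False
```

A fixed threshold is the simplest choice; a real deployment would probably need to tune it per microphone or adapt it to the background noise level.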
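For readers unfamiliar with SM-2, the sketch below shows the commonly published update rule (quality grades 0 to 5, ease factor floored at 1.3, first intervals of 1 and 6 days). The project's hand-written scheduler may differ in details; the Card fields and the review function name are assumptions made for this example.

```python
from dataclasses import dataclass

MIN_EASE = 1.3  # SM-2 never lets the ease factor drop below this


@dataclass(frozen=True)
class Card:
    """Hypothetical per-word review state."""

    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 starting ease factor


def review(card: Card, quality: int) -> Card:
    """Return the card's next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")

    if quality < 3:
        # Failed recall: restart the repetition sequence.
        repetitions, interval = 0, 1
    else:
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.ease)
        repetitions = card.repetitions + 1

    # Ease update: EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)).
    # Note: this variant also adjusts ease on failed reviews; some do not.
    delta = 5 - quality
    ease = max(MIN_EASE, card.ease + 0.1 - delta * (0.08 + delta * 0.02))
    return Card(repetitions, interval, ease)
```

Three perfect reviews in a row move a new card through intervals of 1, 6, and then 16 days, while any grade below 3 drops it back to a 1-day interval.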
Why it's interesting
The interesting part is the latency engineering around a local, fully
self-hosted voice loop: streaming token output, parallel background enrichment,
cross-chunk <think>-tag stripping for reasoning models, and a single LLM
contract that doubles as both the spoken reply and a structured data feed that
drives the learning system. It also supports shadowing and dictation modes that
deliberately hide or delay the teacher's text.
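The cross-chunk <think>-tag stripping is the least obvious piece, because a tag can be split across two streamed chunks (for example "<thi" followed by "nk>"). One way to handle that, sketched here under the assumption that the tags are the literal strings <think> and </think>, is to hold back any trailing text that could still turn into a tag. The ThinkStripper class and its method names are hypothetical, not taken from the project.

```python
OPEN, CLOSE = "<think>", "</think>"


def _held_suffix_len(text: str, tag: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of tag."""
    for k in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


class ThinkStripper:
    """Streaming filter that drops <think>...</think> spans from chunked text."""

    def __init__(self) -> None:
        self._buf = ""
        self._in_think = False

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk and return the text that is safe to emit."""
        self._buf += chunk
        out: list[str] = []
        while True:
            tag = CLOSE if self._in_think else OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                # Complete tag found: emit what precedes an opening tag,
                # discard what precedes a closing tag, then flip state.
                if not self._in_think:
                    out.append(self._buf[:idx])
                self._buf = self._buf[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # No complete tag: keep only a tail that might be a partial tag.
            keep = _held_suffix_len(self._buf, tag)
            cut = len(self._buf) - keep
            if not self._in_think:
                out.append(self._buf[:cut])
            self._buf = self._buf[cut:]
            return "".join(out)

    def flush(self) -> str:
        """At end of stream, release held-back text that never became a tag."""
        rest = "" if self._in_think else self._buf
        self._buf = ""
        return rest
```

Holding back at most six characters (one less than the length of the opening tag) keeps the added latency negligible while guaranteeing that no fragment of reasoning text reaches the TTS engine or the displayed transcript.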
Status
Private hobby project, self-hosted.