Roadmap

Roadmap

0.0.1 — First release (done)

Default command, pipe mode, GPU, daemon mode.

Default command: voicsh alone starts mic recording (no subcommand)
Pipe mode: cat file.wav | voicsh → transcribe → stdout
Auto-resample WAV to 16kHz mono (linear interpolation)
GPU feature gates: --features cuda, vulkan, hipblas
Logging cleanup: all output respects -v/-vv levels consistently
Honest README matching actual features
Unix socket IPC: voicsh start / voicsh stop / voicsh toggle
Model stays in memory (~300MB for base.en)
Systemd user service: voicsh install-service
voicsh status shows daemon health
Shell completions (bash/zsh/fish)
voicsh init — auto-tune: benchmark hardware, recommend model, download
voicsh follow — stream daemon events (meter, state, transcriptions)
voicsh config get/set/list/dump — manage config without a text editor
Hallucination filtering (configurable, language-specific)
Fan-out mode — run English + multilingual models in parallel, pick best

0.1.0 — Voice commands (done)

Spoken punctuation, formatting, and keyboard control.

Punctuation: “period”, “comma”, “question mark”, “exclamation mark”, etc.
Whitespace: “new line”, “new paragraph”, “space”, “tab”
Caps toggle: “all caps” / “end caps”
Symbols: slash, ampersand, at-sign, dollar, hash, percent, asterisk, etc.
Brackets: open/close paren, brace, bracket, angle bracket
Key combos: “delete word” (Ctrl+Backspace), “backspace”
Configurable vocabulary in config.toml (user overrides merge with built-ins)
Rule-based (no LLM needed)
Multi-language: en, de, es, fr, pt, it, nl, pl, ru, ja, zh, ko
GNOME Shell extension (voicsh install-gnome-extension)

0.2.0 — Text refinement

LLM post-processing for polished output.

Local: Ollama, llama.cpp server (auto-detect running)
--refine default|formal|casual|code
Timeout + fallback to raw transcription
Cloud providers optional (Anthropic, OpenAI)

0.3.0 — GPU and extension testing

Container-based testing for GPU backends and GNOME integration.

GPU compilation gates: verify --features cuda/vulkan/hipblas build in vendor containers
- CUDA: nvidia/cuda:12.6.1-devel-ubuntu24.04 (compile-only, no GPU needed)
- hipblas: rocm/dev-ubuntu-24.04:6.4-complete (compile-only, no GPU needed)
- Vulkan: libvulkan-dev + mesa-vulkan-drivers on Ubuntu (compile and run)
Vulkan runtime tests via lavapipe (Mesa software Vulkan, VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json)
GNOME extension integration tests via ghcr.io/ddterm/gnome-shell-image/fedora-43 (or other distros)
- Headless: gnome-shell --wayland --headless --unsafe-mode --virtual-monitor 1600x960
- Install/enable via gnome-extensions CLI, check journalctl for errors
- Screenshots via D-Bus org.gnome.Shell.Screenshot
CI workflow for all of the above on GitHub Actions (standard runners, no GPU/KVM needed)

Future

Overlay feedback (Wayland layer-shell recording indicator)
Streaming word-by-word display
Push-to-talk (hold hotkey)
X11 support (xdotool/xsel)
Profiles (per-app settings)
Daemon: listen for PipeWire/PulseAudio device changes, auto-recover or show helpful message

Non-goals

GUI settings app (config file is enough)
Speaker identification
Real-time translation
Windows/macOS (Linux-first)