Roadmap
Roadmap
0.0.1 — First release (done)
Default command, pipe mode, GPU, daemon mode.
- Default command:
voicshalone starts mic recording (no subcommand) - Pipe mode:
cat file.wav | voicsh→ transcribe → stdout - Auto-resample WAV to 16kHz mono (linear interpolation)
- GPU feature gates:
--features cuda,vulkan,hipblas - Logging cleanup: all output respects -v/-vv levels consistently
- Honest README matching actual features
- Unix socket IPC:
voicsh start/voicsh stop/voicsh toggle - Model stays in memory (~300MB for base.en)
- Systemd user service:
voicsh install-service -
voicsh statusshows daemon health - Shell completions (bash/zsh/fish)
-
voicsh init— auto-tune: benchmark hardware, recommend model, download -
voicsh follow— stream daemon events (meter, state, transcriptions) -
voicsh config get/set/list/dump— manage config without a text editor - Hallucination filtering (configurable, language-specific)
- Fan-out mode — run English + multilingual models in parallel, pick best
0.1.0 — Voice commands (done)
Spoken punctuation, formatting, and keyboard control.
- Punctuation: “period”, “comma”, “question mark”, “exclamation mark”, etc.
- Whitespace: “new line”, “new paragraph”, “space”, “tab”
- Caps toggle: “all caps” / “end caps”
- Symbols: slash, ampersand, at-sign, dollar, hash, percent, asterisk, etc.
- Brackets: open/close paren, brace, bracket, angle bracket
- Key combos: “delete word” (Ctrl+Backspace), “backspace”
- Configurable vocabulary in config.toml (user overrides merge with built-ins)
- Rule-based (no LLM needed)
- Multi-language: en, de, es, fr, pt, it, nl, pl, ru, ja, zh, ko
- GNOME Shell extension (
voicsh install-gnome-extension)
0.2.0 — Text refinement
LLM post-processing for polished output.
- Local: Ollama, llama.cpp server (auto-detect running)
--refine default|formal|casual|code- Timeout + fallback to raw transcription
- Cloud providers optional (Anthropic, OpenAI)
0.3.0 — GPU and extension testing
Container-based testing for GPU backends and GNOME integration.
- GPU compilation gates: verify
--features cuda/vulkan/hipblasbuild in vendor containers- CUDA:
nvidia/cuda:12.6.1-devel-ubuntu24.04(compile-only, no GPU needed) - hipblas:
rocm/dev-ubuntu-24.04:6.4-complete(compile-only, no GPU needed) - Vulkan:
libvulkan-dev+mesa-vulkan-driverson Ubuntu (compile and run)
- CUDA:
- Vulkan runtime tests via lavapipe (Mesa software Vulkan,
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json) - GNOME extension integration tests via
ghcr.io/ddterm/gnome-shell-image/fedora-43(or other distros)- Headless:
gnome-shell --wayland --headless --unsafe-mode --virtual-monitor 1600x960 - Install/enable via
gnome-extensionsCLI, checkjournalctlfor errors - Screenshots via D-Bus
org.gnome.Shell.Screenshot
- Headless:
- CI workflow for all of the above on GitHub Actions (standard runners, no GPU/KVM needed)
Future
- Overlay feedback (Wayland layer-shell recording indicator)
- Streaming word-by-word display
- Push-to-talk (hold hotkey)
- X11 support (xdotool/xsel)
- Profiles (per-app settings)
- Daemon: listen for PipeWire/PulseAudio device changes, auto-recover or show helpful message
Non-goals
- GUI settings app (config file is enough)
- Speaker identification
- Real-time translation
- Windows/macOS (Linux-first)