⚠️ Reduced-capability demo. This Space runs trimmed-down models under
free ZeroGPU limits (daily GPU quota per visitor, per-utterance GPU calls, no
continuous pipeline). The full LocalLLM — a private, fully offline
suite with llama.cpp Q6_K inference, an OpenAI-compatible gateway, a tool-using agent,
a React web UI, and true streaming voice-to-voice translation (≤ 8 s lag) — runs on
your own GPU with no quotas and no data leaving your machine:
👉 https://github.com/murai1998/LocalLLM · demo vs. full version — 1-page comparison with architecture diagrams
👉 https://github.com/murai1998/LocalLLM · demo vs. full version — 1-page comparison with architecture diagrams
🎙️ LocalLLM — with Voice Interpreter, Chat, OCR, and STT transcribe
Speak naturally — when you pause, your words are transcribed, translated, and spoken back. Each utterance uses a few seconds of your free daily GPU quota. (The full local version streams continuously with no quota.)
From
To
Tone
Voice
{}
Chat with the demo model, optionally about an image. (The full local version adds a tool-using agent with selectable skills, document attachments, and unlimited context — entirely offline.)
Upload or record audio for one-shot transcription and translation with a spoken result.
Language
To
Tone
Vision OCR for images and scanned PDFs (demo limit: 3 pages — the full local version is about 10x faster and handles whole documents and DOCX/PDF translation).
Translate to
Demo version · full local deployment instructions in the
GitHub repo ·
models: google/gemma-4-12b-it + openai/whisper-large-v3-turbo + Piper TTS (CPU)