⚠️ Reduced-capability demo. This Space runs trimmed-down models under free ZeroGPU limits (daily GPU quota per visitor, per-utterance GPU calls, no continuous pipeline). The full LocalLLM — a private, fully offline suite with llama.cpp Q6_K inference, an OpenAI-compatible gateway, a tool-using agent, a React web UI, and true streaming voice-to-voice translation (≤ 8 s lag) — runs on your own GPU with no quotas and no data leaving your machine:
👉 https://github.com/murai1998/LocalLLM · demo vs. full version — 1-page comparison with architecture diagrams

🎙️ LocalLLM: Voice Interpreter, Chat, OCR, and STT Transcription

Speak naturally — when you pause, your words are transcribed, translated, and spoken back. Each utterance uses a few seconds of your free daily GPU quota. (The full local version streams continuously with no quota.)
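The "speak, pause, hear the translation" loop amounts to pause-based utterance segmentation followed by three stages. The following is a minimal sketch, not the Space's actual code. It assumes a simple RMS energy threshold as the pause detector, and the real app may use a different voice-activity detector. The names `segment_utterances` and `interpret`, along with the `threshold` and `min_silence_frames` values, are hypothetical. The `transcribe`, `translate`, and `speak` callables stand in for Whisper, Gemma, and Piper; they are not those projects' real APIs.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]  # one short chunk of mono audio samples (e.g. 10 ms)


def frame_rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_utterances(
    frames: Iterable[Frame],
    threshold: float = 0.02,       # hypothetical energy cut-off for "speech"
    min_silence_frames: int = 25,  # hypothetical pause length that ends an utterance
) -> Iterator[List[float]]:
    """Yield one utterance each time speech is followed by a long enough pause."""
    current: List[float] = []  # samples of the utterance being built
    pending: List[float] = []  # silent samples not yet known to be inside the utterance
    silent_run = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            # Speech resumed after a short pause: keep that pause inside the utterance.
            current.extend(pending)
            pending = []
            current.extend(frame)
            silent_run = 0
        elif current:
            pending.extend(frame)
            silent_run += 1
            if silent_run >= min_silence_frames:
                # The pause was long enough: emit the utterance without its trailing silence.
                yield current
                current, pending, silent_run = [], [], 0
    if current:
        yield current  # flush speech still buffered when the input ends


def interpret(
    frames: Iterable[Frame],
    transcribe: Callable[[List[float]], str],  # hypothetical stand-in for the STT model
    translate: Callable[[str], str],           # hypothetical stand-in for the LLM translator
    speak: Callable[[str], None],              # hypothetical stand-in for the TTS voice
) -> Iterator[Tuple[str, str]]:
    """Run transcribe -> translate -> speak once per detected utterance."""
    for utterance in segment_utterances(frames):
        text = transcribe(utterance)
        translated = translate(text)
        speak(translated)
        yield text, translated


if __name__ == "__main__":
    speech, silence = [0.1] * 160, [0.0] * 160
    frames = [speech] * 3 + [silence] * 30 + [speech] * 2 + [silence] * 30
    for src, dst in interpret(
        frames,
        transcribe=lambda audio: f"<{len(audio)} samples>",
        translate=str.upper,
        speak=lambda text: None,
    ):
        print(src, "->", dst)
    # prints "<480 samples> -> <480 SAMPLES>" then "<320 samples> -> <320 SAMPLES>"
```

Raising `min_silence_frames` makes the sketch wait for longer pauses, so it sends fewer, longer utterances downstream. Lowering it reacts faster but can split a sentence mid-thought.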

Demo version · full local deployment instructions in the GitHub repo · models: google/gemma-4-12b-it + openai/whisper-large-v3-turbo + Piper TTS (CPU)