Question 1

Does Echosy work offline?

Accepted Answer

Yes, Echosy works 100% offline. All audio transcription happens on your Mac using local AI models. No internet connection is required for recording, transcription, or dictation.

Question 2

Is my audio data private?

Accepted Answer

Absolutely. Your audio never leaves your Mac. Echosy uses on-device AI models (Whisper, Qwen3-ASR) that run locally on your GPU. No audio data is sent to any external server.

Question 3

What languages does Echosy support?

Accepted Answer

Echosy supports 24+ transcription languages including English, Chinese (Mandarin, Cantonese, Hokkien), Japanese, Korean, Spanish, French, German, and more. The app interface is available in English, Simplified Chinese, Traditional Chinese, Japanese, and Korean.

Question 4

Can Echosy record system audio on macOS?

Accepted Answer

Yes. Echosy captures system audio from any application through macOS ScreenCaptureKit and records your microphone at the same time. This makes it well suited to recording meetings, lectures, and podcasts.

Question 5

Is Echosy free?

Accepted Answer

Yes, Echosy has a free tier that includes recordings of up to 15 minutes, real-time transcription, dictation, and 3 AI summaries per day. Pro is a one-time $49.80 purchase (no subscription) that unlocks 4-hour recordings, all 10 ASR models, unlimited summaries, file transcription, and advanced export formats (SRT, VTT, DOCX, PDF).

Question 6

Can I use Echosy to transcribe meetings?

Accepted Answer

Yes. Echosy captures system audio from any meeting app (Zoom, Teams, Google Meet, etc.) alongside your microphone. Transcripts appear in real time with timestamps, and you can generate an AI summary when the meeting ends — all processed locally on your Mac.

Question 7

Does Echosy work on Intel Macs?

Accepted Answer

Echosy 2.2 and later require Apple Silicon (M1 or newer) for GPU-accelerated on-device models (Qwen3-ASR, MLX Whisper). Intel Macs can use the legacy v2.1.0 release, which remains available for download but no longer receives feature updates.

Question 8

How is Echosy different from MacWhisper or Whisper Transcription?

Accepted Answer

Echosy goes beyond file transcription: it captures live system audio and microphone simultaneously, provides system-wide dictation that pastes text at your cursor in any app, streams real-time captions during recording, and includes an AI chat feature grounded in your transcript. It also supports 10 different ASR models across three backends.

Question 9

Can Echosy transcribe audio and video files?

Accepted Answer

Yes (Pro feature). Drag and drop any audio or video file — WAV, MP3, M4A, MP4, MOV, and 20+ other formats — and Echosy transcribes it locally using the same on-device AI models. No upload, no waiting for a cloud service.

Parameter	Default	Range	Description
Silence duration	800 ms	200–3000 ms	How long silence must last before ending a segment. Lower values create more, shorter segments. Higher values allow for natural pauses within a segment.
Min speech duration	300 ms	100–2000 ms	Minimum speech length to count as a segment. Filters out very short sounds like coughs, clicks, or background noise.
Max speech duration	20 s	5–120 s	Maximum segment length before forcing a split. Prevents very long segments that are harder for ASR models to transcribe accurately.
Speech ratio	2.5×	1.0–10.0×	How much louder speech must be than background noise to be detected. Lower values are more sensitive (they detect quieter speech); higher values require louder speech.
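
The way the four parameters in the table above interact can be illustrated with a simple energy-based detector. The following is only a minimal sketch under stated assumptions, not Echosy's actual implementation: the names `VadParams` and `segment`, the fixed 100 ms frame size, and the idea of a precomputed noise floor are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VadParams:
    # Defaults mirror the recording table; the field names are hypothetical.
    silence_ms: int = 800        # silence needed to close a segment
    min_speech_ms: int = 300     # shorter segments are discarded
    max_speech_ms: int = 20_000  # longer segments are force-split
    speech_ratio: float = 2.5    # required energy relative to the noise floor


def segment(energies: List[float], noise_floor: float, params: VadParams,
            frame_ms: int = 100) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame energies."""
    segments = []
    start = None          # start time of the open segment, or None
    last_speech_end = 0   # end time of the most recent speech frame
    for i, energy in enumerate(energies):
        t = i * frame_ms
        if energy >= noise_floor * params.speech_ratio:
            if start is None:
                start = t
            last_speech_end = t + frame_ms
            # Max speech duration: force a split once the segment is too long.
            if last_speech_end - start >= params.max_speech_ms:
                segments.append((start, last_speech_end))
                start = None
        elif start is not None and t + frame_ms - last_speech_end >= params.silence_ms:
            # Silence duration reached: close the segment, keeping it only
            # if it satisfies the minimum speech duration.
            if last_speech_end - start >= params.min_speech_ms:
                segments.append((start, last_speech_end))
            start = None
    # Flush a segment that is still open when the input ends.
    if start is not None and last_speech_end - start >= params.min_speech_ms:
        segments.append((start, last_speech_end))
    return segments
```

In this sketch, a 100 ms burst such as a cough is dropped by the minimum speech duration, a 300 ms pause stays inside one segment because it is shorter than the silence duration, and continuous speech longer than 20 s is split at the 20 s mark.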

Parameter	Default	Description
Silence duration	600 ms	Shorter than recording — faster segment turnover for real-time typing.
Min speech duration	200 ms	Lower threshold to catch short words and quick dictation.
Max speech duration	15 s	Shorter max to keep segments manageable for quick input.
Speech ratio	2.5×	Same sensitivity as recording mode.

Preset	Silence	Min Speech	Max Speech	Best for
Meeting	800 ms	300 ms	20 s	Group discussions with natural pauses
Subtitle	500 ms	200 ms	10 s	Short, timed segments for subtitles
Interview	1000 ms	400 ms	30 s	Longer turns with clear speaker pauses
Lecture	1200 ms	500 ms	60 s	Continuous speech with few interruptions
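
The presets can also be written down as plain data and checked against the ranges listed for the recording parameters. This is a hedged sketch only: `Preset`, `PRESETS`, `RANGES`, and `out_of_range` are hypothetical names, and nothing here reflects how Echosy actually stores its settings. The values themselves are copied from the tables above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Preset:
    silence_ms: int
    min_speech_ms: int
    max_speech_s: int
    best_for: str


# Values copied from the presets table.
PRESETS = {
    "Meeting": Preset(800, 300, 20, "Group discussions with natural pauses"),
    "Subtitle": Preset(500, 200, 10, "Short, timed segments for subtitles"),
    "Interview": Preset(1000, 400, 30, "Longer turns with clear speaker pauses"),
    "Lecture": Preset(1200, 500, 60, "Continuous speech with few interruptions"),
}

# Allowed ranges from the recording parameter table.
RANGES = {
    "silence_ms": (200, 3000),
    "min_speech_ms": (100, 2000),
    "max_speech_s": (5, 120),
}


def out_of_range(preset: Preset) -> List[str]:
    """Return the names of fields that fall outside the documented ranges."""
    return [name for name, (lo, hi) in RANGES.items()
            if not lo <= getattr(preset, name) <= hi]
```

Every preset in the table falls inside the documented ranges, so `out_of_range` returns an empty list for each of them. A custom value such as a 100 ms silence duration would be reported as `["silence_ms"]`.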

Voice Activity Detection (VAD)

How VAD Works

Recording VAD Parameters

Dictation VAD Parameters

Presets

Tuning Tips
