
AssemblyAI Universal-2

Available

assemblyai/universal-2 · by AssemblyAI · end-to-end-asr

Pricing — 1 offering(s)

AssemblyAI

Audio minutes

$0.0025 / minute · Current · 2024-06-01 → present


Capability profile

transcription accuracy: strong

language support: moderate

speaker diarization: strong

real-time streaming: moderate

custom vocabulary: strong

Benchmarks

Benchmark	Score	Config	Source
WER (LibriSpeech)	—	AssemblyAI does not publish LibriSpeech WER benchmarks. Positioned as competitive with Nova-3 on general audio; independent comparisons show strong accuracy on diverse accents.	—
Real-time streaming latency (vendor-reported)	500 ms	Approximate WebSocket latency; higher than Deepgram Nova-3's ~300 ms.	source ↗

Operator guidance

The best-value option for batch speech-to-text (STT). At $0.0025/min ($0.15/hr), it has the lowest listed rate among major providers. Rich built-in features (diarization, auto-chapters, PII redaction) reduce the need for post-processing. When streaming latency matters most, prefer Deepgram Nova-3 (~300 ms versus ~500 ms here). Note: pricing increases 10% from 2026-07-01 for in-region requests unless model_region=global is set.
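
The per-minute arithmetic above can be sketched as a small cost estimator. This is an illustrative calculation only: the function name `batch_cost` and the constants are hypothetical, and it assumes the noted 10% increase applies as a simple multiplier on the listed base rate. It does not account for add-on features that carry their own per-minute charge.

```python
from decimal import Decimal

# Listed rate from the pricing section, in USD per audio minute.
BASE_RATE_PER_MIN = Decimal("0.0025")
# Assumption: the 10% increase noted for in-region requests from 2026-07-01
# is applied multiplicatively to the base rate.
IN_REGION_UPLIFT = Decimal("0.10")


def batch_cost(minutes, in_region_uplift=False):
    """Estimate the base transcription cost in USD for a given audio duration."""
    rate = BASE_RATE_PER_MIN
    if in_region_uplift:
        rate = rate * (1 + IN_REGION_UPLIFT)
    # Convert via str so float inputs such as 12.5 stay exact in Decimal.
    return Decimal(str(minutes)) * rate


if __name__ == "__main__":
    print(batch_cost(60))                         # one hour at the current rate
    print(batch_cost(60, in_region_uplift=True))  # one hour with the 10% uplift
    print(batch_cost(1000))                       # 1,000 minutes of audio
```

`Decimal` is used rather than floats so that small per-minute rates do not pick up rounding noise when multiplied across large volumes.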

Use cases

Batch transcription at the lowest per-minute rate among major APIs
Meeting transcription with speaker diarization and auto-chapters
Content moderation and sentiment analysis pipelines
Podcast and interview transcription with rich post-processing

Limitations

English-first; multilingual coverage is narrower than Whisper's
Streaming latency (~500 ms) is higher than Deepgram Nova-3's (~300 ms)
Some advanced features (PII redaction, sentiment analysis) add to the per-minute cost