AssemblyAI Universal-2
Available assemblyai/universal-2 · by AssemblyAI · end-to-end-asr
Pricing (1 offering)
AssemblyAI
Audio minutes
- $0.0025 / minute (current; 2024-06-01 → present)
Capability profile
- Transcription accuracy: strong
- Language support: moderate
- Speaker diarization: strong
- Real-time streaming: moderate
- Custom vocabulary: strong
Benchmarks
| Benchmark | Score | Config | Source |
|---|---|---|---|
| WER (LibriSpeech) | — | AssemblyAI does not publish LibriSpeech WER benchmarks. Positioned as competitive with Nova-3 on general audio; independent comparisons show strong accuracy on diverse accents. | — |
| Real-time streaming latency (vendor-reported) | 500 ms | Approximate WebSocket latency. Higher than Deepgram Nova-3's ~300ms. | source ↗ |
Operator guidance
Universal-2 is the best-value option for batch speech-to-text (STT): at $0.0025/min ($0.15/hr), it has the lowest rate among major providers. Rich built-in features (diarization, auto-chapters, PII redaction) reduce the need for post-processing. For streaming with the lowest latency, prefer Deepgram Nova-3. Note: pricing increases by 10% from 2026-07-01 for in-region requests unless model_region=global is set.
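The pricing arithmetic above can be sketched in a few lines of Python. This is an illustrative estimate only: the function names are hypothetical, and it assumes the 10% increase is a simple multiplier on the listed rate that applies to any request not marked model_region=global on or after 2026-07-01.

```python
from datetime import date
from typing import Optional

BASE_RATE_USD_PER_MIN = 0.0025         # listed rate from the pricing section
INCREASE_EFFECTIVE = date(2026, 7, 1)  # start date of the 10% increase
IN_REGION_MULTIPLIER = 1.10            # assumption: increase is multiplicative


def rate_per_minute(on: date, model_region: Optional[str] = None) -> float:
    """Assumed per-minute rate (USD) for a request made on the given date.

    model_region=None stands for "not set", which this sketch treats as an
    in-region request; only the value "global" avoids the increase.
    """
    if on >= INCREASE_EFFECTIVE and model_region != "global":
        return BASE_RATE_USD_PER_MIN * IN_REGION_MULTIPLIER
    return BASE_RATE_USD_PER_MIN


def batch_cost(audio_minutes: float, on: date,
               model_region: Optional[str] = None) -> float:
    """Estimated cost (USD) of transcribing audio_minutes of audio."""
    return audio_minutes * rate_per_minute(on, model_region)


if __name__ == "__main__":
    minutes = 1000 * 60  # 1,000 hours of audio
    for day, region in [(date(2026, 6, 30), None),
                        (date(2026, 7, 1), None),
                        (date(2026, 7, 1), "global")]:
        cost = batch_cost(minutes, day, region)
        print(f"{day} model_region={region}: ${cost:,.2f}")
```

Under these assumptions, 1,000 hours of audio costs about $150 at the current rate and about $165 for in-region requests after the increase. Add-on features that carry their own per-minute charge are not modeled here.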
Use cases
- Batch transcription at the lowest per-minute rate among major APIs
- Meeting transcription with speaker diarization and auto-chapters
- Content moderation and sentiment analysis pipelines
- Podcast and interview transcription with rich post-processing
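As a rough illustration of the meeting-transcription use case, the sketch below builds a JSON request body that turns on diarization, auto-chapters, and optional PII redaction. The helper name is hypothetical, and the field names (audio_url, speaker_labels, auto_chapters, redact_pii, redact_pii_policies) are assumptions based on AssemblyAI's REST transcript endpoint; verify them against the current API reference before use. The code only constructs the payload and sends nothing.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_transcript_request(
    audio_url: str,
    *,
    diarize: bool = True,
    chapters: bool = True,
    pii_policies: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a request body for a batch transcription job (hypothetical helper).

    Field names are assumptions about the provider's API, not confirmed
    by this page.
    """
    payload: Dict[str, Any] = {"audio_url": audio_url}
    if diarize:
        payload["speaker_labels"] = True   # assumed flag for speaker diarization
    if chapters:
        payload["auto_chapters"] = True    # assumed flag for auto-chapters
    if pii_policies:
        payload["redact_pii"] = True       # assumed flag for PII redaction
        payload["redact_pii_policies"] = list(pii_policies)
    return payload


if __name__ == "__main__":
    body = build_transcript_request(
        "https://example.com/meeting.mp3",  # placeholder URL
        pii_policies=["person_name"],       # policy name is an assumption
    )
    print(json.dumps(body, indent=2))
```

Keeping each feature behind an explicit argument makes it easy to see which options a job enables, which matters when some of them are billed on top of the base rate.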
Limitations
- English-first; multilingual coverage is narrower than Whisper's
- Streaming latency (~500ms) is higher than Deepgram Nova-3's (~300ms)
- Some advanced features (PII redaction, sentiment) add to the per-minute cost