Azure Speech-to-Text
Available microsoft-azure/speech-stt · by Microsoft · end-to-end-asr
Pricing: 1 offering
Microsoft Azure
Standard transcription
- $0.017/minute · current · 2017-06-01 → present
Capability profile
- transcription accuracy: strong
- language support: strong
- speaker diarisation: strong
- real-time streaming: strong
- custom vocabulary: strong
Benchmarks
| Benchmark | Score | Config | Source |
|---|---|---|---|
| Word error rate (WER), vendor-published Microsoft reports | n/a | No score listed. Microsoft publishes periodic WER comparisons on its Azure blog, but these numbers are vendor-reported; no independent third-party comparison on LibriSpeech is available. | n/a |
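For reference, WER counts the word-level substitutions (S), deletions (D) and insertions (I) needed to turn a system transcript into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. The minimal sketch below computes it with a word-level Levenshtein distance. The function name `word_error_rate` is hypothetical, and the sketch splits on whitespace without any text normalization (casing, punctuation, number formatting). Published evaluations usually normalize text first, so scores from this sketch are not directly comparable to vendor-reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance.

    Tokenizes on whitespace only; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    # Row 0: turning an empty reference into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Column 0: turning ref[:i] into an empty hypothesis takes i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A single-row dynamic-programming table keeps memory linear in the hypothesis length. Evaluation tools used in practice also report the S/D/I breakdown, which this sketch does not.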
Operator guidance
Best for enterprise teams on Azure that need compliance, custom model training, and Microsoft ecosystem integration (Teams, Dynamics). At $1.00/hour (about $0.0167/min, shown rounded as $0.017/min in the pricing entry), it is among the most expensive major STT options, but it offers the strongest enterprise governance story. On cost, AssemblyAI Universal-2 ($0.15/hour) is roughly 6.7× cheaper. On streaming latency, Deepgram Nova-3 is faster.
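The price figures in this guidance are simple unit conversions. The minimal Python sketch below reproduces them. It assumes flat pay-as-you-go rates with no commitment-tier discounts, free allowances, or billing-granularity rounding. The helper names and the 1,000-hour workload are hypothetical, and the rates are the ones quoted on this page.

```python
# Flat pay-as-you-go rates quoted on this page (assumption: no tiers,
# commitment discounts, free allowances, or billing-granularity rounding).
AZURE_PER_HOUR = 1.00            # operator guidance figure
AZURE_LISTED_PER_MINUTE = 0.017  # pricing entry, rounded
UNIVERSAL2_PER_HOUR = 0.15       # AssemblyAI Universal-2, as quoted here

MINUTES_PER_HOUR = 60


def hourly_to_per_minute(rate_per_hour: float) -> float:
    """Convert a $/hour rate to $/minute."""
    return rate_per_hour / MINUTES_PER_HOUR


def per_minute_to_hourly(rate_per_minute: float) -> float:
    """Convert a $/minute rate to $/hour."""
    return rate_per_minute * MINUTES_PER_HOUR


def cost_for_hours(rate_per_hour: float, audio_hours: float) -> float:
    """Cost of transcribing `audio_hours` of audio at a flat hourly rate."""
    return rate_per_hour * audio_hours


if __name__ == "__main__":
    print(f"Azure $/min: {hourly_to_per_minute(AZURE_PER_HOUR):.4f}")  # 0.0167
    print(f"Listed $/min as $/hour: {per_minute_to_hourly(AZURE_LISTED_PER_MINUTE):.2f}")  # 1.02
    print(f"Azure / Universal-2 ratio: {AZURE_PER_HOUR / UNIVERSAL2_PER_HOUR:.1f}x")  # 6.7x
    hours = 1_000  # hypothetical monthly workload
    print(f"{hours} h: Azure ${cost_for_hours(AZURE_PER_HOUR, hours):,.2f}, "
          f"Universal-2 ${cost_for_hours(UNIVERSAL2_PER_HOUR, hours):,.2f}")
```

The rounded $0.017/min listing works out to about $1.02/hour, so the $1.00/hour figure is the more precise one. At a hypothetical 1,000 audio hours a month, these flat rates give $1,000 for Azure and $150 for Universal-2.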
Use cases
- Enterprise transcription within Azure-native infrastructure
- Compliance-sensitive industries (healthcare, finance) requiring HIPAA/SOC 2
- Meeting and call centre transcription with speaker diarisation
- Custom Speech model training for domain-specific terminology
Limitations
- Among the most expensive major STT APIs at $1.00/hour pay-as-you-go (PAYG)
- Custom Speech training requires preparing an audio dataset, which raises onboarding cost
- Streaming requires the Speech SDK; the REST API supports batch transcription only