Cartesia Sonic
Available cartesia/sonic · by Cartesia · state-space-model
Pricing — 1 offering(s)
Cartesia
Text characters
- $65.00 / 1M characters (current; effective 2024-04-01 → present)
Capability profile
voice naturalness moderate
language support moderate
voice variety moderate
streaming latency strong
cloning support moderate
Benchmarks
| Benchmark | Score | Config | Source |
|---|---|---|---|
| First-audio-byte latency (vendor-reported) | 50 ms | Vendor-reported target of under 50 ms. Not independently verified, and not measured against ElevenLabs Flash v2.5 (~75 ms) under identical conditions. | source ↗ |
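
Because the latency figure above is vendor-reported, an operator may want to measure first-audio-byte latency in their own environment. The following is a minimal sketch, not Cartesia's or ElevenLabs' API: `open_stream` is a hypothetical callable standing in for whatever client call starts a streaming synthesis request and yields audio chunks, and `fake_stream` is a stand-in used only to exercise the timer.

```python
import time
from typing import Callable, Iterable


def time_to_first_chunk_ms(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return milliseconds from request start to the first non-empty audio chunk.

    `open_stream` is a hypothetical hook: it should issue the streaming TTS
    request and return an iterable of audio byte chunks.
    """
    start = clock()
    for chunk in open_stream():
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def fake_stream(delay_s: float = 0.05) -> Callable[[], Iterable[bytes]]:
    """Stand-in provider: waits `delay_s`, then yields two silent chunks."""
    def gen():
        time.sleep(delay_s)
        yield b"\x00" * 320
        yield b"\x00" * 320
    return gen


if __name__ == "__main__":
    # Take several samples; a single measurement is dominated by noise.
    samples = sorted(time_to_first_chunk_ms(fake_stream(0.05)) for _ in range(5))
    print(f"median first-chunk latency: {samples[len(samples) // 2]:.1f} ms")
```

To compare providers fairly, run the same text, region, and network path through each provider's `open_stream` wrapper and report a median or percentile rather than a single reading.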
Operator guidance
The primary choice when streaming latency is the dominant constraint. At ~$65/1M characters it costs more than ElevenLabs Flash v2.5 (~$60/1M) for only a modest quality difference, but its vendor-reported sub-50 ms first-audio-byte latency (versus ~75 ms for Flash v2.5, not verified under identical conditions) makes it the stronger fit for real-time voice agents. For batch or latency-tolerant TTS, ElevenLabs or PlayHT offer better value. Verify current pricing at cartesia.ai/pricing.
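To put the price gap in concrete terms, here is a small sketch of the arithmetic. The per-million rates are the approximate figures quoted above; the monthly volume of 10M characters is a hypothetical example, and `cost_usd` is an illustrative helper, not part of any vendor SDK.

```python
# Approximate list prices from the guidance above, in USD per 1M characters.
CARTESIA_SONIC_PER_M = 65.0
ELEVENLABS_FLASH_PER_M = 60.0


def cost_usd(characters: int, price_per_million: float) -> float:
    """Cost in USD for synthesizing `characters` at a per-1M-character rate."""
    return characters * price_per_million / 1_000_000


if __name__ == "__main__":
    monthly_chars = 10_000_000  # hypothetical workload
    sonic = cost_usd(monthly_chars, CARTESIA_SONIC_PER_M)
    flash = cost_usd(monthly_chars, ELEVENLABS_FLASH_PER_M)
    print(f"Sonic: ${sonic:,.2f}  Flash v2.5: ${flash:,.2f}  premium: ${sonic - flash:,.2f}")
```

At these rates the premium is $5 per million characters, so the decision turns mostly on whether the latency difference matters for the workload.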
Use cases
- Real-time conversational AI targeting sub-50 ms time to first audio
- Interactive voice agents where latency is the primary constraint
- Low-latency TTS pipelines replacing transformer-based models
- Applications sensitive to streaming latency (voice assistants, phone bots)
Limitations
- English-first; limited multilingual coverage vs Azure or Google
- Smaller voice library than ElevenLabs or PlayHT
- Pricing has changed post-launch; verify current rate before use
- Early-stage company; pricing and feature stability less certain than established cloud providers