CogVideoX-5B
Available thudm/cogvideox-5b · by Zhipu AI (THUDM) · Diffusion transformer (DiT)
Pricing: 1 offering
fal.ai
Text-to-Video
- $0.20 per clip (current, 2026-06-15 → present)
Capability profile
- Prompt adherence: moderate
- Temporal consistency: moderate
- Motion quality: moderate
- Native audio: unknown
- Character consistency: moderate
- Camera control: weak
- Clip duration ceiling: moderate
- Resolution ceiling: weak
- Inference speed: weak
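For operators comparing several models, the capability profile above can be encoded as data and checked against minimum requirements. This is a minimal sketch: the names `LEVELS`, `COGVIDEOX_5B`, `level`, and `unmet_requirements` are hypothetical, and mapping weak/moderate/strong to 1/2/3 is an assumption, not something the ratings themselves define. An "unknown" rating is treated as not meeting a requirement.

```python
from typing import Optional

# Assumed ordinal scale for the page's qualitative ratings ("strong" is included
# for completeness; it does not appear in this profile).
LEVELS = {"weak": 1, "moderate": 2, "strong": 3}

# Ratings copied from the capability profile; keys are hypothetical identifiers.
COGVIDEOX_5B = {
    "prompt_adherence": "moderate",
    "temporal_consistency": "moderate",
    "motion_quality": "moderate",
    "native_audio": "unknown",
    "character_consistency": "moderate",
    "camera_control": "weak",
    "clip_duration_ceiling": "moderate",
    "resolution_ceiling": "weak",
    "inference_speed": "weak",
}


def level(rating: str) -> Optional[int]:
    """Map a rating to an ordinal; 'unknown' or unrecognised ratings map to None."""
    return LEVELS.get(rating)


def unmet_requirements(profile: dict, minimums: dict) -> list:
    """Return capabilities rated below the required minimum, or rated unknown."""
    unmet = []
    for capability, minimum in minimums.items():
        have = level(profile.get(capability, "unknown"))
        if have is None or have < LEVELS[minimum]:
            unmet.append(capability)
    return unmet


print(unmet_requirements(COGVIDEOX_5B, {"prompt_adherence": "moderate",
                                        "camera_control": "moderate"}))
```

With this profile, `unmet_requirements(COGVIDEOX_5B, {"camera_control": "moderate", "native_audio": "weak"})` returns `["camera_control", "native_audio"]`.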
Operator guidance
Choose CogVideoX-5B when open weights and fine-tuning rights are needed, or for budget-sensitive workflows (about $0.03 per second of output via fal.ai, assuming clips of roughly 6 seconds at $0.20 each). Quality is competitive with other open-source models such as LTX Video and Wan, but below closed-source frontier models. Inference is also slower than with distilled alternatives.
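Because fal.ai prices this model per clip, the effective per-second rate depends on clip length: at $0.20 per clip, $0.02/s corresponds to 10-second clips, while a clip of about 6 seconds works out closer to $0.03/s. The sketch below does that division. The 49-frame, 8 fps clip length is an assumption based on the THUDM model card for the 5B checkpoint, and `effective_rate_per_second` is a hypothetical helper name.

```python
def effective_rate_per_second(price_per_clip: float, clip_seconds: float) -> float:
    """Effective USD per second of output video under per-clip pricing."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return price_per_clip / clip_seconds


PRICE_PER_CLIP = 0.20  # fal.ai text-to-video price listed on this page

# Assumption: one clip is 49 frames at 8 fps (about 6.1 s of video).
clip_seconds = 49 / 8
rate = effective_rate_per_second(PRICE_PER_CLIP, clip_seconds)
print(f"{clip_seconds:.2f} s per clip -> ${rate:.3f} per second")
```

Re-run with your actual clip length if you use a different checkpoint or frame count; the per-clip price stays fixed, so shorter clips raise the per-second cost.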
Use cases
- Open-weights video generation (the 5B weights are released under the CogVideoX License rather than MIT; confirm its commercial-use terms on the model card)
- Fine-tuning and LoRA customisation workflows (weights publicly available)
- Budget video inference via fal.ai or Replicate
Limitations
- 8 fps output, lower than the 24 fps typical of closed-source models
- 720×480 maximum resolution (below full 720p)
- Slower inference than distilled/consistency models
- No native audio generation
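One common workaround for the 8 fps limitation is post-processing with a separate frame-interpolation model (not covered here). The arithmetic below is a minimal sketch of how many frames such a step would need to synthesise to reach 24 fps while keeping the first and last frames fixed. The 49-frame input is an assumed clip length, and `interpolated_frame_count` is a hypothetical helper.

```python
def interpolated_frame_count(source_frames: int, source_fps: int, target_fps: int) -> int:
    """Total frames after interpolating from source_fps to target_fps.

    Assumes target_fps is an integer multiple of source_fps, so (factor - 1)
    synthetic frames are inserted between each consecutive pair of source
    frames; the clip's time span between first and last frame is unchanged.
    """
    if target_fps % source_fps != 0:
        raise ValueError("target_fps must be an integer multiple of source_fps")
    factor = target_fps // source_fps
    return (source_frames - 1) * factor + 1


frames_8fps = 49  # assumed clip length at 8 fps
frames_24fps = interpolated_frame_count(frames_8fps, 8, 24)
synthetic = frames_24fps - frames_8fps
print(f"{frames_24fps} frames at 24 fps, {synthetic} of them synthesised")
```

Two of every three output frames are generated by the interpolator rather than the diffusion model, which is why interpolated motion can look smoother without gaining detail.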