CogVideoX-5B

Available

thudm/cogvideox-5b · by Zhipu AI (THUDM) · Diffusion transformer (DiT)

Pricing (1 offering)

Text-to-Video

  • $0.20 / clip (current), 2026-06-15 → present

This shows the active price and any recorded price history. The full pricing history is available through the paid API (see the API docs).

Capability profile

Prompt adherence: moderate
Temporal consistency: moderate
Motion quality: moderate
Native audio: unknown
Character consistency: moderate
Camera control: weak
Clip duration ceiling: moderate
Resolution ceiling: weak
Inference speed: weak
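One way to act on this profile is to treat the ratings as an ordinal scale and check them against a workload's minimum requirements. The sketch below is hypothetical: the snake_case keys, the `Rating` enum, the `unmet_requirements` helper and the `STRONG` level (not used on this page) are illustrative assumptions, and the ordering weak < moderate < strong is assumed rather than documented. An "unknown" rating is treated as failing any requirement.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class Rating(IntEnum):
    # ASSUMPTION: ordinal scale weak < moderate < strong.
    # "unknown" is not ranked; it is represented as None below.
    WEAK = 1
    MODERATE = 2
    STRONG = 3


# Capability profile transcribed from the list above (hypothetical key names).
COGVIDEOX_5B: Dict[str, Optional[Rating]] = {
    "prompt_adherence": Rating.MODERATE,
    "temporal_consistency": Rating.MODERATE,
    "motion_quality": Rating.MODERATE,
    "native_audio": None,  # listed as "unknown"
    "character_consistency": Rating.MODERATE,
    "camera_control": Rating.WEAK,
    "clip_duration_ceiling": Rating.MODERATE,
    "resolution_ceiling": Rating.WEAK,
    "inference_speed": Rating.WEAK,
}


def unmet_requirements(
    profile: Dict[str, Optional[Rating]],
    requirements: Dict[str, Rating],
) -> List[str]:
    """Return capabilities that are unknown, missing, or rated below the required minimum."""
    unmet = []
    for capability, minimum in requirements.items():
        rating = profile.get(capability)
        if rating is None or rating < minimum:
            unmet.append(capability)
    return unmet


if __name__ == "__main__":
    needs = {"prompt_adherence": Rating.MODERATE, "camera_control": Rating.MODERATE}
    print(unmet_requirements(COGVIDEOX_5B, needs))  # ['camera_control']
```

For a workload that needs at least moderate camera control, the check flags `camera_control`, matching the "weak" rating listed above.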

Operator guidance

Choose CogVideoX-5B when open weights and fine-tuning rights are needed, or for budget-sensitive workflows (~$0.02/s effective rate via fal.ai). Quality is competitive with other open-source models (LTX Video, Wan) but below closed-source frontier models. Inference is slower than distilled alternatives.
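Per-clip and per-second prices are easy to conflate when comparing providers. A minimal sketch for converting between them, assuming a clip length that this page does not state: CogVideoX-5B is commonly run at 49 frames and 8 fps (about 6 s), but `ASSUMED_CLIP_SECONDS` and both helper functions are hypothetical and should be adjusted to the provider's actual output. Under that assumption the listed $0.20/clip works out to about $0.033/s; the gap between that and the ~$0.02/s fal.ai figure is not explained here and may reflect a different provider price or clip length.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed text-to-video price from the pricing section.
PRICE_PER_CLIP = Decimal("0.20")

# ASSUMPTION: clip length in seconds (about 49 frames at 8 fps); not stated on this page.
ASSUMED_CLIP_SECONDS = Decimal("6")


def effective_rate_per_second(price_per_clip: Decimal, clip_seconds: Decimal) -> Decimal:
    """Convert a flat per-clip price into an effective $/s rate, rounded to 4 decimal places."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return (price_per_clip / clip_seconds).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def clips_within_budget(budget: Decimal, price_per_clip: Decimal) -> int:
    """Number of whole clips a budget covers at a flat per-clip price."""
    if price_per_clip <= 0:
        raise ValueError("price_per_clip must be positive")
    return int(budget // price_per_clip)


if __name__ == "__main__":
    rate = effective_rate_per_second(PRICE_PER_CLIP, ASSUMED_CLIP_SECONDS)
    print(f"effective rate: ${rate}/s")  # $0.0333/s under the 6 s assumption
    print(clips_within_budget(Decimal("50"), PRICE_PER_CLIP))  # 250
```

`Decimal` is used instead of `float` so that currency arithmetic such as 50 / 0.20 divides exactly rather than accumulating binary rounding error.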

Citations