
CogVideoX-5B

Available

thudm/cogvideox-5b · by Zhipu AI (THUDM) · Diffusion transformer (DiT)

Pricing: 1 offering

fal.ai


Text-to-Video

$0.20 / clip · Current · 2026-06-15 → present


Capability profile

Prompt adherence: moderate

Temporal consistency: moderate

Motion quality: moderate

Native audio: unknown

Character consistency: moderate

Camera control: weak

Clip duration ceiling: moderate

Resolution ceiling: weak

Inference speed: weak

Operator guidance

Choose CogVideoX-5B when open weights and fine-tuning rights are needed, or for budget-sensitive workflows (~$0.02/s effective rate via fal.ai). Quality is competitive with other open-source models (LTX Video, Wan) but below closed-source frontier models. Inference is slower than distilled alternatives.
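
The effective per-second rate above follows from the flat per-clip price divided by the clip length. The sketch below works through that arithmetic. The listing does not state the clip duration, so the duration is a parameter here: the ~$0.02/s figure corresponds to roughly a 10-second clip, and the 6-second value is a hypothetical input used only to show how the rate shifts. The class and method names are illustrative and not part of any fal.ai API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipPricing:
    """Flat per-clip price, as shown in the fal.ai offering on this page."""

    price_per_clip_usd: float

    def rate_per_second(self, clip_seconds: float) -> float:
        """Effective USD per second of video for a given clip length."""
        if clip_seconds <= 0:
            raise ValueError("clip_seconds must be positive")
        return self.price_per_clip_usd / clip_seconds

    def implied_clip_seconds(self, rate_per_second_usd: float) -> float:
        """Clip length at which the flat price equals a per-second rate."""
        if rate_per_second_usd <= 0:
            raise ValueError("rate_per_second_usd must be positive")
        return self.price_per_clip_usd / rate_per_second_usd


# $0.20 / clip is taken from the listing; everything else is derived.
pricing = ClipPricing(price_per_clip_usd=0.20)

# Clip length implied by the quoted ~$0.02/s effective rate.
implied_seconds = pricing.implied_clip_seconds(0.02)

# Hypothetical 6-second clip (an assumption, not a figure from the listing).
shorter_clip_rate = pricing.rate_per_second(6.0)

print(f"Implied clip length at $0.02/s: {implied_seconds:.1f} s")
print(f"Effective rate for a 6 s clip: ${shorter_clip_rate:.3f}/s")
```

If the clips you actually receive are shorter than the implied length, the effective rate is proportionally higher, so check the delivered duration before comparing against per-second pricing from other providers.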

Use cases

Open-weights video generation (the 5B weights are released under the CogVideoX License rather than MIT; review its commercial-use terms before deploying)
Fine-tuning and LoRA customisation workflows (weights publicly available)
Budget video inference via fal.ai or Replicate

Limitations

8 fps output, lower than the 24 fps typical of closed-source models
720 × 480 maximum resolution (720 pixels wide, not 720p)
Slower inference than distilled/consistency models
No native audio generation