xAI has officially released the Grok Imagine API, which provides text-to-video and image-to-video generation at 720p resolution for clips up to 15 seconds long, complete with native audio output.
The launch marks a major advancement in xAI’s multimodal AI capabilities and positions Grok Imagine as the current leader in independent video generation benchmarks, according to real-time rankings published by Artificial Analysis.
Benchmark Performance: The New #1
Grok Imagine achieved the #1 spot in both the Text-to-Video and Image-to-Video categories of the Artificial Analysis Video Arena. It surpassed industry leaders including Runway Gen-4.5 and Google’s Veo 3.1.
| Model | Elo Score | Native Audio | Price (USD per minute) |
| --- | --- | --- | --- |
| Grok Imagine | 1,248 | Yes | $4.20 |
| Runway Gen-4.5 | 1,215 | No | Variable |
| Kling 2.5 Turbo | 1,190 | No | $4.20 |
| Google Veo 3.1 | 1,182 | Yes | $12.00 |
| OpenAI Sora 2 Pro | 1,175 | Yes | $30.00 |
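To give a sense of what the score gaps in the table mean, the sketch below converts a rating difference into an expected head-to-head preference rate using the standard Elo expectation formula. It assumes the arena uses the conventional 400-point logistic scale, which the article does not confirm, and the function name `expected_win_rate` is purely illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expectation: probability that A is preferred over B.

    Assumes the conventional 400-point logistic scale; the leaderboard's
    exact rating method is not stated in the article.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores taken from the table above.
scores = {
    "Grok Imagine": 1248,
    "Runway Gen-4.5": 1215,
    "Kling 2.5 Turbo": 1190,
    "Google Veo 3.1": 1182,
    "OpenAI Sora 2 Pro": 1175,
}

leader = scores["Grok Imagine"]
for name, score in scores.items():
    if name == "Grok Imagine":
        continue
    rate = expected_win_rate(leader, score)
    print(f"Grok Imagine vs {name}: {rate:.1%} expected preference")
```

Under that assumption, the 33-point lead over Runway Gen-4.5 corresponds to being preferred in roughly 55% of pairwise comparisons, so the top ranking reflects a modest edge rather than an overwhelming one.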
Pricing & Accessibility
xAI has positioned Grok Imagine as a high-value alternative to existing professional models:
- Cost: $4.20 per minute of generated video (including audio).
- Competitive Positioning: Matches Kling 2.5 Turbo (without audio) and is significantly cheaper than Veo 3.1 Preview ($12/min) and Sora 2 Pro ($30/min).
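The per-minute prices translate into per-clip costs with simple arithmetic. The sketch below assumes billing is prorated by the second (the article states only a per-minute rate) and uses the 15-second maximum clip length to estimate how many generations a longer sequence would need. The helpers `clip_cost` and `clips_needed` are illustrative and not part of any xAI SDK.

```python
import math
from decimal import Decimal

MAX_CLIP_SECONDS = 15  # maximum clip length stated for Grok Imagine

# Per-minute prices quoted in the article (USD).
PRICE_PER_MINUTE = {
    "Grok Imagine": Decimal("4.20"),
    "Google Veo 3.1": Decimal("12.00"),
    "OpenAI Sora 2 Pro": Decimal("30.00"),
}


def clip_cost(price_per_minute: Decimal, seconds: int) -> Decimal:
    """Cost of `seconds` of video, assuming per-second proration."""
    return (price_per_minute * seconds / 60).quantize(Decimal("0.01"))


def clips_needed(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> int:
    """Number of separate generations needed to cover `total_seconds`."""
    return math.ceil(total_seconds / max_clip)


for model, price in PRICE_PER_MINUTE.items():
    print(f"{model}: ${clip_cost(price, MAX_CLIP_SECONDS)} per 15-second clip")
print(f"A 60-second sequence needs {clips_needed(60)} clips at the 15-second cap")
```

If proration works this way, a 15-second clip would cost about $1.05 on Grok Imagine, $3.00 on Veo 3.1, and $7.50 on Sora 2 Pro.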
Developers can access the API immediately through xAI’s developer console, and early integrations, including support in tools like ComfyUI, are already emerging.
Development & Key Features
xAI engineers described the six-month sprint to version 1.0 as a focused effort to deliver high-quality, fast generation with strong prompt adherence.
Key capabilities include:
- Text-to-Video: Generation from detailed natural language prompts.
- Image-to-Video: Bringing static sketches or photos to life with fluid animation.
- Scene Restyling: Transforming existing footage into cyberpunk, vintage, or other specific aesthetics.
- Native Audio Sync: Integrated generation for dialogue, sound effects, and background music.
The model leverages xAI’s infrastructure and training advancements, building on the Grok family’s emphasis on reasoning and real-time performance.
Market & Community Reaction
The launch generated significant excitement among developers, AI researchers, and content creators. Early users on X (formerly Twitter) highlighted Grok Imagine’s rapid ascent to the top of the Artificial Analysis leaderboard and its native audio feature as a major differentiator.
Some observers noted the competitive pricing could accelerate adoption in creative industries, education, marketing, and entertainment. xAI positioned the API as a foundational tool for developers building next-generation applications involving dynamic visual and audio content.
