Kling AI Unveils ‘O1’ Unified Multimodal Model and ‘Video 2.6’ with Native Audio

Generative tech pioneer Kling AI (China) has introduced its ‘O1’ series and ‘Video 2.6’ model, marking a significant leap toward unified multimedia production. The O1 model acts as a multimodal engine capable of interpreting text, images, and existing footage simultaneously to maintain character and object consistency. Complementing this, Video 2.6 introduces “Native Audio,” allowing the system to generate dialogue, music, and ambient sound effects synchronized with visual motion in a single workflow.

Launched on December 9, 2025, the update includes ‘Avatar 2.0’ and the ‘Element Library’, which remembers specific items and characters for multi-shot stability. The model supports human voices ranging from whispers to dramatic shouts, and environmental sounds like fire or shattering glass. This technology is aimed at reducing production costs for advertisers and influencers by providing high-fidelity, 10-second outputs that are ready for social media and commercial use.

The system’s ability to process up to ten reference images at once makes complex image editing accessible to non-professionals. Kling AI’s rapid release cycle, including the 2.5 Turbo mode, underscores its strategy to dominate the creative production supply chain. By offering end-to-end audio-visual generation, the company gives global digital content creators and e-commerce sellers a faster, more cohesive creative engine.
