This week in AI has been packed with notable developments across many domains. OpenAI released GPT-5.2, its most capable model yet, which excels at professional knowledge work and outperforms expert humans on many real-world tasks. It maintains near-perfect accuracy even over extremely long context windows, making it well suited to large codebases and lengthy documents. Meanwhile, Alibaba introduced “One Move,” an AI video tool that lets users control object motion in a video simply by drawing trajectories, offering precise, physically accurate motion control that surpasses existing proprietary tools.
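Tools in this vein typically turn the drawn stroke into a per-frame conditioning signal that the video model attends to. Below is a minimal sketch of that idea; the function and its parameters are hypothetical placeholders, not One Move’s published interface.

```python
# Minimal sketch of trajectory-based motion conditioning.
# All names here are hypothetical; One Move's actual API is not public.
import numpy as np

def trajectory_to_heatmaps(points, num_frames, height, width, sigma=8.0):
    """Rasterize a user-drawn trajectory into per-frame Gaussian heatmaps.

    points: list of (frame_idx, x, y) samples along the drawn stroke.
    Returns a (num_frames, height, width) array that a motion-controllable
    video model could consume as an extra conditioning channel.
    """
    heatmaps = np.zeros((num_frames, height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for frame_idx, x, y in points:
        if 0 <= frame_idx < num_frames:
            heatmaps[frame_idx] += np.exp(
                -((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2)
            )
    return heatmaps

# Example: an object dragged left-to-right across a 16-frame clip.
stroke = [(t, 32 + t * 12, 128) for t in range(16)]
cond = trajectory_to_heatmaps(stroke, num_frames=16, height=256, width=256)
print(cond.shape)  # (16, 256, 256)
```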

In image and video editing, several impressive tools emerged. Snapchat’s EgoEdit enables real-time video editing through natural-language prompts, letting users insert or replace objects seamlessly. Another notable tool, “Window Seat,” specializes in removing window reflections from photos and outperforms previous methods. “Stereo World” converts regular videos into stereoscopic 3D videos with depth, enhancing the viewing experience with 3D glasses. For character animation, “One to All Animation” applies realistic body movements to any character from a single reference photo, staying consistent even with unusual body proportions.

Open-source AI continues to advance rapidly. Z.ai released Open AutoGLM, an autonomous AI agent that can operate a smartphone to perform tasks like navigation, messaging, and shopping without human intervention. The company also unveiled GLM-4.6 Vision, a multimodal agent with vision capabilities that can analyze documents, images, and videos, making it a powerful tool for coding and data analysis. Meanwhile, Mistral introduced Devstral 2, an open-source coding model family that rivals state-of-the-art closed models in performance while being more efficient and accessible.
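Phone-operating agents like this are generally built as an observe-plan-act loop: screenshot the device, ask a vision-language model for the next UI action, execute it, and repeat until the task is done. Here is a minimal sketch of that loop; every name in it is a hypothetical placeholder, not Z.ai’s actual implementation.

```python
# Minimal sketch of a phone-use agent loop (hypothetical names throughout;
# not the actual Open AutoGLM implementation).
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "tap", "type", "swipe", or "done"
    target: str = ""   # UI element or text, depending on kind

def capture_screenshot() -> bytes:
    """Stub: grab the current screen via adb or an emulator API."""
    return b""

def plan_next_action(goal: str, screenshot: bytes, history: list[Action]) -> Action:
    """Stub: ask a vision-language model for the next UI action."""
    return Action(kind="done")

def execute(action: Action) -> None:
    """Stub: dispatch the action to the device (tap/type/swipe)."""

def run_agent(goal: str, max_steps: int = 20) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        action = plan_next_action(goal, capture_screenshot(), history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)

run_agent("Order a coffee from the delivery app")
```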

Speed and efficiency improvements are also notable this week. Twinflow, a new image generation technique, can produce high-quality images in just one diffusion step, drastically reducing generation time compared with traditional multi-step models. For anime art, the lightweight “Newbie Image Experimental 01” model delivers excellent results with far fewer parameters, making it usable without a high-end GPU. Furthermore, “Qwen Image I2L” streamlines the training of LoRAs (lightweight fine-tuning adapters for image models) by enabling rapid training from just a few images, significantly lowering the barrier to custom image generation.
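As background, a LoRA does not retrain the base model; it learns a small low-rank update to frozen weight matrices, which is why it can be trained quickly from a handful of images. Below is a generic PyTorch sketch of the core layer, illustrative only and not Qwen Image I2L’s specific method.

```python
# Generic LoRA layer sketch in PyTorch (illustration only, not the
# specific recipe used by Qwen Image I2L).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update:
    y = base(x) + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # freeze the original weights
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)    # the update starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))

# Wrap one projection of a diffusion model, then train only A and B.
layer = LoRALinear(nn.Linear(512, 512), r=8)
out = layer(torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 512])
```

Because only A and B receive gradients, the trainable parameter count drops from in_features × out_features to r × (in_features + out_features), which is what makes training from only a few images tractable.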

Finally, several innovative tools enhance video and 3D content creation. “Mocha” generates detailed 3D models from reference images and separates them into editable parts, making animation and customization easier. “Light X” lets users change camera movement and lighting in existing videos, applying realistic relighting and perspective shifts. Meta’s “One Story” generates multiple consistent video clips from prompts or reference images, enabling coherent long-form storytelling. And “Saber” excels at inserting reference images into videos with high consistency, outperforming competitors at maintaining character fidelity. Together, these advances showcase how rapidly AI capabilities are expanding across creative and professional fields.


