Sep 04, 2025 19:00:00

'HunyuanWorld-Voyager' can generate videos in which the viewpoint moves within a 3D scene generated from a single image

Tencent, a major Chinese IT company, has released 'HunyuanWorld-Voyager,' an AI framework that generates coherent 3D scenes from a single image, on GitHub. HunyuanWorld-Voyager extends the scene while preserving context, and can generate videos in which the viewpoint moves through the generated 3D scene.

GitHub - Tencent-Hunyuan/HunyuanWorld-Voyager: Voyager is an interactive RGBD video generation model conditioned on camera trajectory, and supports real-time 3D reconstruction.

https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager

HunyuanWorld-Voyager is a 3D scene generation AI framework trained on a dataset of over 100,000 video clips that combines real-world footage with synthetic footage rendered in Unreal Engine. The dataset was built with a reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos.

HunyuanWorld-Voyager consists of two main components:

1: A unified architecture that jointly generates aligned RGB and depth video sequences from the input image, ensuring consistency.
2: An efficient world cache with point culling, combined with autoregressive inference and smooth video sampling, which enables iterative scene extension with context-aware consistency.

These components enable HunyuanWorld-Voyager to generate a coherent 3D scene from a single image, generate video of the scene as the camera moves, and reconstruct a 3D point cloud from the generated 3D scene.
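
To illustrate why aligned RGB and depth output makes point cloud reconstruction straightforward, here is a minimal sketch of back-projecting a depth map into 3D points with a pinhole camera model. This is not HunyuanWorld-Voyager's own code: the function name `unproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the tiny 2x2 depth map are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a metric depth map into camera-space 3D points.

    Hypothetical helper using the standard pinhole model:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, z in enumerate(row):          # u: pixel column, z: depth
            if z <= 0:                       # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Illustrative 2x2 depth map (one invalid pixel) and made-up intrinsics.
depth = [[2.0, 2.0], [0.0, 4.0]]
points = unproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because each depth pixel lines up with an RGB pixel, the color of each point can be read from the same `(u, v)` position in the RGB frame. Transforming the points by the camera pose for each frame would then place them in a shared world coordinate system.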

The images actually input to HunyuanWorld-Voyager and the videos generated from them are published on GitHub. Below is an image input to HunyuanWorld-Voyager; the inset at the bottom right shows the camera movement within the 3D scene. The camera movement can be specified by the user.