Tencent open-sources its latest world model: build a 3D world from one sentence, compatible with game engines

Today, Tencent officially released and open-sourced Hunyuan 3D World Model 2.0 (HY-World 2.0). As a multimodal world model, HY-World 2.0 accepts text, images, video, and other inputs, and automatically generates, reconstructs, and simulates a complete 3D world.

For the gaming industry, HY-World 2.0 supports direct export of assets that can be edited further, such as meshes, 3DGS, or point clouds. These can be imported seamlessly into Unity, UE, and other engines for quickly building game maps and level prototypes.

Compared to the previous HY-World 1.5, which could only generate one-minute videos, HY-World 2.0 not only supports a roamable 3D space but also generates complete character, building, and scenery assets, making the result both usable and playable.

▲ Input “Generate a cozy picture book style cabin”

Generating a 3D world from one sentence is no longer a problem. Tencent Hunyuan 3D has also added a character mode, in which the user controls a character to explore streets, buildings, and scenes freely, with physical collision effects. Just as in a game, the character can walk freely through the generated 3D scene.

▲Character mode allows the user to operate the character to explore freely

HY-World 2.0 also performs better on scene completeness (the sides and backs of objects) and on fidelity to the input image, and it is equally suitable for embodied-intelligence simulation and similar scenarios.

We tried it out to see how well it works.

Online demo: https://3d.hunyuan.tencent.com/sceneTo3D

Open source code: https://github.com/Tencent-Hunyuan/HY-World-2.0

Technical report: https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf

I. Recreating Genshin Impact and Resident Evil scenes: free character roaming with a strong sense of realism

I first tried the text-to-scene and image-to-scene features. Operation is simple: enter a prompt or an image and click “Generate now”.

Prompt: “Generate a Genshin Impact-style sky garden labyrinth containing platforms of varying heights, winding staircases, bridges suspended by vines, sunlight pouring through stained glass into the garden, a fountain and bridge in the center, and a sense of fantasy throughout the space.”

As you can see, both the representation of the depth of the scene and the details such as stairs, bridges and stained glass are well reproduced. Remarkably, my selected character was also free to roam around the generated 3D world.

On stairs, bridges, and similar areas, the character shows physical collision, and walking up and down looks natural and smooth, which makes it possible to test the spatial structure.

However, the character is only able to move within a limited range due to the small size of the movable area of the scene. When I chose to resize my character, I was able to observe the scene in more detail from a third-person character perspective.

Next, we used an image as the reference, and the generated scene stayed largely consistent with it.

However, image quality and detail are close to the text-generated result: neither is fine or textured enough, which may be related to display and rendering resolution on the web side.

With this in mind, we then tried video and multi-view image input.

For the video reference section, I chose a live video from Resident Evil where the main character walks straight down the street.

▲Resident Evil's live-action video

As can be seen, the model captures the character's movement and the scenery on both sides of the street, and passing pedestrians are also rendered, but the overall restoration of the 3D world is still incomplete.

In comparison, the multi-view image test performs better, and the modeling of the building's exterior and tiered structure is impressive. I used the 32 bundled sample images of a three-story roofed building that come with the model, and the model replicated the building's appearance and layered structure remarkably well.

▲Multi-view image material

As you can see, the details and layers of the building are well preserved and the sense of wholeness is evident.

II. Sketches, text, and video can all build a world, with end-to-end generation of 360° panoramas

In HY-World 2.0, a sketch, a piece of text, or a video can each quickly generate a coherent 3D world.

The key technical point is that HY-World 2.0 unifies spatial understanding, generation, and reconstruction with 3D as the main axis, automatically turning complex semantics and structures into complete spaces.

With the newly upgraded HY-Pano-2.0 end-to-end implicit learning scheme, the model can also generate 360-degree panoramas from ordinary pictures or videos without any camera parameters.

The Hunyuan team also uses mixed training on real panoramic photos and UE-synthesized data to ensure generation quality and generalization ability.
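For readers unfamiliar with 360-degree panoramas, the sketch below shows how a viewing direction maps to a pixel in an equirectangular image, which is one common panorama layout. It only illustrates the image format: the article does not say which projection HY-Pano-2.0 uses, and the function name and axis convention here are assumptions made for the example.

```python
import math

def direction_to_equirect_pixel(x, y, z, width, height):
    """Map a 3D view direction to (column, row) in an equirectangular image.

    Assumed convention (hypothetical, not taken from HY-World 2.0):
    +z is forward, +x is right, +y is up, and the image center looks along +z.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    x, y, z = x / norm, y / norm, z / norm

    lon = math.atan2(x, z)                    # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))   # latitude in [-pi/2, pi/2]

    u = (lon / (2 * math.pi) + 0.5) * width   # column: left edge is lon = -pi
    v = (0.5 - lat / math.pi) * height        # row: top edge is straight up
    return u, v
```

Under this convention, looking straight ahead lands at the center of the image, and looking straight up lands on the top row.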

III. Intelligent path planning, allowing the character to roam freely

After the panorama is generated, character path planning is another major challenge. The model combines self-developed spatial agent technology with a Navmesh representation to plan character roaming paths intelligently.

Depending on the semantics of each scene, the model can plan five types of camera trajectories, including orbiting objects and maximum roaming, which cover the key areas of the scene while avoiding passing through walls or going out of bounds.

With the help of planned trajectories and world extensions, the character is able to roam naturally in the generated 3D scene with smooth and spatially logical paths.
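To make the idea of planning only through walkable space concrete, here is a minimal sketch of path planning on a 2D occupancy grid using breadth-first search. A grid is a deliberate simplification of a navmesh, and nothing here reflects how Tencent's spatial agent actually works; `plan_path` and the grid encoding are hypothetical names chosen for illustration.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest 4-connected path on a grid where 0 = walkable and 1 = wall.

    Returns a list of (row, col) cells from start to goal,
    or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def walkable(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0

    if not (walkable(start) and walkable(goal)):
        return None

    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if walkable(nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Because wall cells are never expanded, any returned path stays inside the walkable area, which is the same constraint the article describes as avoiding walls and out-of-bounds movement.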

IV. Novel view generation that keeps space connected and images coherent

When expanding the scene, how does the model ensure that the newly generated area connects to the original space geometrically and visually, without visible seams or glitches?

Its core innovations include precise camera control, fine-grained retention of visual detail, and a spatially coherent memory mechanism.

Combining the memory-mechanism design with systematic mid-training and post-training, the Hunyuan team has built the HY-WorldStereo novel view synthesis (NVS) model, billed as the industry's strongest to date.

The generated images follow the input camera accurately, results across multiple camera moves are spatially consistent and conflict-free, and the post-training algorithm allows rapid expansion into new areas without degrading image quality.

Finally, all generated fragments are integrated by HY-WorldMirror 2.0 into a unified, interactive 3D world.

With customized Depth Alignment and adaptive Mask Gaussian optimization algorithms, the generated scene is represented with 3D Gaussian Splatting (3DGS). High-quality meshes can also be exported and imported directly into Unity, UE, and other mainstream game engines for secondary editing and creation.
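The article does not describe how its Depth Alignment works. As general background, one common way to align a predicted depth map to a reference is to fit a single scale and shift by least squares over the valid pixels. The sketch below does that on flat lists of depth values; the function name, the optional mask, and the scale-and-shift model itself are assumptions for illustration, not the HY-World 2.0 algorithm.

```python
def fit_scale_shift(pred, ref, mask=None):
    """Return (s, t) minimizing sum((s * p + t - r) ** 2) over valid samples.

    pred and ref are equal-length sequences of depth values; mask (optional)
    marks which samples are valid. All names here are hypothetical.
    """
    if mask is None:
        mask = [True] * len(pred)
    pairs = [(p, r) for p, r, m in zip(pred, ref, mask) if m]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid samples")

    mean_p = sum(p for p, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    if var_p == 0:
        raise ValueError("predicted depths are constant; scale is undetermined")

    # Closed-form solution of the 1D linear regression r ~ s * p + t.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in pairs)
    s = cov / var_p
    t = mean_r - s * mean_p
    return s, t
```

Applying `s * p + t` to every predicted depth then brings the prediction onto the reference's scale, and the mask keeps unreliable pixels from distorting the fit.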

Conclusion: AI builds the world, one step further

From HY-World 1.0, the first open source 3D world model, to HY-World 1.5, which allows real-time online interaction, to the release of HY-World 2.0, this series of iterations has moved AI closer to practical deployment in game development, virtual simulation, and other industries.

Compared to the past when only short videos or static models could be generated, HY-World 2.0 provides a truly roamable, interactive, and secondary editable 3D world, significantly lowering the threshold for map prototyping and level design.

With progress from teams in China and abroad, such as the open-source Spark 2.0 renderer from Fei-Fei Li's World Labs, AI world modeling is moving from proof of concept to industrial application, with great potential in scenarios such as gaming, cultural preservation, urban planning, and interior design.

The copyright of the article belongs to the author, please do not reprint without permission.
