Alibaba Releases Qwen-Image-2.0: The Dawn of a New Era in Image Generation

AiFun, updated 3 months ago

Less than half a day after ByteDance released its image generation model, Alibaba's new model has arrived too! Today, Alibaba released Qwen-Image 2.0, a new-generation image generation base model. The model supports ultra-long instructions of up to one thousand tokens, 2K resolution, and a lighter architecture: its model size is much smaller than the 20B of the previous-generation Qwen-Image, which leads to faster inference.

We ran an early side-by-side comparison of Alibaba's Qwen-Image 2.0, ByteDance's Seedream 5.0 Preview, and Google's Nano Banana Pro. The comparison shows that Qwen-Image 2.0 does have an advantage in following long instructions and rendering long text, but it still trails Nano Banana Pro slightly in the realism of its generated images.

The Qwen-Image 2.0 upgrade focuses on text rendering. In the official A/B test example below, the fonts, typography, and layout of the text are precisely specified by an ultra-long prompt of 888 tokens (containing nearly a thousand words in English and Chinese), and Qwen-Image 2.0 reproduces them well.

Qwen-Image 2.0 was also able to render the entire text of the Preface to the Orchid Pavilion Collection (Lantingji Xu) in brush calligraphy, while keeping the text and the image reasonably harmonious so that the text does not obscure the landscape or the figures in the scene. A close look at the text still turns up some rendering failures, but their proportion is already very low.

Qwen-Image 2.0 also supports rendering dozens of sub-images at once while keeping the subjects in them consistent. For example, the picture below is a comic strip generated by Qwen-Image 2.0 in a single pass, with 24 panels in total, in which the characters and drawing style remain fairly coherent.

Qwen-Image 2.0 has also been optimized for the “greasy” look common in AI-generated images. Compared with the previous-generation model, its colors are not oversaturated, its images look more like real photographs, and the “AI flavor” is somewhat lighter.

▲From left to right: original image, Qwen-Image-2512, Qwen-Image 2.0

Alibaba tested Qwen-Image 2.0 on AI Arena, a blind-testing platform for AI models. The data show that Qwen-Image 2.0 ranked third on the text-to-image benchmark and second on the image-to-image benchmark, though there is still a gap between it and Google's Nano Banana Pro (listed in the chart as Gemini-3-Pro-Image-Preview). In addition, the model has not yet been compared with the newly released Seedream 5.0 Preview.

Wu Chenfei, head of visual generation at Qwen, said in an interview that the Qwen-Image project was only set up in May 2025 and released its first model in August of that year. Since then, the models have iterated mainly along two branches, image generation and image editing. Qwen-Image 2.0 merges both capabilities into a single model.

At present, Qwen-Image 2.0 is available for invitation-only testing on Alibaba Cloud's Bailian platform, and users can also try the new model for free through Qwen Chat (chat.qwen.ai). Liu Wei, product manager of the Qwen App, disclosed that the model will later launch in the Qwen App.

After the event, we also talked with Wu Chenfei and Xiong Shuitian, Senior Solution Architect for the Qwen large models.

When we asked about future plans for the Qwen-Image series, Wu Chenfei said that if one word had to sum up the core of the Qwen-Image 2.0 upgrade, it would be “infographic”. In the coming year, the Qwen-Image team will continue to work on generating complex “parent images” (images composed of many parts) such as PPT slides, multi-image posters, and comics, to further reduce hallucinations and errors.

In addition, the team plans to build on its previously released layered model and further strengthen the model's layered editing capabilities. The goal is to make generative models true productivity tools. With AI layers, designers can flexibly combine AI generation (for example, having Qwen edit a specific layer) with traditional methods, or merge the strengths of different models, to achieve a “divide and conquer” workflow for complex edits.

I. Alibaba, ByteDance, and Google models go head to head: Qwen-Image 2.0 stands out in text rendering

For the ultra-long prompt task, we lightly modified Qwen-Image 2.0's official ultra-long prompt, repositioning some of its elements, to see whether the model could deliver generated results of the same quality.

Prompt content: (shown as an image in the original article)

The results generated by Qwen-Image 2.0 are as follows. The model follows our requirements for image layout and font color, and the content is rendered accurately with essentially no omissions.

Article source: Wisdom
