Alibaba Releases Qwen-Image 2.0: The Dawn of a New Era in Image Generation
Less than half a day after ByteDance released its new image generation model, Alibaba's new model is here too! Today, Alibaba released Qwen-Image 2.0, a new-generation foundation model for image generation. The model supports ultra-long prompts of up to 1,000 tokens and 2K resolution. It also uses a lighter architecture: its model size is much smaller than the 20B parameters of the previous-generation Qwen-Image, which makes inference faster.
We promptly ran a side-by-side, hands-on comparison of three models: Alibaba's Qwen-Image 2.0, ByteDance's Seedream 5.0 Preview, and Google's Nano Banana Pro. The comparison shows that Qwen-Image 2.0 does have an advantage in following long prompts and rendering long text, but still trails Nano Banana Pro slightly in the realism of its generated images.
The Qwen-Image 2.0 upgrade focuses on text rendering. In the official A/B test example, the font, typography, and layout of the text are precisely specified by an extra-long prompt of 888 tokens (nearly a thousand Chinese and English words), and Qwen-Image 2.0 reproduces them faithfully.

Qwen-Image 2.0 was also able to render the full text of the Lanting Xu (Preface to the Orchid Pavilion Poems) in brush calligraphy while keeping the text and the picture fairly harmonious: the text does not obscure the landscape or the figures in the scene. Looking closely at the text, you can still find some rendering errors, but they are already very rare.

Qwen-Image 2.0 can also render dozens of sub-images at once while keeping the subjects in them consistent. For example, Qwen-Image 2.0 generated a 24-panel comic strip in a single pass, with fairly coherent characters and art style throughout.

Qwen-Image 2.0 has also been optimized to address the "greasy" look common in AI-generated images. Compared with the previous-generation model, its colors are less oversaturated, its images look more like real photographs, and the telltale "AI feel" is lighter.

▲From left to right: original image, Qwen-Image-2512, Qwen-Image 2.0
Alibaba tested Qwen-Image 2.0 on AI Arena, a blind-testing platform for AI models. The results showed Qwen-Image 2.0 ranking third in the text-to-image benchmark and second in the image-to-image benchmark, though there is still some gap between it and Google's Nano Banana Pro (listed as Gemini-3-Pro-Image-Preview). In addition, the model has not yet been compared with the newly released Seedream 5.0 Preview.

Wu Chenfei, head of visual generation at Qwen, said in an interview that the Qwen-Image project was only set up in May 2025 and released its first model in August of last year. Since then, the team has iterated mainly along two branches, text-to-image generation and image editing, and Qwen-Image 2.0 integrates both capabilities into a single model.

Qwen-Image 2.0 is currently available for invitation-only testing on Alibaba Cloud Bailian (Model Studio), and users can also try the new model for free through Qwen Chat (chat.qwen.ai). Liu Wei, product manager of the Qwen App, said the model will later be launched in the Qwen App.

After the event, we also spoke with Wu Chenfei and with Xiong Shuitian, Senior Solutions Architect for Qwen large models.
When we asked about future plans for the Qwen-Image series, Wu Chenfei said that if he had to sum up the core of the Qwen-Image 2.0 upgrade in one word, it would be "infographic." In the coming year, the Qwen-Image team will continue to research the generation of complex composite images such as PPT slides, multi-image posters, and comics, and will work to further reduce hallucinations and errors.
The team also plans to build on its previously released layered model to further strengthen the model's layered editing capabilities, with the goal of turning generative models into true productivity tools. With AI layers, designers can flexibly combine AI generation (for example, using Qwen to edit specific layers) with traditional tools, or combine the strengths of different models in a "divide and conquer" workflow for complex edits.
I. Alibaba, ByteDance, and Google Models Go Head to Head: Qwen-Image 2.0 Stands Out in Text Rendering
For the ultra-long prompt task, we modified the official ultra-long prompt for Qwen-Image 2.0 by moving some of its elements, to see whether Qwen-Image 2.0 could still deliver results of the same quality.
Prompt:

Qwen-Image 2.0's results are as follows. The model follows our requirements for image layout and font color, and the content is rendered accurately with essentially no omissions.
Article source: Wisdom