
Background and Purpose of the Project
Kuaishou's open-source Kolors project aims to advance AI in art creation and image generation by providing powerful image generation capabilities. The project is both a contribution to the technical community and a push for creative freedom, and it demonstrates Kuaishou's commitment to and strength in AI technology.
Project Features and Benefits
- Bilingual understanding and generation:
- Kolors accepts prompts in both English and Chinese and uses a General Language Model (GLM) as its text encoder, so it can understand input in either language, giving creators more room to work.
- Processing is specifically optimized for Chinese cultural elements, so generated images better reflect Chinese cultural characteristics and meet localization needs.
- Long-text processing:
- Support for a context length of up to 256 tokens lets creators describe exactly what they have in mind, whether a complex scene or a detailed story.
- Large-scale training data:
- Trained on billions of text-image pairs, the model has broad knowledge and can generate diverse, accurate images.
- High-quality image generation:
- With a focus on realistic portraits, artistic styles and complex scenes, generated images are significantly improved in clarity, detail and semantic accuracy.
- Optimization for Chinese cultural elements:
- Landscapes with Chinese characteristics, such as the Great Wall and ink-wash landscape paintings, as well as culturally symbolic scenes such as ancient streets and dragons, are reproduced accurately.
- Chinese text generation:
- Kolors can render Chinese text inside generated images, including Chinese fonts and calligraphy, making images more expressive.
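The 256-token context mentioned under long-text processing can be illustrated with a minimal sketch. It assumes the prompt has already been converted to integer token IDs; the function name `fit_to_context`, the padding ID 0, and right-side padding are illustrative assumptions, not the behavior of the ChatGLM tokenizer Kolors actually uses (real tokenizers may pad on the left or use a different pad ID). It shows the practical effect of a fixed window: tokens past position 256 are dropped, while shorter prompts are padded and masked out. For comparison, the CLIP text encoders in the original SDXL accept 77 tokens.

```python
from typing import List, Tuple

MAX_TOKENS = 256  # context length stated for Kolors


def fit_to_context(
    token_ids: List[int], max_len: int = MAX_TOKENS, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad a tokenized prompt to a fixed length.

    Hypothetical helper: pad_id and right-side padding are assumptions.
    Returns (ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    ids = token_ids[:max_len]  # anything beyond the window is discarded
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad


# Usage with made-up token IDs (no real tokenizer involved)
short_ids, short_mask = fit_to_context([101, 2009, 2003])
long_ids, long_mask = fit_to_context(list(range(1, 400)))
print(len(short_ids), sum(short_mask))  # 256 3
print(len(long_ids), sum(long_mask))    # 256 256
```

The attention mask lets the encoder ignore padding, so a short prompt and a long one produce inputs of the same shape without the padding affecting the result.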
Technical Architecture and Realization
- Model architecture:
- Kolors is based on the SDXL architecture and uses ChatGLM as its text encoder to enhance bilingual understanding and text generation.
- A U-Net serves as the backbone, and prompts are encoded by ChatGLM to condition text-to-image generation.
- Training strategy:
- Training proceeds in two phases: a concept-learning phase and a quality-improvement phase.
- The concept-learning phase acquires broad knowledge and concepts from large-scale text-image pairs.
- The quality-improvement phase trains on millions of high-quality samples, selected through a combination of automated filtering and human review, to improve image quality.
- A new noise schedule is introduced to optimize high-resolution image generation.
- Datasets and evaluation:
- Training used both public datasets (e.g., LAION, DataComp, JourneyDB) and proprietary datasets.
- KolorsPrompts, a category-balanced benchmark dataset, is proposed to guide the training and evaluation of Kolors.
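The noise-schedule point under the training strategy can be made concrete by computing a schedule's signal-to-noise ratio (SNR) at every timestep. The article does not describe Kolors' new schedule, so this sketch does not reproduce it. As a baseline it assumes the "scaled_linear" schedule used by SDXL's default scheduler (1,000 training steps, beta from 0.00085 to 0.012) and reports SNR_t = alpha_bar_t / (1 - alpha_bar_t), where alpha_bar_t is the cumulative product of (1 - beta_t). The function names are hypothetical.

```python
import math
from typing import List


def scaled_linear_betas(
    num_steps: int = 1000, beta_start: float = 0.00085, beta_end: float = 0.012
) -> List[float]:
    """'scaled_linear' schedule: interpolate linearly in sqrt(beta), then square."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def snr_per_step(betas: List[float]) -> List[float]:
    """SNR_t = alpha_bar_t / (1 - alpha_bar_t) for every timestep t."""
    snrs, alpha_bar = [], 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # running product of (1 - beta_t)
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


snr = snr_per_step(scaled_linear_betas())
print(f"first-step SNR: {snr[0]:.1f}, terminal SNR: {snr[-1]:.5f}")
```

For this baseline the terminal SNR comes out small but not zero, meaning the final training step still carries some image signal while sampling starts from pure noise. That mismatch, which is often reported to matter more at higher resolutions, is one issue commonly cited as a reason to redesign a noise schedule; the same two functions can be used to compare any alternative schedule against this baseline.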
Applications & Experiences
- AI image creation:
- Users enter a creative text description to generate high-quality images in a variety of styles.
- A range of style templates is available to suit different aesthetic preferences.
- AI image customization:
- Users can upload their own photos and apply different art styles to generate personalized portraits.
- Interactive features:
- In the Kuaishou app, Kolors also powers interactive features such as AI play reviews to increase user engagement and fun.
Open Source Information and Resources
- Open-source links:
- Code: https://github.com/Kwai-Kolors/Kolors
- Model: https://modelscope.cn/models/Kwai-Kolors/Kolors
- Technical report: https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
- Try it out:
- Users can run and try the Kolors models by setting up a ComfyUI environment on platforms such as the ModelScope community.
As Kuaishou's open-source image generation model, Kolors excels at bilingual understanding, long-text processing and high-quality image generation, providing strong technical support for AI image creation and customization. Its open-source release and accompanying resources let more creators and researchers take part in the field and help advance AI technology and its applications.