
Background and purpose of the project
Kuaishou's open-source Kolors project aims to advance AI for art creation and image generation by providing a capable text-to-image model. Beyond contributing to the technical community, the release is intended to broaden creative freedom and to demonstrate Kuaishou's commitment to and strength in AI.
Project Features and Benefits
- Bilingual comprehension and generation:
- Kolors accepts prompts in both English and Chinese. It uses a General Language Model (GLM) as its text encoder, so it can interpret prompts in either language and gives creators more room to work.
- Processing is tuned for Chinese cultural elements in particular, so generated images reflect Chinese cultural characteristics more closely and meet localization needs.
- Long text processing:
- A context length of up to 256 tokens lets creators describe what they have in mind precisely, whether it is a complex scene or a detailed story.
- Large-scale training data:
- Trained on billions of text-image pairs, the model covers a broad range of concepts and can generate diverse, accurate images.
- High-quality image generation:
- The model focuses on realistic portraits, artistic styles, and complex scenes, with marked improvements in clarity, richness of detail, and semantic accuracy.
- Optimization for Chinese cultural elements:
- Landscapes with Chinese characteristics, such as the Great Wall and ink-wash landscape paintings, and scenes with Chinese cultural symbolism, such as ancient streets and dragons, are reproduced accurately.
- Chinese text generation:
- The model can render Chinese text inside generated images, including Chinese fonts and calligraphy, which adds expressive range.
Technical Architecture and Implementation
- Model architecture:
- Kolors is based on the SDXL model architecture and incorporates ChatGLM, with a context length of 256 tokens, to strengthen bilingual comprehension and text rendering.
- A U-Net serves as the backbone, and ChatGLM encodes the prompt for text-to-image generation.
- Training strategy:
- Training is divided into two phases: a concept learning phase and a quality improvement phase.
- The concept learning phase acquires broad knowledge and concepts from large-scale text-image pairs.
- The quality improvement phase trains on millions of high-quality samples selected through a combination of automated filtering and human review.
- A new noise schedule is introduced to improve high-resolution image generation.
- Datasets and evaluation:
- Training used both public datasets (e.g., LAION, DataComp, JourneyDB) and proprietary datasets.
- A category-balanced benchmark, KolorsPrompts, is proposed to guide the training and evaluation of Kolors.
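The article does not say how the new noise schedule works, so the following is only a generic illustration of why the schedule matters for high-resolution images: if the signal-to-noise ratio (SNR) at the last diffusion step is not low enough, some image structure survives the forward process. The sketch computes the terminal SNR of an SDXL-style "scaled linear" beta schedule and shows that lengthening the schedule lowers it. The step counts and beta values are assumptions chosen for illustration, not settings confirmed by this article; the technical report linked below is the place to check the actual method.

```python
import math


def scaled_linear_betas(num_steps, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space (the 'scaled linear' schedule).

    The default beta range is an assumed, SDXL-style value.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def terminal_snr(num_steps, beta_start=0.00085, beta_end=0.012):
    """SNR at the final timestep: alpha_bar / (1 - alpha_bar)."""
    betas = scaled_linear_betas(num_steps, beta_start, beta_end)
    # Sum logs instead of multiplying many factors close to 1.
    log_alpha_bar = sum(math.log1p(-b) for b in betas)
    alpha_bar = math.exp(log_alpha_bar)
    return alpha_bar / (1.0 - alpha_bar)


if __name__ == "__main__":
    # Hypothetical comparison: a 1000-step schedule versus a longer one.
    for steps in (1000, 1100):
        print(f"{steps} steps: terminal SNR = {terminal_snr(steps):.6f}")
```

With the same beta range, the longer schedule always ends at a lower SNR, because every extra step multiplies the remaining signal by a factor below one.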
Applications & Experiences
- AI image creation:
- Users enter a text description and generate high-quality paintings in a variety of styles.
- A range of style templates is available to suit different aesthetic preferences.
- AI image customization:
- Users can upload their own photos and choose an art style to generate personalized portraits.
- Interactive features:
- In the Kuaishou app, Kolors also powers interactive features such as AI-assisted comments, which increase user engagement and fun.
Open Source Information and Resources
- Open source links:
- Code: https://github.com/Kwai-Kolors/Kolors
- Model: https://modelscope.cn/models/Kwai-Kolors/Kolors
- Technical report: https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
- Trying it out:
- Users can run Kolors models by setting up a ComfyUI environment on platforms such as the ModelScope community.
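For readers who prefer a script to ComfyUI, the model can also be driven from Python. The sketch below assumes the Hugging Face `diffusers` library with its `KolorsPipeline` class, a CUDA GPU, and the `Kwai-Kolors/Kolors-diffusers` checkpoint; none of these are mentioned in this article, so treat them as assumptions and check them against the official repository. The helper names `build_generation_kwargs` and `generate` and the default step count and guidance scale are hypothetical choices for illustration.

```python
def build_generation_kwargs(prompt, negative_prompt="", steps=50, guidance_scale=5.0):
    """Collect the arguments for one text-to-image call.

    The defaults are illustrative, not recommended settings.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,  # English or Chinese
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, model_id="Kwai-Kolors/Kolors-diffusers", device="cuda"):
    """Load the pipeline and return one PIL image for the prompt."""
    # Heavy imports stay local so the helper above works without a GPU stack.
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    return pipe(**build_generation_kwargs(prompt)).images[0]
```

A call such as `generate("一只在长城上看日出的熊猫，水墨画风格").save("kolors_sample.png")` would then write one image to disk. The first run downloads the model weights, which are several gigabytes.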
As Kuaishou's open-source image generation model, Kolors excels at bilingual comprehension, long-prompt processing, and high-quality image generation, and provides solid technical support for AI image creation and customization. Its open-source release and accompanying resources let more creators and researchers take part in the field and jointly advance the development and application of AI.
Relevant Navigation

An unlimited-duration film generation model from the Kunlun Wanwei team that overcomes bottlenecks in existing video generation technology and delivers high-quality, highly consistent, high-fidelity video.

Ovis2
Alibaba's open source multimodal large language model with powerful visual understanding, OCR, video processing and reasoning capabilities, supporting multiple scale versions.

Gemma 3n
Google's lightweight open-source large language model, which combines high performance with easy deployment and suits local development and a wide range of applications.

AlphaDrive
An autonomous driving framework that combines vision-language models with reinforcement learning, offering strong planning, reasoning, and multimodal planning capabilities for complex and rare traffic scenarios.

GraphRAG
Microsoft's open-source retrieval-augmented generation approach, based on knowledge graphs and graph machine learning, designed to improve how large language models understand and reason over private data.

Krillin AI
AI video subtitle translation and dubbing tool, supporting multi-language input and translation, providing one-stop solution from video acquisition to subtitle translation and dubbing.

DeepSeek-VL2
An efficient vision-language model from the DeepSeek team, built on a mixture-of-experts architecture, with strong multimodal understanding and processing capabilities.

OmAgent
A device-oriented open-source agent framework designed to simplify the development of multimodal agents and to add capabilities to various kinds of hardware devices.