
TeleChat-12B is an open-source large language model from China Telecom's Artificial Intelligence Research Institute, part of its "Xingchen" (Star) semantic model family. Compared with the earlier TeleChat-7B, it is significantly improved in content understanding, performance, and applications.
Open-Source Timeline
- 2024.5.16: Released TeleChat-12B-V2, an optimized 12B chat model
- 2024.3.20: Released the 12B chat model and its quantized versions
- 2024.1.11: Released a 1T Chinese dataset
- 2024.1.10: Released the 7B chat model and its quantized versions
Model Parameters and Training Data
- Parameter scale: TeleChat-12B has 12 billion parameters, a significant increase over TeleChat-7B's 7 billion.
- Training data: TeleChat-12B raises the training corpus from the 1.5T tokens used for the 7B version to 3T tokens, with notably higher data quality, improving model performance.
Model Structure and Optimization
- Decoupled word embedding and output layers: TeleChat-12B keeps the parameters of the word embedding layer and the output lm-head layer separate rather than tied, which helps training stability and convergence (see the sketch after this list).
- Model structure optimization: the architecture was chosen by experimenting with combinations of structural options on small-scale models and selecting the best-performing one, further optimizing model performance.
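A minimal PyTorch sketch of the decoupled (untied) embedding design; the class name and sizes are illustrative assumptions, not TeleChat's actual implementation:

```python
import torch
import torch.nn as nn

class DecoupledEmbeddingLM(nn.Module):
    """Toy LM with separate (untied) input-embedding and output matrices."""
    def __init__(self, vocab_size=32_000, hidden_size=1024):  # hypothetical sizes
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)  # input embedding
        self.blocks = nn.Identity()  # stand-in for the transformer stack
        # Output lm head: an independent parameter matrix, NOT tied to embed_tokens.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # A tied design would instead share the matrix:
        #   self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids):
        hidden = self.blocks(self.embed_tokens(input_ids))
        return self.lm_head(hidden)  # logits over the vocabulary

logits = DecoupledEmbeddingLM()(torch.randint(0, 32_000, (1, 8)))
```

Untying roughly doubles the embedding parameter count but lets the input and output representations evolve independently during training.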
Training Methods and Performance Gains
- Data-ratio and curriculum learning: during training, TeleChat-12B uses a principled data-mixing and curriculum-learning approach: small-parameter models are fitted over multiple candidate data ratios, and the weights of harder-to-learn datasets are dynamically boosted so the model fits well across all datasets (see the sketch after this list).
- Effectiveness gains: compared with TeleChat-7B, TeleChat-12B improves by about 30% overall in content understanding, performance, and application scenarios, and by more than 40% in multi-turn dialogue reasoning and safety-related capabilities.
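A hedged sketch of the dynamic reweighting idea described above: sampling weights are boosted for datasets the model currently fits poorly (higher loss). The exponentiated update rule and all names here are illustrative assumptions, not the published TeleChat recipe:

```python
import numpy as np

def update_mixture_weights(weights, per_dataset_loss, lr=0.1):
    """Boost sampling weights of datasets with relatively high loss.

    weights: current sampling probabilities over datasets (sums to 1).
    per_dataset_loss: recent mean training loss for each dataset.
    """
    loss = np.asarray(per_dataset_loss, dtype=float)
    # Standardize losses so the update is scale-free.
    advantage = (loss - loss.mean()) / (loss.std() + 1e-8)
    new_w = np.asarray(weights, dtype=float) * np.exp(lr * advantage)
    return new_w / new_w.sum()  # renormalize to a probability distribution

# Example: the third dataset has the highest loss, so its weight grows.
print(update_mixture_weights([0.5, 0.3, 0.2], [1.8, 2.0, 2.6]).round(3))
```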
Application Scenarios and Effects
- Multi-scenario applications: TeleChat-12B has been applied in document writing, code programming, network fault analysis, and business analysis. In document writing, for example, generated texts average more than 1,500 words, with an effective adoption rate of 85.7%.
- External services: in services for enterprise and public-institution customers, TeleChat-12B covers 95% of actual business requirements and reaches 90% accuracy in multi-turn dialogue comprehension.
Localization Advancement
- Support for domestic chips: TeleChat-12B supports int8 and int4 quantization as well as training and inference on domestic chips, further advancing the full-stack localization of large models (see the loading sketch after this list).
- Cooperation and ecosystem: China Telecom and Huawei Ascend have jointly promoted the full-stack localization of large models and have completed commercial model deployments based on Ascend technology in several projects.
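As a rough illustration of the int8 quantization support mentioned above, here is a hedged sketch of loading the model with Hugging Face transformers and bitsandbytes; treat the repo id `Tele-AI/TeleChat-12B` and the exact arguments as assumptions to verify against the official model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Tele-AI/TeleChat-12B"  # assumed repo id; verify on the model card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,  # TeleChat ships custom modeling code
    device_map="auto",       # place layers on available GPUs automatically
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # int8 weights
)

inputs = tokenizer("Hello, TeleChat.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```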
TeleChat-12B has been comprehensively optimized and upgraded in parameter scale, training data, model structure, and training methodology, significantly improving model performance and demonstrating strong capabilities across multiple application scenarios. At the same time, it actively advances the full-stack localization of large models, injecting new momentum into the development of the AI industry.
Related Navigation

A model open-sourced by Google that uses AI to analyze camera-trap photos and automatically identify animal species.

CogView4
An open-source text-to-image model released by Zhipu AI that supports bilingual input, generates high-quality images, and is the first to render Chinese characters within the image; widely used in advertising, short video, art creation, and other fields.

QwQ-32B
A high-performance reasoning model with 32 billion parameters released by Alibaba, excelling at mathematics and programming across a wide range of application scenarios.

Wan2.1
An efficient video generation model launched by Alibaba that accurately simulates complex scenes and motions and supports Chinese and English text effects, opening a new era of AI video creation.

Confucius-o1
NetEase Youdao's 14B lightweight model, the first in China to support step-by-step reasoning and explanation; designed for educational scenarios, it helps students efficiently understand complex math problems.

Mistral 7B
A powerful large language model with about 7.3 billion parameters developed by Mistral AI, demonstrating excellent multilingual processing and reasoning performance.

Waver 1.0
Waver 1.0 is an open-source all-in-one video generation model that turns text or images into HD video efficiently, conveniently, and with outstanding quality.

Phi-3
A high-performance language model family from Microsoft, instruction-tuned and runnable across platforms, with excellent language comprehension and reasoning capabilities, especially suited to multimodal application scenarios.
