Big performance gains! Alibaba open-sources new Qwen3 models that dominate text-representation leaderboards


Early this morning, Alibaba open-sourced two new model series in the Qwen3 family: Qwen3-Embedding and Qwen3-Reranker.

The two series are designed for text embedding, retrieval, and reranking tasks. They are trained on top of the Qwen3 base models and fully inherit Qwen3's strength in multilingual text understanding, supporting 119 languages.

According to the published benchmark results, Qwen3-Embedding leads on multilingual text-representation benchmarks: the 8B-parameter model ranks first on the MTEB multilingual leaderboard with a score of 70.58, surpassing many commercial API services such as Google's Gemini-Embedding.


In reranking tasks, the Qwen3-Reranker family of models also demonstrated strong capabilities. In basic relevance retrieval, the 8B model scored 69.02 on multilingual retrieval, 77.45 on Chinese retrieval, and 69.76 on English retrieval, significantly outperforming the other baseline models.


Open source address:
https://huggingface.co/collections/Qwen/qwen3-embedding-6841b2055b99c44d9a4c371f
https://huggingface.co/collections/Qwen/qwen3-reranker-6841b22d0192d7ade9cdefea

Text embedding and reranking are core tasks in natural language processing and information retrieval, widely used in web search, question-answering systems, recommendation systems, and more. High-quality text embeddings enable a model to accurately capture semantic relationships between texts, while an effective reranking mechanism ensures that the most relevant results are presented to the user first.

However, it is difficult to train models at scale that combine broad generalization with accurate retrieval and ranking, and this is where the new Qwen3 models pull significantly ahead of other models.

In terms of architecture, both model series are built on the dense version of the Qwen3 base model and are offered in three configurations, at 0.6B, 4B, and 8B parameters, to meet performance and efficiency requirements across different scenarios.

For the text-embedding model, the researchers used the causal-attention LLM directly and appended an [EOS] token to the end of the input sequence, extracting the semantic representation of the text from the last layer's hidden state at that position. This design not only strengthens the model's semantic understanding of the text, but also lets the model adapt flexibly to different task requirements.
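As a concrete illustration, here is a minimal sketch of this last-token ([EOS]) pooling with Hugging Face transformers, following the pattern published on the Qwen3-Embedding model card. The left-padding choice and the normalization step are the model card's recommendations rather than details from this article.

```python
# Minimal sketch of last-token ([EOS]) pooling for Qwen3-Embedding.
# Assumes the tokenizer appends the end-of-sequence token, as in the
# published model card; treat details as illustrative, not official.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B", padding_side="left"
)
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B").eval()

texts = ["The capital of China is Beijing.", "Gravity makes apples fall."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # [batch, seq_len, dim]

# With left padding, the last position of every sequence is its final real
# token (the appended [EOS]), so the embedding is simply that hidden state.
embeddings = F.normalize(hidden[:, -1], p=2, dim=-1)

# Cosine similarity between the two texts is now a plain dot product.
print((embeddings[0] @ embeddings[1]).item())
```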


In addition, to help the model follow instructions and perform well on downstream tasks, the researchers concatenated the instruction and the query into a single input context, while documents are left unchanged. This design allows the model to better understand and process complex semantic tasks, improving its performance on multilingual and cross-lingual tasks.
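The query-side format below follows the helper published on the Qwen3-Embedding model card; note that only the query carries an instruction, while documents are embedded as-is.

```python
# Instruction-aware query formatting, as shown on the Qwen3-Embedding
# model card: the task instruction is prepended only to the query side.
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery:{query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [get_detailed_instruct(task, "What is the capital of China?")]
documents = ["The capital of China is Beijing."]  # no instruction on documents
```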

The reranking model uses a single-tower (cross-encoder) structure: it takes text pairs (e.g., a user query and a candidate document) as input and, via the LLM's chat template, recasts the similarity-assessment task as a binary classification problem. Given the instruction, query, and document, the model judges whether the document matches the query and outputs a relevance score. This design lets the model evaluate the relevance between text pairs more accurately, yielding better results on reranking tasks.
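A minimal sketch of this binary yes/no scoring is shown below, loosely following the public Qwen3-Reranker model card. The pair-formatting string here is simplified and the full system prompt and chat template are omitted, so treat it as illustrative rather than the exact official pipeline.

```python
# Sketch of reranking as binary classification: score a (query, document)
# pair by the probability the model's next token is "yes" rather than "no".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()

yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

def relevance_score(instruction: str, query: str, document: str) -> float:
    """Probability that the model answers 'yes' (document matches query)."""
    pair = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
    inputs = tokenizer(pair, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # next-token logits
    yes_no = torch.stack([logits[no_id], logits[yes_id]])
    return torch.softmax(yes_no, dim=0)[1].item()   # P("yes")

print(relevance_score(
    "Given a web search query, judge whether the document answers it.",
    "What is the capital of China?",
    "The capital of China is Beijing.",
))
```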

In terms of training paradigm, the model family employs a multi-stage training approach: large-scale unsupervised pre-training, supervised fine-tuning on high-quality data, and a model-merging strategy.

In the unsupervised pre-training phase, the researchers used the text-generation capabilities of the Qwen3 base model to synthesize large-scale, weakly supervised training data. The data cover a wide range of task types, languages, and domains, giving the model broad learning material.

This synthetic-data approach not only improves controllability over the data, but also makes it possible to generate high-quality data for low-resource languages and domains. It breaks through the limitations of traditional methods, which rely on mining community forums or filtering open-source corpora for weakly supervised text pairs, and enables efficient generation of weakly supervised data at scale.


In the supervised fine-tuning phase, the researchers trained on small-scale, high-quality labeled data to further improve model performance. This stage's training data includes open-source labeled datasets such as MS MARCO, NQ, and HotpotQA, as well as a screened subset of the synthetic data: a simple cosine-similarity calculation filters high-quality pairs out of the synthetic pool. This strategy improves the model's generalization ability and yields excellent results across a variety of benchmarks.
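As a rough sketch of such a filter (the 0.7 threshold and the `embed()` helper are assumptions for illustration, not figures reported by the researchers):

```python
# Hedged sketch of the cosine-similarity filter described above: keep only
# synthetic (query, document) pairs whose embeddings are similar enough.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_pairs(pairs, embed, threshold=0.7):
    """pairs: iterable of (query, document) strings; embed: text -> vector.
    The threshold is an illustrative assumption, not a reported value."""
    return [(q, d) for q, d in pairs if cosine(embed(q), embed(d)) >= threshold]
```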

Finally, in the model-merging stage, the researchers applied a merging technique based on spherical linear interpolation (slerp). By merging multiple model checkpoints saved during fine-tuning, the final model performs better across different data distributions. This strategy significantly improves the model's stability and consistency, and strengthens its robustness and generalization.
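Below is a minimal sketch of slerp between two checkpoints' weights, assuming a simple tensor-by-tensor merge over a shared state dict; the researchers' actual procedure spans multiple checkpoints and details the article does not spell out.

```python
# Sketch of spherical linear interpolation (slerp) for model merging:
# interpolate along the arc between two weight tensors instead of the
# straight line used by plain averaging.
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    v0, v1 = w0.flatten(), w1.flatten()
    cos = torch.clamp(torch.dot(v0, v1) / (v0.norm() * v1.norm()), -1.0, 1.0)
    omega = torch.acos(cos)                   # angle between the checkpoints
    if omega.abs() < 1e-6:                    # nearly parallel: fall back to lerp
        return (1 - t) * w0 + t * w1
    s = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / s) * v0 + (torch.sin(t * omega) / s) * v1
    return merged.view_as(w0)

def merge_state_dicts(sd0: dict, sd1: dict, t: float = 0.5) -> dict:
    """Merge two checkpoints with identical keys and shapes."""
    return {k: slerp(sd0[k].float(), sd1[k].float(), t) for k in sd0}
```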

Beyond the technical innovations above, both models benefit from carefully designed training-data synthesis. To generate high-quality synthetic data, the researchers used a carefully engineered prompting strategy: in the text-retrieval task, the model generates data from a multilingual pre-training corpus, assigning each document a specific role in order to simulate a potential user's query against that document.


In addition, the prompts vary along several dimensions, such as query type (keyword, factual, summarization, judgment), query length, difficulty, and language, to ensure the quality and diversity of the synthetic data. Data generated this way not only meets the quantity demands of large-scale pre-training, but is also of high enough quality to effectively improve model performance.
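A purely hypothetical prompt template along these dimensions might look like the following; the researchers' actual prompts are not reproduced in this article.

```python
# Hypothetical synthetic-query prompt illustrating the dimensions the
# article lists (role, query type, length, difficulty, language); not the
# authors' actual template.
PROMPT = """You are {role}, reading the document below.
Write one {difficulty} {query_type} query, about {length} words long,
in {language}, that this document would answer.

Document:
{document}

Query:"""

print(PROMPT.format(
    role="a graduate student researching climate policy",
    difficulty="hard",
    query_type="factual",
    length="15",
    language="French",
    document="(document text here)",
))
```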

(Source: AIGC Open Community)
