Alibaba unveils its strongest speech model, Qwen3-ASR-Flash: hears clearly, recognizes accurately!


On September 8, Alibaba released its latest speech recognition model, Qwen3-ASR-Flash. The model is trained on the Qwen3 base model and supports 11 languages and multiple accents. Users can try it for free via ModelScope, Hugging Face, and the Qwen3-ASR-Flash API on Alibaba Cloud Bailian.

In ASR (automatic speech recognition) benchmarks, Qwen3-ASR-Flash achieves significantly lower recognition error rates on dialects, multilingual speech, key-term recognition, and lyrics than Google Gemini-2.5-Pro, OpenAI GPT-4o-Transcribe, Alibaba Speech Lab's Paraformer-v1, and ByteDance's Doubao-ASR.

Specifically, the model supports 11 languages, including Chinese, English, French, and German. During recognition it automatically detects the spoken language and filters out silence, background noise, and other non-speech segments. The service is built on massive multimodal data as well as ASR data on the scale of ten million hours.
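To make the service concrete, here is a minimal transcription sketch using the DashScope Python SDK, which fronts the Alibaba Cloud Bailian API. The model identifier, the sample audio URL, and the exact shape of the returned content are assumptions based on this announcement, not verified documentation.

```python
# Minimal transcription sketch via the DashScope Python SDK (pip install dashscope).
# NOTE: the model name "qwen3-asr-flash", the audio URL, and the response layout
# are assumptions based on the announcement, not confirmed API documentation.
import os
import dashscope

dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")  # Bailian/DashScope API key

# A user message carrying only the audio to transcribe; per the announcement, the
# model is expected to auto-detect the language and skip silence/background noise.
messages = [
    {"role": "user", "content": [{"audio": "https://example.com/sample.wav"}]},  # hypothetical URL
]

response = dashscope.MultiModalConversation.call(
    model="qwen3-asr-flash",  # assumed model identifier on Bailian
    messages=messages,
)

# Multimodal responses return the message content as a list of typed parts.
print(response.output.choices[0].message.content)
```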

In addition, users can customize ASR results: by supplying contextual information, such as key terms or a description of the setting in which the audio takes place, when uploading the audio, the recognition results can be made to align with this prior information.
The model supports Mandarin as well as dialects such as Sichuanese, Minnan (Hokkien), Wu, and Cantonese; English with British, American, and other regional accents; and other languages including French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, and Arabic.

For customized ASR results, users can provide background text in any format to obtain context-biased output; no pre-processing of the contextual information is required.

Supported formats include, but are not limited to: simple lists of keywords or buzzwords; complete paragraphs or entire documents of any length and source; keyword lists mixed with full paragraphs in any format; and even irrelevant or meaningless text. The researchers note that the model is highly robust to the negative effects of irrelevant context.

Based on this, Qwen3-ASR-Flash uses the supplied context to identify and match named entities and other key terms, producing customized recognition results.
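As a sketch of how such context biasing might look in practice, the snippet below extends the earlier call by passing raw background text alongside the audio. Whether the context travels in a system message or a dedicated parameter is an assumption about the API shape; the point is simply that unprocessed text accompanies the upload.

```python
# Context-biasing sketch: send raw background text with the audio so recognition
# favors the named entities and key terms it contains.
# NOTE: carrying the context in a system message is an assumption about the API shape.
import os
import dashscope

dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Arbitrary, unprocessed background text: a keyword list, full paragraphs, or a mix.
context = (
    "Product briefing on Qwen3-ASR-Flash. "
    "Key terms: ModelScope, Bailian, Paraformer-v1, Doubao-ASR."
)

messages = [
    {"role": "system", "content": [{"text": context}]},                            # assumed slot for context
    {"role": "user", "content": [{"audio": "https://example.com/briefing.wav"}]},  # hypothetical URL
]

response = dashscope.MultiModalConversation.call(
    model="qwen3-asr-flash",  # assumed model identifier
    messages=messages,
)
print(response.output.choices[0].message.content)
```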

How to try it:

ModelScope:

https://modelscope.cn/studios/Qwen/Qwen3-ASR-Demo

HuggingFace:

https://huggingface.co/spaces/Qwen/Qwen3-ASR-Demo

Alibaba Cloud Bailian API:

https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=2979031

Demonstration

All demos below use a single Qwen3-ASR-Flash model with a single inference pass, with no background information configured except for Example 2.

Continuous multi-type noise
Gaming commentary
English rap
Dialect in a vehicle-noise environment
Switching between multiple statements
Chemistry course

Future outlook

Qwen3-ASR-Flash will continue to be upgraded iteratively to further improve general recognition accuracy, and more features will be developed to provide a smarter, better speech-to-text service.
