Alibaba releases its strongest speech model, Qwen3-ASR-Flash: hear clearly, recognize accurately!

On September 8, Alibaba released its latest speech recognition model, Qwen3-ASR-Flash. The model is trained on the Qwen3 base model and supports 11 languages and multiple accents. Users can try Qwen3-ASR-Flash for free via ModelScope, HuggingFace, and the Alibaba Cloud Bailian API.

On multiple ASR (automatic speech recognition) benchmarks, Qwen3-ASR-Flash's recognition error rates for dialects, multilingual speech, key information recognition, and lyrics are significantly lower than those of Google Gemini-2.5-Pro, OpenAI GPT-4o-Transcribe, Alibaba Speech Lab's Paraformer-v1, and ByteDance's Doubao-ASR.
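The article does not say which error metric the benchmarks use. As a rough illustration, ASR error rates are commonly reported as word error rate (WER) or, for languages such as Chinese, character error rate (CER): the edit distance between the reference transcript and the model output, divided by the reference length. The sketch below assumes that convention; the function names are illustrative and not part of any Qwen tooling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # token missing from the hypothesis
                curr[j - 1] + 1,     # extra token in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of four.
    print(char_error_rate("你好世界", "你好视界"))
```

A lower value means fewer recognition mistakes. Real benchmarks usually also normalize punctuation and casing before scoring, which this sketch leaves out.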

Specifically, the model supports 11 languages, including Chinese, English, French, and German. During recognition it can automatically detect the spoken language and automatically filter out non-speech segments such as silence and background noise. It is a speech recognition service built on massive multimodal data as well as ASR data on the scale of ten million hours.

In addition, users can customize ASR results. By adding contextual information when uploading audio, such as key terms or the setting in which the audio was recorded, they can make the recognition results match that existing information.

The model supports Mandarin as well as dialects such as Sichuanese, Minnan, Wu, and Cantonese; English with British, American, and other regional accents; and other languages including French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, and Arabic.

For customized ASR results, users can provide background text in any format to bias the ASR results toward it, and there is no need to pre-process the contextual information.

Supported formats include, but are not limited to: simple lists of keywords or hot words; complete paragraphs or entire documents of any length and source; keyword lists mixed with full paragraphs in any format; and irrelevant or even meaningless text. The researchers mentioned that the model is highly robust to the negative effects of irrelevant context.

Based on this, Qwen3-ASR-Flash can use the context to identify and match named entities and other key terms, and output customized recognition results.
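As a rough sketch of what supplying context could look like on the client side, the helper below merges a keyword list and free-form passages into one background string, since the article says no particular format or pre-processing is required. Everything here is an assumption for illustration: the function names, the request layout, the "audio" and "context" field names, and the lowercase model identifier are hypothetical and are not taken from the Bailian API documentation, which should be consulted for the actual request format.

```python
def build_context(keywords=(), passages=()):
    """Join keywords and free-form passages into a single background text.

    The article states that the model accepts arbitrarily formatted text,
    so the separators used here are only a readable convention.
    """
    parts = []
    cleaned = [k.strip() for k in keywords if k.strip()]
    if cleaned:
        parts.append(", ".join(cleaned))
    parts.extend(p.strip() for p in passages if p.strip())
    return "\n".join(parts)


def build_request(audio_path, context=""):
    """Assemble a HYPOTHETICAL request payload (not the real API schema)."""
    request = {
        "model": "qwen3-asr-flash",  # assumed identifier, check the API docs
        "audio": audio_path,         # hypothetical field name
    }
    if context:
        request["context"] = context  # hypothetical field name
    return request


if __name__ == "__main__":
    context = build_context(
        keywords=["Qwen3-ASR-Flash", "ModelScope", "Bailian"],
        passages=["A product briefing about speech recognition models."],
    )
    print(build_request("briefing.wav", context))
```

Leaving the context empty corresponds to the default setup, in which the model transcribes without any background information.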

How to try it:

ModelScope:

https://modelscope.cn/studios/Qwen/Qwen3-ASR-Demo

HuggingFace:

https://huggingface.co/spaces/Qwen/Qwen3-ASR-Demo

Alibaba Cloud Bailian API:

https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=2979031

Demonstration

All examples use a single Qwen3-ASR-Flash model with a single inference pass, and no background information was configured except in Example 2.

Continuous multi-type noise

Gaming Commentary

English Rap

Dialect in a noisy in-vehicle environment

Switching between multiple statements

Chemistry courses

Future outlook

Qwen3-ASR-Flash will continue to be upgraded iteratively to keep improving general recognition accuracy, and we will also develop more features to provide you with a smarter and better speech-to-text service.


The copyright of the article belongs to the author, please do not reprint without permission.
