Alibaba releases its strongest speech model, Qwen3-ASR-Flash: hears clearly, recognizes accurately!

On September 8, Alibaba released its latest speech recognition model, Qwen3-ASR-Flash. The model is trained on top of the Qwen3 base model and supports 11 languages and multiple accents. Users can try Qwen3-ASR-Flash for free via ModelScope, HuggingFace, and the Alibaba Cloud Bailian API.
In multiple ASR (automatic speech recognition) benchmarks, Qwen3-ASR-Flash's recognition error rates on dialects, multilingual speech, key information recognition, and song lyrics are significantly lower than those of Google Gemini-2.5-Pro, OpenAI GPT-4o-Transcribe, Alibaba Speech Lab's Paraformer-v1, and ByteDance's Doubao-ASR.
Specifically, the model supports 11 languages, including Chinese, English, French, and German. During recognition it automatically detects the spoken language and filters out non-speech segments such as silence and background noise. The service is built on massive multimodal data together with ASR data on the scale of ten million hours.
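
For readers unfamiliar with how a recognition error rate is usually computed, here is a minimal sketch. It assumes the common definition (edit distance between the reference transcript and the model output, divided by the reference length), counted over words for languages such as English and over characters for languages such as Chinese. The article does not state which variant each benchmark uses, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Word error rate (unit='word') or character error rate (unit='char')."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(error_rate("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of four reference characters.
    print(error_rate("你好世界", "你好视界", unit="char"))
```

A lower value means fewer insertions, deletions, and substitutions relative to the reference transcript.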

To customize ASR results, users can provide background text in any format to bias recognition toward it, with no need to pre-process the contextual information.
Supported formats include, but are not limited to: simple lists of keywords or hotwords; complete paragraphs or entire documents of any length and source; keyword lists mixed with full paragraphs in any format; and irrelevant or even meaningless text. The researchers note that the model is highly robust to the negative effects of irrelevant context.
Qwen3-ASR-Flash can then use this context to identify and match named entities and other key terms, producing customized recognition results.
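
As a rough illustration of what "background text in any format" can look like in practice, the sketch below joins a keyword list and free-form passages into one context string without any further pre-processing. The helper names, the payload field names ("model", "audio", "context"), and the model identifier string are all assumptions made for illustration; they are not the documented request schema of the Bailian API, so consult the official API reference for the real parameters.

```python
def build_context(keywords=(), passages=()):
    """Join keywords and free-form passages into one background-text string."""
    parts = []
    if keywords:
        parts.append(", ".join(keywords))
    # Passages are passed through as-is apart from trimming whitespace.
    parts.extend(p.strip() for p in passages if p.strip())
    return "\n".join(parts)


def build_request(audio_path, context=""):
    """Assemble a hypothetical request payload (field names are assumed)."""
    return {
        "model": "qwen3-asr-flash",  # assumed identifier, not confirmed by the article
        "audio": audio_path,
        "context": context,          # background text in any format
    }


if __name__ == "__main__":
    ctx = build_context(
        keywords=["Qwen3-ASR-Flash", "ModelScope"],
        passages=["Alibaba released a new speech recognition model."],
    )
    print(build_request("meeting.wav", ctx))
```

Because the model is described as robust to irrelevant context, a sketch like this makes no attempt to filter or rank the background text before sending it.
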
How to try it:
HuggingFace:
Alibaba Cloud Bailian API:
Demo
All results come from a single Qwen3-ASR-Flash model in a single inference pass, with no background context configured except in Example 2.






Future outlook
Qwen3-ASR-Flash will continue to be upgraded iteratively to improve its general recognition accuracy, and we will develop more features to provide a smarter, better speech-to-text service.