
What is PrismAudio?
PrismAudio is a video-to-audio framework released by Alibaba Tongyi Lab on March 24, 2026, focused on synthesizing ambient sound and sound effects. As the first model to deeply combine reinforcement learning with chain-of-thought reasoning, PrismAudio achieves tight synchronization between sound and video content through a “think before you generate” paradigm, addressing the audio-visual inconsistency and inefficiency of traditional models. The work has been accepted at ICLR 2026, and the code will be open-sourced soon.
PrismAudio's main features
- Ambient sound and sound-effect synthesis
- Automatically generates background sound effects that match the picture, such as hoofbeats, wind and rain, and metal clangs, replacing traditional Foley work.
- Supports complex sound generation for multi-event, multi-source scenes while maintaining stable output.
- Four-dimensional collaborative optimization
- Semantic alignment: ensures the sound content accurately corresponds to the objects and actions in the video (e.g., recognizing “hoofbeats” rather than “bird calls”).
- Temporal synchronization: precisely aligns the timing of sounds with visual events, down to the millisecond.
- Aesthetic optimization: generates natural, layered, high-quality audio free of electronic artifacts to enhance the listening experience.
- Spatial orientation: supports stereo output and automatically adjusts the left and right channels according to the sound source's position in the frame, so listeners can “hear where the sound comes from”.
- Highly efficient and lightweight
- The model has only 518 million parameters and generates 9 seconds of audio in just 0.63 seconds, nearly twice as fast as comparable models and suitable for real-time applications.
- Chain-of-thought reasoning
- Using a “decompositional chain-of-thought” technique, the model first generates structured reasoning text (e.g., sound content, timing, texture, and orientation) and then generates the audio, making the process interpretable and controllable.
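To make the spatial-orientation idea concrete, a minimal constant-power panning sketch shows how a source's horizontal position in the frame can set the left and right channel gains. This is a standard audio technique used for illustration, not PrismAudio's published method:

```python
import math

def pan_stereo(sample: float, position: float) -> tuple[float, float]:
    """Constant-power pan: position -1.0 (full left) .. 1.0 (full right)."""
    angle = (position + 1.0) * math.pi / 4.0  # maps position to 0 .. pi/2
    return sample * math.cos(angle), sample * math.sin(angle)

# A source centered in the frame reaches both channels at equal gain;
# a source at the left edge reaches only the left channel.
left, right = pan_stereo(1.0, 0.0)
```

Constant-power panning keeps perceived loudness roughly constant as the source moves across the stereo field, which is why it is preferred over simple linear gain splitting.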
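The structured reasoning text described above can be pictured as a small schema covering content, timing, texture, and orientation. The field names and example values below are assumptions for illustration, not PrismAudio's actual output format:

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    content: str      # what sounds (e.g., "hoofbeats")
    onset_s: float    # when it starts, in seconds
    duration_s: float # how long it lasts
    texture: str      # timbre/quality description
    position: float   # -1.0 (left) .. 1.0 (right) in the frame

# Hypothetical reasoning output for a horse crossing the frame in wind:
plan = [
    SoundEvent("hoofbeats", 0.0, 4.5, "dry gravel, rhythmic", -0.8),
    SoundEvent("wind gust", 2.0, 3.0, "soft broadband", 0.0),
]
```

Generating such a plan before the audio itself is what makes the process inspectable: each field can be checked or edited before synthesis.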
Scenarios for using PrismAudio
- Film and television post-production
- Automatically generates ambient sound for movies, documentaries, and trailers, reducing post-production cost and time.
- Short video creation
- Quickly adds ambient sound to silent videos such as vlogs, food, and travel clips to enhance immersion and reach.
- Game development
- Generates dynamic sound effects for scene transitions and CG promos, matching real-time ambient sound to forests, cities, battlefields, and other scenes and reducing repetitive work for sound designers.
- Advertising and marketing
- Automatically adds operational sound effects to product demo videos and supports rapid iteration of multiple audio-track versions, improving ad-testing efficiency and creative flexibility.
- Education and training
- Adds prompts and background sound to instructional videos and demonstrations, enriching the auditory experience of multimedia courseware and improving learner focus.
How do I use PrismAudio?
- Input Requirements
- The input video needs to contain clear visual events (e.g., actions, object movement) for the model to recognize and generate corresponding sound effects.
- Parameter adjustment
- Users can adjust parameters such as sound style (e.g., natural, sci-fi, horror), sound intensity, and stereo effect as needed.
- Output format
- Supports common audio formats (e.g., WAV, MP3) for direct use in video editing software or game engines.
- Efficient training algorithm (Fast-GRPO)
- Training efficiency is optimized with the Fast-GRPO algorithm, which reduces the cost of random sampling and adapts quickly to different scenarios.
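Since the output is standard audio, saving a generated stereo clip for an editor or game engine is straightforward. This sketch writes two float channels to a 16-bit WAV using only Python's standard library; the sine tone is placeholder data, not model output:

```python
import math
import struct
import wave

def write_stereo_wav(path, left, right, rate=44100):
    """Write two float sequences (-1.0 .. 1.0) as a 16-bit stereo WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(rate)
        frames = bytearray()
        for l, r in zip(left, right):
            frames += struct.pack("<hh", int(l * 32767), int(r * 32767))
        wf.writeframes(frames)

# 0.1 s of a 440 Hz tone, panned toward the right channel
n = 4410
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 44100) for i in range(n)]
write_stereo_wav("demo.wav", [s * 0.4 for s in tone], [s * 0.9 for s in tone])
```

The resulting file can be dropped directly onto a timeline in any video editor that accepts WAV input.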
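Fast-GRPO's specifics are not yet published, but GRPO-style training in general scores a group of sampled outputs and normalizes each reward against its own group rather than a learned value function. The sketch below shows only that generic group-relative advantage step, as an assumption about the family of algorithms, not PrismAudio's exact method:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a degenerate group
    return [(r - mean) / std for r in rewards]

# Rewards for three sampled audio clips from the same video prompt:
adv = group_relative_advantages([0.2, 0.5, 0.8])
```

Because advantages are computed relative to the group mean, the advantages in each group sum to zero, and only the within-group ranking of samples drives the policy update.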
PrismAudio's project address
- Project website: https://prismaudio-project.github.io/
- GitHub repository: https://github.com/FunAudioLLM/ThinkSound/tree/prismaudio
- HuggingFace model: https://huggingface.co/FunAudioLLM/PrismAudio
- arXiv technical paper: https://arxiv.org/pdf/2511.18833
- Online demo: https://huggingface.co/spaces/FunAudioLLM/PrismAudio
Recommended Reasons
- Technological breakthrough
- The first framework to deeply combine chain-of-thought reasoning with reinforcement learning, solving the audio-visual inconsistency and inefficiency of traditional models and representing the latest progress in video-to-audio generation.
- Superior performance
- Outperforms the best existing models on authoritative benchmarks such as VGGSound and AudioCanvas, especially in complex scenes.
- Lightweight and real-time
- With only 518 million parameters, generation is fast and well suited to real-time applications (e.g., live streaming, gaming).
- Multi-scenario applicability
- Covering film and television, games, advertising, education, and other fields, it lowers the technical barrier to audio-visual content creation.
- Open source and community support
- The code will be open-sourced soon, allowing developers to build on the model and helping the technology reach a wider audience.
