
What is SongBloom?
SongBloom is an open-source song-generation model developed by Tencent AI Lab in collaboration with leading universities and accepted at NeurIPS 2025. It breaks through traditional limitations by requiring only a 10-second audio clip (e.g., vocals or instruments) and the corresponding lyrics text, from which it automatically generates 2 minutes and 30 seconds of high-quality, 48 kHz stereo music. Sound quality approaches studio level, and melodic coherence and lyric matching are significantly better than in most open-source solutions, even comparable to commercial closed-source systems such as Suno-v4.5.
Its core advantage lies in an autoregressive diffusion mechanism and an interleaved generation paradigm, which together balance structural logic and fine-grained detail, while open-source code and pre-trained weights lower the technical barrier and support secondary development for academic research and commercial applications. Whether for rapid creation by independent musicians, customized film and television soundtracks, or music-education assistance, SongBloom offers efficient, professional solutions that push AI music technology toward broad accessibility.
SongBloom's main features
- High quality song generation
- Input Requirements: 10-second audio clip (e.g., vocals, instruments) + lyrics text.
- Output specification: 2 minutes and 30 seconds of stereo audio sampled at 48 kHz, with near-studio sound quality.
- Core advantages:
- Sound quality: diffusion modeling refines continuous acoustic features, with vocal subtlety exceeding that of the closed-source Suno-v4.5.
- Lyric alignment accuracy: the Phoneme Error Rate (PER) is driven to a new low, significantly mitigating the "hallucination problem" (generated content that deviates from the intent of the lyrics).
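PER is essentially a normalized edit distance over phoneme sequences. A minimal sketch (not SongBloom's actual evaluation code) might compute it like this:

```python
def phoneme_error_rate(reference, hypothesis):
    """PER = (substitutions + insertions + deletions) / len(reference),
    computed via Levenshtein distance over phoneme sequences."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[m][n] / max(m, 1)

# One substituted phoneme out of four -> PER = 0.25
print(phoneme_error_rate(["s", "o", "ng", "b"], ["s", "o", "n", "b"]))  # 0.25
```

A lower PER means the sung phonemes transcribed from the generated audio track the input lyrics more faithfully.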
- Technology Architecture Innovation
- Autoregressive diffusion mechanism: fuses the structural-coherence strengths of autoregressive modeling with the sound-quality gains of diffusion modeling.
- Interleaved generation paradigm: dynamically switches between "semantic understanding" and "acoustic generation" modes, guaranteeing both the song's overall logical integrity and localized sound-quality refinement.
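The interleaved idea can be illustrated with a toy loop (a conceptual sketch only; the real model's internals differ): for each segment, first produce a coarse semantic sketch conditioned on everything generated so far, then refine it into acoustic output.

```python
def generate_song(n_segments, sketch_fn, refine_fn):
    """Toy interleaved generation: for each segment, first produce a
    semantic sketch conditioned on all previous sketches (structure),
    then refine that sketch into acoustic output (detail)."""
    sketches, audio = [], []
    for i in range(n_segments):
        sketch = sketch_fn(i, sketches)   # semantic-understanding step
        sketches.append(sketch)
        audio.append(refine_fn(sketch))   # acoustic-generation step
    return audio

# Stand-in callables: the sketch is a label, refinement "renders" it.
song = generate_song(
    3,
    sketch_fn=lambda i, prev: f"segment-{i}",
    refine_fn=lambda s: s.upper(),
)
print(song)  # ['SEGMENT-0', 'SEGMENT-1', 'SEGMENT-2']
```

Because each acoustic refinement happens right after its sketch, local detail is generated while the global structure is still being laid down, rather than in a separate second pass.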
- Open Source and Scalability
- The project code and pre-trained weights have been fully open-sourced to support secondary development.
- Planned future releases include a full 240-second model and a new version with enhanced text control.
SongBloom's project address
- GitHub repository: https://github.com/tencent-ailab/SongBloom
- HuggingFace model hub: https://huggingface.co/CypressYang/SongBloom
- arXiv technical paper: https://arxiv.org/pdf/2506.07634
- Online demo: https://cypress-yang.github.io/SongBloom_demo/
Scenarios for SongBloom
- Music composition and production
- Independent musicians: quickly generate song drafts, lowering the barrier to creation.
- Movie/game soundtracks: generate customized background music to fit the scene.
- Education and research
- Music teaching: help students understand how melody, harmony, and lyrics fit together.
- AI music research: provide an open-source benchmark model to drive technology iteration.
- Business applications
- Commercial/short-video soundtracks: efficiently generate music that matches a brand's tone.
- Virtual idol performances: generate songs synchronized with on-stage motion in real time.
How do I use SongBloom?
- Environment preparation
- Download the open-source code and pre-trained weights (GitHub: https://github.com/tencent-ailab/SongBloom).
- Install the dependent libraries (e.g., PyTorch, CUDA).
- Input Requirements
- Audio sample: at least 10 seconds of clean, uncluttered audio (vocals or instruments supported).
- Lyrics text: should match the audio's style (e.g., upbeat lyrics suit fast songs).
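A quick pre-flight check on the reference clip can be done with the standard library alone (the 10-second minimum follows the requirement above; this helper is illustrative, not part of SongBloom):

```python
import os
import tempfile
import wave

def check_reference_clip(path, min_seconds=10.0):
    """Return (duration_in_seconds, ok) for a WAV file."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    return duration, duration >= min_seconds

# Demo: a 12-second silent mono clip passes the 10-second minimum.
CLIP_PATH = os.path.join(tempfile.gettempdir(), "ref_clip.wav")
with wave.open(CLIP_PATH, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)           # 16-bit samples
    wf.setframerate(48000)
    wf.writeframes(b"\x00\x00" * 48000 * 12)

duration, ok = check_reference_clip(CLIP_PATH)
print(duration, ok)  # 12.0 True
```

Rejecting too-short or non-WAV input up front avoids wasting a full generation run on an unusable prompt.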
- Generation process
- Run the inference script, providing the audio path and the lyrics text.
- The model automatically completes the whole process of "semantic analysis → melody generation → sound quality optimization".
- The output file is in WAV format and can be used directly in post-production.
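The output format (48 kHz stereo WAV) can be reproduced with the standard library. This toy writes one second of a stereo sine tone in exactly that container format, standing in for the model's real output:

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 48_000  # matches SongBloom's output specification
OUT_PATH = os.path.join(tempfile.gettempdir(), "songbloom_format_demo.wav")

with wave.open(OUT_PATH, "wb") as wf:
    wf.setnchannels(2)           # stereo
    wf.setsampwidth(2)           # 16-bit PCM
    wf.setframerate(SAMPLE_RATE)
    frames = bytearray()
    for n in range(SAMPLE_RATE):  # one second of a 440 Hz tone
        sample = int(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
        frames += struct.pack("<hh", sample, sample)  # left, right channels
    wf.writeframes(bytes(frames))
```

Because the result is plain 16-bit PCM WAV, it drops straight into any DAW or post-production pipeline without transcoding.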
- Parameter tuning
- Adjust the number of sketch tokens to control the richness of generated detail.
- Modify the VAE latent dimension to balance sound quality against computational efficiency.
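The two knobs above might be grouped in a configuration object like the following sketch (the parameter names and the proportional cost model are illustrative assumptions, not SongBloom's actual config schema):

```python
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    # Hypothetical name: more sketch tokens -> richer generated detail,
    # at the cost of longer autoregressive decoding.
    num_sketch_tokens: int = 512
    # Hypothetical name: a larger VAE latent dimension -> better sound
    # quality, at the cost of more compute per diffusion step.
    vae_latent_dim: int = 64

    def decode_cost(self):
        """Rough proportional cost model: cost scales with both knobs."""
        return self.num_sketch_tokens * self.vae_latent_dim

fast = GenerationConfig(num_sketch_tokens=256, vae_latent_dim=32)
rich = GenerationConfig(num_sketch_tokens=1024, vae_latent_dim=128)
print(rich.decode_cost() // fast.decode_cost())  # 16
```

The point of the sketch is the trade-off itself: raising either knob buys detail or fidelity, and the two multiply in the compute budget.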
Recommended Reasons
- Technological leadership
- In subjective and objective evaluations, its aesthetics score outperforms most open-source models and is comparable to top commercial closed-source systems (e.g., Suno-v4.5).
- Melodicity and musical expressiveness approach the state of the art (SOTA).
- Open-source ecosystem value
- The code is fully public, lowering technical barriers to AI music and fostering community innovation.
- Supports secondary development for academic research and commercial applications.
- User Experience Advantage
- Input simplicity: just 10 seconds of audio plus lyrics, with no complex parameter configuration required.
- Professional output: generates music close to professional production quality, saving post-production tuning time.
- Future potential
- A longer-duration (240-second) version with stronger text control is planned, covering more application scenarios.
