
What is SongBloom?
SongBloom is an open-source song-generation model developed by Tencent AI Lab in collaboration with leading universities and accepted at NeurIPS 2025. It breaks through traditional limitations by requiring only a 10-second audio clip (e.g., vocals or instruments) and the corresponding lyrics text, from which it automatically generates 2 minutes 30 seconds of high-quality music at 48 kHz in stereo. The sound quality approaches studio level, and melodic coherence and lyric matching are significantly better than most open-source solutions, even comparable to commercial closed-source systems such as Suno-v4.5.
Its core advantages are an autoregressive diffusion mechanism and an interleaved generation paradigm, which balance structural logic with fine-grained detail, while open-source code and pre-trained weights lower the technical threshold and support secondary development for academic research and commercial applications. Whether for rapid creation by independent musicians, customized film and television soundtracks, or music-education assistance, SongBloom provides efficient, professional solutions that push AI music technology toward broad accessibility.
SongBloom's main features
- High quality song generation
- Input Requirements: 10-second audio clip (e.g., vocals, instruments) + lyrics text.
- Output Specification: 2 minutes and 30 seconds of two-channel audio, sampled at 48kHz, for near-studio-quality sound.
- Core advantages:
- Sound quality: diffusion modeling optimizes continuous acoustic features, with vocal subtlety exceeding the closed-source model Suno-v4.5.
- Lyric alignment accuracy: the phoneme error rate (PER) is reduced to a new low, significantly mitigating the "hallucination problem" (generating content that deviates from the intent of the lyrics).
- Technology Architecture Innovation
- Autoregressive diffusion mechanism: fuses the structural-coherence strengths of autoregressive modeling with the sound-quality gains of diffusion modeling.
- Interleaved generation paradigm: dynamically switches between "semantic understanding" and "acoustic generation" modes, guaranteeing the logical integrity of the song while refining local sound quality.
- Open Source and Scalability
- The project code and pre-trained weights have been fully open-sourced to support secondary development.
- A full-length 240-second model and a new version with enhanced text control are planned for future release.
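The interleaved generation paradigm described above can be illustrated with a toy sketch: the model alternates between a semantic stage that autoregressively extends a coarse "sketch token" sequence and an acoustic stage that refines each new segment with diffusion-style decoding. This is a conceptual illustration only, not SongBloom's actual implementation; all function names and the placeholder logic are assumptions.

```python
# Toy sketch of an interleaved (semantic <-> acoustic) generation loop.
# Hypothetical structure for illustration; the real model's internals differ.

def semantic_step(sketch_tokens, segment_len=4):
    """Autoregressively extend the coarse sketch by one segment (placeholder)."""
    start = len(sketch_tokens)
    return sketch_tokens + [f"tok_{i}" for i in range(start, start + segment_len)]

def acoustic_step(segment):
    """Refine one sketch segment into fine-grained 'acoustic latents' (placeholder
    standing in for the diffusion-based decoder)."""
    return [f"latent({t})" for t in segment]

def interleaved_generate(num_segments=3, segment_len=4):
    sketch, acoustic = [], []
    for _ in range(num_segments):
        prev = len(sketch)
        sketch = semantic_step(sketch, segment_len)   # structure first
        acoustic += acoustic_step(sketch[prev:])      # then local sound detail
    return sketch, acoustic

sketch, acoustic = interleaved_generate()
print(len(sketch), len(acoustic))  # 12 12
```

The point of the alternation is that each acoustic refinement only ever runs on a segment whose coarse structure has just been fixed, which is how the paradigm keeps global song logic and local sound quality in step.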
SongBloom's project address
- GitHub repository: https://github.com/tencent-ailab/SongBloom
- HuggingFace model hub: https://huggingface.co/CypressYang/SongBloom
- arXiv technical paper: https://arxiv.org/pdf/2506.07634
- Online demo: https://cypress-yang.github.io/SongBloom_demo/
Scenarios for SongBloom
- Music Composition and Production
- Independent musicians: quickly generate song drafts, lowering the barrier to creation.
- Movie/Game Soundtracks: Generate customized background music according to the needs of the scene.
- Education and Research
- Music teaching: helps students understand how melody, harmony, and lyrics fit together.
- AI Music Research: Provide open source benchmarking models to drive technology iteration.
- Business applications
- Commercials/short video soundtracks: Efficiently generate music that matches the brand's tone.
- Virtual Idol Performance: Generate songs synchronized with the action in real time.
How do I use SongBloom?
- Environment preparation
- Download the open-source code and pre-trained weights (GitHub: https://github.com/tencent-ailab/SongBloom).
- Install the dependency libraries (e.g., PyTorch, CUDA).
- Input Requirements
- Audio sample: a clear, noise-free clip of 10 seconds or longer (vocals or instruments supported).
- Lyrics text: should match the audio style (e.g., upbeat lyrics suit fast songs).
- Generation process
- Run the inference script, providing the audio path and the lyrics text.
- The model automatically completes the whole process of "semantic analysis → melody generation → sound quality optimization".
- The output file is in WAV format and can be used directly in post-production.
- Parameter tuning
- Adjust the number of sketch tokens to control the richness of generated detail.
- Modify the VAE latent-space dimension to balance sound quality against computational efficiency.
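The input requirements above can be bundled into a single request spec before running inference. The helper below is a minimal sketch: the field names (audio_prompt, lyrics, sample_rate, duration_s) are illustrative assumptions, not the repository's actual schema; consult the GitHub README for the real interface.

```python
# Hypothetical helper that assembles an inference request for a SongBloom run.
# All field names are assumptions for illustration, not the repo's real schema.
import json

def build_request(audio_path: str, lyrics: str, duration_s: int = 150) -> str:
    """Validate inputs and serialize them as a JSON request string."""
    if not lyrics.strip():
        raise ValueError("lyrics text is required")
    request = {
        "audio_prompt": audio_path,   # ~10 s reference clip (vocals or instruments)
        "lyrics": lyrics,             # should match the style of the audio prompt
        "sample_rate": 48000,         # 48 kHz stereo output
        "channels": 2,
        "duration_s": duration_s,     # 150 s = the 2 min 30 s default output length
    }
    return json.dumps(request, ensure_ascii=False)

req = build_request("prompt.wav", "City lights are calling my name")
print(req)
```

Keeping the request as plain JSON makes it easy to batch many songs or swap lyrics without touching the model code.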
Why SongBloom is recommended
- Technological leadership
- In subjective and objective evaluations, the aesthetics score outperforms most open-source models and is comparable to top commercial closed-source systems (e.g., Suno-v4.5).
- Melodicity and musical expressiveness approach the state of the art (SOTA).
- Open-source ecosystem value
- The code is fully public, lowering the technical barriers to AI music and fostering community innovation.
- Supports secondary development for academic research and commercial applications.
- User experience advantages
- Simple input: just 10 seconds of audio plus lyrics; no complex parameter configuration required.
- Professional output: generated music approaches professional production quality, saving post-production tuning time.
- Future potential
- A longer-duration (240-second) version with stronger text control is planned, covering more application scenarios.