
What is SongBloom?
SongBloom is an open-source song generation model developed by Tencent AI Lab in collaboration with top universities and selected for NeurIPS 2025. It removes a traditional limitation of song generation by requiring only a 10-second audio clip (e.g., vocals or instruments) and the corresponding lyrics text, from which it automatically generates 2 minutes and 30 seconds of music at 48 kHz in two channels. The sound quality is close to studio level, and the melodic coherence and lyric matching are significantly better than most open-source solutions, even comparable to commercial closed-source systems such as Suno-v4.5.
Its core advantage lies in the autoregressive diffusion mechanism and the interleaved generation paradigm, which together account for both structural logic and fine acoustic detail. Open-source code and pre-trained weights lower the technical threshold and support secondary development for academic research and commercial applications. Whether for rapid creation by independent musicians, customized film and television soundtracks, or music education, SongBloom offers an efficient, professional solution and helps make AI music technology more widely accessible.
SongBloom's main features
- High-quality song generation
  - Input requirements: a 10-second audio clip (e.g., vocals, instruments) plus lyrics text.
  - Output specification: 2 minutes and 30 seconds of two-channel audio sampled at 48 kHz, with near-studio-quality sound.
  - Core advantages:
    - Sound quality: diffusion modeling optimizes continuous acoustic features, with vocal subtlety exceeding that of the closed-source model Suno-v4.5.
    - Lyrics alignment accuracy: the Phoneme Error Rate (PER) has been reduced to a new low, significantly mitigating the "hallucination problem" (generated content that deviates from the intended lyrics).
- Technical architecture innovation
  - Autoregressive diffusion mechanism: combines the structural coherence of autoregressive modeling with the sound quality gains of diffusion modeling.
  - Interleaved generation paradigm: the model switches dynamically between "semantic understanding" and "acoustic generation" modes, which preserves the logical integrity of the song while refining sound quality locally.
- Open source and scalability
  - The project code and pre-trained weights are fully open source and support secondary development.
  - Planned future releases include a full-length 240-second model and a new version with enhanced text control.
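The interleaved paradigm described above can be pictured as a loop that alternates between two modes. The following is a toy sketch of that control flow only, not SongBloom's implementation: every name and constant in it (`next_sketch_token`, `refine_patch`, `PATCH_TOKENS`, and so on) is hypothetical, the "autoregressive model" is a seeded random function, and the "diffusion" step is a simple iterative pull from noise toward a target rather than a real denoiser.

```python
import random

# All constants are hypothetical values chosen for illustration.
PATCH_TOKENS = 4    # sketch tokens generated before switching modes
LATENT_DIM = 2      # size of each toy acoustic latent
DENOISE_STEPS = 8   # refinement iterations per latent
VOCAB = 16          # toy sketch-token vocabulary size


def next_sketch_token(rng, sketch_history, acoustic_history):
    """Stand-in for the autoregressive step: one token conditioned on all history."""
    context = sum(sketch_history) + len(acoustic_history)
    return (context + rng.randrange(VOCAB)) % VOCAB


def refine_patch(rng, sketch_patch):
    """Stand-in for diffusion: start from noise, step toward a sketch-derived target."""
    latents = []
    for token in sketch_patch:
        target = [token / VOCAB] * LATENT_DIM
        x = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
        for _ in range(DENOISE_STEPS):
            # Move halfway toward the target on every step.
            x = [xi + 0.5 * (ti - xi) for xi, ti in zip(x, target)]
        latents.append(x)
    return latents


def generate(num_patches, seed=0):
    """Alternate between sketch generation and acoustic refinement, patch by patch."""
    rng = random.Random(seed)
    sketch, acoustic = [], []
    for _ in range(num_patches):
        patch = []
        for _ in range(PATCH_TOKENS):
            patch.append(next_sketch_token(rng, sketch + patch, acoustic))
        sketch.extend(patch)                        # "semantic understanding" mode
        acoustic.extend(refine_patch(rng, patch))   # "acoustic generation" mode
    return sketch, acoustic
```

The point of the sketch is the ordering: each new group of sketch tokens can see the acoustic output already produced, and each acoustic patch is refined against the sketch tokens that were just generated, so structure and local detail are built up together rather than in two separate passes.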
SongBloom's project address
- GitHub repository: https://github.com/tencent-ailab/SongBloom
- HuggingFace model library: https://huggingface.co/CypressYang/SongBloom
- arXiv technical paper: https://arxiv.org/pdf/2506.07634
- Online demo: https://cypress-yang.github.io/SongBloom_demo/
Scenarios for SongBloom
- Music composition and production
  - Independent musicians: quickly generate first drafts of songs, lowering the creative threshold.
  - Movie/game soundtracks: generate customized background music to fit the needs of a scene.
- Education and research
  - Music teaching: help students understand how melody, harmony, and lyrics fit together.
  - AI music research: provide an open-source benchmark model to drive technical iteration.
- Commercial applications
  - Commercials/short video soundtracks: efficiently generate music that matches a brand's tone.
  - Virtual idol performance: generate songs synchronized with the action in real time.
How do I use SongBloom?
- Environment setup
  - Download the open-source code and pre-trained weights (GitHub: https://github.com/tencent-ailab/SongBloom).
  - Install the dependencies (e.g., PyTorch, CUDA).
- Input requirements
  - Audio sample: 10 seconds or more of clear, clean audio (vocals or instruments are supported).
  - Lyrics text: should match the style of the audio (e.g., upbeat lyrics suit fast songs).
- Generation process
  - Run the model's script and supply the audio path and the lyrics text.
  - The model automatically completes the whole process of "semantic analysis → melody generation → sound quality optimization".
  - The output file is in WAV format and can be used directly in post-production.
- Parameter tuning
  - Adjust the number of sketch tokens to control the richness of generated detail.
  - Modify the VAE latent space dimension to balance sound quality against computational efficiency.
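Before running the model, it can help to confirm that the prompt clip meets the 10-second requirement and to know roughly how large the output will be. The following is a minimal sketch using only Python's standard library. It is not part of SongBloom: the function names are hypothetical, it assumes the prompt is an uncompressed PCM WAV file (the only kind the `wave` module reads), and it assumes 16-bit samples when estimating the output size, which the source does not specify.

```python
import io
import wave

MIN_PROMPT_SECONDS = 10.0     # input requirement stated above
OUTPUT_SECONDS = 150          # 2 minutes and 30 seconds
OUTPUT_SAMPLE_RATE = 48_000   # 48 kHz
OUTPUT_CHANNELS = 2           # two-channel audio


def prompt_duration_seconds(wav_file):
    """Duration of a PCM WAV file, given a path or a binary file-like object."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_prompt(wav_file):
    """True if the clip is at least MIN_PROMPT_SECONDS long."""
    return prompt_duration_seconds(wav_file) >= MIN_PROMPT_SECONDS


def expected_output_bytes(bytes_per_sample=2):
    """Raw PCM payload size of one output; 16-bit depth is an assumption."""
    frames = OUTPUT_SECONDS * OUTPUT_SAMPLE_RATE   # 7,200,000 frames per channel
    return frames * OUTPUT_CHANNELS * bytes_per_sample


def silent_wav(seconds, rate=8_000):
    """Build an in-memory mono 16-bit WAV of silence, for trying the checks."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

For example, `is_valid_prompt(silent_wav(10))` returns `True`, while a 5-second clip is rejected. Under the 16-bit assumption, `expected_output_bytes()` gives 28,800,000 bytes, so each generated song is roughly 28.8 MB of audio data before the WAV header.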
Reasons to recommend SongBloom
- Technical leadership
  - In subjective and objective evaluations, its aesthetics score outperforms most open-source models and is comparable to top commercial closed-source systems (e.g., Suno-v4.5).
  - Its melodic quality and musical expression are close to the state of the art (SOTA).
- Open-source ecosystem value
  - The code is fully public, lowering the technical barriers to AI music and fostering community innovation.
  - It supports secondary development for academic research and commercial applications.
- User experience advantages
  - Simple input: just 10 seconds of audio plus lyrics, with no complex parameter setup required.
  - Professional output: the generated music is close to professional production quality, saving time in post-production tuning.
- Future potential
  - A version with longer duration (240 seconds) and stronger text control is planned, to cover more application scenarios.