SongBloom

An open-source song generation model jointly developed by Tencent AI Lab and partner institutions: 10 seconds of audio plus lyrics become 2 minutes 30 seconds of high-quality music, comparable to commercial systems.

What is SongBloom?

SongBloom is an open-source song generation model developed by Tencent AI Lab in collaboration with leading universities and accepted at NeurIPS 2025 for its innovative technology. It breaks through traditional limitations: users provide only a 10-second audio clip (e.g., vocals or an instrument) and the corresponding lyrics text, and the model automatically generates 2 minutes 30 seconds of high-quality, 48 kHz two-channel music. The sound quality is close to studio level, and the melodic coherence and lyric matching are significantly better than most open-source solutions, even comparable to commercial closed-source systems such as Suno-v4.5.

Its core advantages are an autoregressive diffusion mechanism and an interleaved generation paradigm, which balance structural logic with fine-grained detail, while open-source code and pre-trained weights lower the technical threshold and support secondary development for academic research and commercial applications. Whether for rapid creation by independent musicians, customized film and television soundtracks, or music education, SongBloom provides an efficient, professional solution, pushing AI music technology toward wider accessibility.

SongBloom's main features

  1. High-quality song generation
    • Input requirements: a 10-second audio clip (e.g., vocals or an instrument) plus lyrics text.
    • Output specification: 2 minutes 30 seconds of stereo audio sampled at 48 kHz, with near-studio sound quality.
    • Core advantages:
      • Sound quality: diffusion modeling refines continuous acoustic features, with vocal detail exceeding the closed-source Suno-v4.5.
      • Lyric alignment accuracy: the phoneme error rate (PER) is reduced to a new low, significantly mitigating the "hallucination" problem (generating content that deviates from the intent of the lyrics).
  2. Technical architecture innovation
    • Autoregressive diffusion mechanism: fuses the structural coherence of autoregressive modeling with the sound-quality gains of diffusion modeling.
    • Interleaved generation paradigm: dynamically switches between "semantic understanding" and "acoustic generation" modes, ensuring both the logical integrity of the song and localized sound-quality refinement (see the sketch after this list).
  3. Open source and extensibility
    • The project code and pre-trained weights are fully open-sourced and support secondary development.
    • Future releases are planned: a full 240-second model and a new version with enhanced text control.
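
To make the interleaved paradigm more concrete, here is a minimal toy sketch of the idea: a coarse autoregressive "semantic" pass and a diffusion-style "acoustic" refinement pass alternate over successive segments, so later structure can condition on earlier structure while each segment's sound is refined locally. This is an illustration only, not SongBloom's actual code; every class and function name below is hypothetical.

```python
# Toy sketch of interleaved generation (hypothetical, not SongBloom's implementation).
from dataclasses import dataclass
import random


@dataclass
class Segment:
    sketch_tokens: list[int]      # coarse "semantic" plan for this segment
    acoustic_latent: list[float]  # refined "acoustic" features for this segment


def generate_sketch_tokens(history: list[int], n_tokens: int) -> list[int]:
    """Stand-in for the autoregressive pass: predicts coarse structure tokens,
    conditioned (in the real model) on everything generated so far."""
    return [random.randint(0, 1023) for _ in range(n_tokens)]


def diffuse_acoustics(sketch: list[int], steps: int = 8) -> list[float]:
    """Stand-in for the diffusion pass: iteratively denoises acoustic latents
    conditioned on the current segment's sketch tokens."""
    latent = [random.gauss(0.0, 1.0) for _ in sketch]
    for _ in range(steps):
        latent = [0.9 * x for x in latent]  # toy "denoising" update
    return latent


def interleaved_generate(num_segments: int, tokens_per_segment: int = 32) -> list[Segment]:
    """Alternate semantic and acoustic modes segment by segment."""
    song: list[Segment] = []
    history: list[int] = []
    for _ in range(num_segments):
        sketch = generate_sketch_tokens(history, tokens_per_segment)  # semantic mode
        latent = diffuse_acoustics(sketch)                            # acoustic mode
        history.extend(sketch)  # later segments see earlier structure
        song.append(Segment(sketch, latent))
    return song


if __name__ == "__main__":
    segments = interleaved_generate(num_segments=4)
    print(f"Generated {len(segments)} segments; first has "
          f"{len(segments[0].sketch_tokens)} sketch tokens.")
```

The point of the alternation is that each acoustic refinement step only has to polish a short segment, while the growing sketch-token history keeps the overall song structure coherent.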

SongBloom's project address

  • GitHub repository: https://github.com/tencent-ailab/SongBloom

Scenarios for SongBloom

  1. Music Composition and Production
    • Independent musicians: quickly generate song drafts, lowering the barrier to creation.
    • Film/game soundtracks: generate customized background music to fit the needs of a scene.
  2. Education and Research
    • Music teaching: helps students understand how melody, harmony, and lyrics fit together.
    • AI music research: provides an open-source baseline model to drive technical iteration.
  3. Business applications
    • Commercial/short-video soundtracks: efficiently generate music that matches a brand's tone.
    • Virtual idol performances: generate songs synchronized with movements in real time.

How do I use SongBloom?

  1. Environment preparation
    • Download the open-source code and pre-trained weights (GitHub: https://github.com/tencent-ailab/SongBloom).
    • Install the required dependencies (e.g., PyTorch, CUDA).
  2. Input requirements
    • Audio sample: 10 seconds or more of a clean, noise-free recording (vocals or instruments are supported).
    • Lyrics text: should match the audio style (e.g., upbeat lyrics suit fast songs).
  3. Generation process
    • Run the generation script, passing the audio path and the lyrics text.
    • The model automatically completes the full "semantic analysis → melody generation → sound quality optimization" pipeline.
    • The output file is in WAV format and can be used directly in post-production.
  4. Parameter tuning
    • Adjust the number of sketch tokens to control the richness of generated detail.
    • Modify the VAE latent space dimension to balance sound quality against computational cost (a usage sketch follows this list).
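
The sketch below ties these steps together as a small command-line wrapper: it pairs a roughly 10-second prompt clip with a lyrics file, carries the tuning knobs mentioned above (sketch token count, VAE latent dimension) in a config dict, and marks where the real generation call would go. It is a hypothetical illustration of the workflow, not the repository's actual API; all function names, config keys, and defaults here are assumptions.

```python
# Hypothetical usage wrapper for the workflow described above (not SongBloom's real API).
import argparse
from pathlib import Path

# Knobs from the tuning step; names and defaults are illustrative assumptions.
DEFAULT_CONFIG = {
    "sketch_tokens": 512,   # more tokens -> richer detail, slower generation
    "vae_latent_dim": 128,  # larger dim -> better fidelity, higher compute cost
    "sample_rate": 48_000,  # 48 kHz stereo output, per the model spec
    "duration_sec": 150,    # 2 minutes 30 seconds
}


def load_lyrics(path: Path) -> str:
    """Read the lyrics text that will condition generation."""
    return path.read_text(encoding="utf-8").strip()


def generate_song(audio_prompt: Path, lyrics: str, out_path: Path, config: dict) -> None:
    """Placeholder for the real pipeline: semantic analysis -> melody generation
    -> sound quality optimization. Replace this body with calls to the released
    scripts/weights from https://github.com/tencent-ailab/SongBloom."""
    print(f"Prompt audio : {audio_prompt}")
    print(f"Lyrics chars : {len(lyrics)}")
    print(f"Config       : {config}")
    print(f"Would write  : {out_path} (WAV, {config['sample_rate']} Hz stereo)")


def main() -> None:
    parser = argparse.ArgumentParser(description="SongBloom usage sketch (hypothetical)")
    parser.add_argument("--audio", type=Path, required=True, help="~10 s prompt clip")
    parser.add_argument("--lyrics", type=Path, required=True, help="lyrics text file")
    parser.add_argument("--out", type=Path, default=Path("song.wav"))
    args = parser.parse_args()

    lyrics = load_lyrics(args.lyrics)
    generate_song(args.audio, lyrics, args.out, DEFAULT_CONFIG)


if __name__ == "__main__":
    main()
```

A typical invocation of such a wrapper might look like `python songbloom_sketch.py --audio prompt.wav --lyrics lyrics.txt --out song.wav`, with the actual script and argument names taken from the repository's documentation.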

Why SongBloom is recommended

  1. Technological leadership
    • In subjective and objective evaluations, its aesthetics scores outperform most open-source models and are comparable to top commercial closed-source systems (e.g., Suno-v4.5).
    • Melodic quality and musical expression approach the state of the art (SOTA).
  2. Open-source ecosystem value
    • The code is fully public, lowering the technical barriers to AI music and fostering innovation in the community.
    • Supports secondary development for academic research and commercial applications.
  3. User experience advantages
    • Simple input: just 10 seconds of audio plus lyrics; no complex parameter setup required.
    • Professional output: generates music close to professional production quality, saving post-production tuning time.
  4. Future potential
    • A longer-duration (240-second) version with stronger text control is planned, covering more application scenarios.
