
The Doubao large model is a family of multimodal models introduced by ByteDance, comprising several models with different technical strengths. Originally named "Skylark" (云雀), ByteDance's large model was officially released on May 15, 2024 at the Volcano Engine Force Conference. It is among the first large models to complete filing under the Interim Measures for the Administration of Generative Artificial Intelligence Services, meaning that its technology and applications meet the requirements of those regulations.
1. Model family members
The Doubao large model family mainly consists of the following members:
- Doubao General Model Pro: for complex applications that require deep text understanding and generation.
- Doubao General Model Lite: more cost-efficient, suited to scenarios with strict requirements on speed and running cost.
- Doubao Role-Play Model: simulates different characters in a conversation.
- Doubao Speech Synthesis Model: produces natural-sounding synthesized speech.
- Doubao Voice Cloning Model: reproduces a given voice with high fidelity.
- Doubao Speech Recognition Model: converts speech to text.
- Doubao Text-to-Image Model: generates images that match a text description.
- Doubao Function Call Model: for applications in which the model needs to invoke specific functions or tools.
2. Technical characteristics
- Multimodal capability: The Doubao family is not limited to text; it also covers vision and audio, enabling understanding and interaction across modalities.
- Customization and personalization: The models are designed around the needs of different industries and business scenarios and support a high degree of customization.
- High performance and low latency: The models offer low latency and high throughput when processing large volumes of data, which keeps them responsive in real-world applications.
- Safety and reliability: Multi-layered security measures help keep the models running safely and stably.
3. Application scenarios
The Doubao large model family has been applied in many business scenarios both inside and outside ByteDance, improving efficiency and product experience. Internally it is used by more than 50 businesses, including Douyin, Fanqie Novel (Tomato Novel), Feishu, and Ocean Engine.
4. Data-processing capacity
The Doubao large model processes 120 billion tokens of text and generates 30 million images every day, making it one of the most heavily used large models in China, with one of the broadest ranges of application scenarios.
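To put the daily figures above in per-second terms, the short Python sketch below converts them to average rates. It assumes the load is spread evenly over 24 hours, which is a simplification; real traffic will have peaks well above these averages. The function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate for a daily total, assuming uniform load."""
    return daily_total / SECONDS_PER_DAY


# Daily volumes quoted in the text.
daily_tokens = 120e9  # 120 billion tokens of text per day
daily_images = 30e6   # 30 million generated images per day

tokens_per_second = per_second(daily_tokens)
images_per_second = per_second(daily_images)

print(f"{tokens_per_second:,.0f} tokens/s")  # 1,388,889 tokens/s
print(f"{images_per_second:,.0f} images/s")  # 347 images/s
```

On average, then, the quoted volumes correspond to roughly 1.4 million tokens and about 350 images per second.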
With its multimodal capabilities, support for customization, high performance and low latency, safety and reliability, wide range of application scenarios, and competitive pricing, the Doubao large model is becoming one of the most closely watched large models in the industry.
