Wall-OSS


A 4.2-billion-parameter open-source embodied intelligence model developed by X Square Robot. Thanks to its end-to-end architecture, it offers “out-of-the-box” zero-shot deployment, letting developers give robots strong cognitive, reasoning, and fine manipulation capabilities with only a consumer-grade graphics card.

Language:
zh,en
Collection time:
2026-05-28

What is Wall-OSS?

Wall-OSS is an open-source embodied intelligence Vision-Language-Action (VLA) large model developed by the Chinese company X Square Robot. In May 2026, the team officially open-sourced the latest version, Wall-OSS-0.5. It is described as the first model in the industry that can perform multiple operations directly on real robots without being fine-tuned for specific tasks.

Wall-OSS is an end-to-end, general-purpose embodied intelligence foundation model designed to give robots a human-like “brain” that lets them understand the physical world, reason logically, and perform fine-grained actions. It removes a limitation of traditional robot models, which require separate scripting or large-scale fine-tuning for each task, and it has strong zero-shot generalization capabilities.

It is worth mentioning that the model has a “lightweight” design, with only 4.2 billion (4.2B) parameters. Rather than sacrificing performance, this design dramatically lowers the barrier to development, allowing an ordinary developer to go from training to deployment using only a consumer-grade graphics card such as the RTX 4090.
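As a rough illustration of why a 4.2B-parameter model is within reach of a consumer card, the sketch below estimates the memory taken by the weights alone at common precisions and compares it with the RTX 4090's 24 GB of VRAM. The choice of precisions is an assumption for illustration, not something stated by the Wall-OSS team. Activations, caches, and optimizer state add to these figures and are not estimated here.

```python
# Back-of-the-envelope estimate of weight memory only.
# The dtype list is an illustrative assumption; byte sizes are standard.
PARAMS = 4.2e9                                   # parameter count stated above
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}
GPU_VRAM_GB = 24                                 # RTX 4090 VRAM capacity


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(PARAMS, dtype)
    print(f"{dtype}: {gb:.1f} GB of weights, {GPU_VRAM_GB - gb:.1f} GB of VRAM left")
```

At 16-bit precision the weights take about 8.4 GB, which leaves most of the card free for everything else a real workload needs.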

Main features of Wall-OSS

  • Zero-shot direct deployment: The pre-trained model performs complex tasks such as handling, sorting, and organizing directly on real robots without any task-specific fine-tuning. In 17 zero-shot test tasks, several scored over 80 out of 100, including a score of 82 on flexible-object manipulation it had never seen before (e.g., rope tightening).
  • Strong fine-tuning ceiling: When fine-tuned, Wall-OSS-0.5 learns very efficiently. Under the same data budget, its average task progress is well ahead of industry baseline models (e.g., π0.5), and its success rate on certain precision manipulation tasks improves by nearly an order of magnitude.
  • Multimodal cognition and output: The model not only takes visual input and language instructions, but also outputs language and actions at the same time. It has strong spatial understanding, causal reasoning, and self-reflection capabilities, and can autonomously break a task into steps and reason step by step while performing it.
  • Retained and strengthened basic competencies: Learning demanding motor skills does not degrade the model's original image-text comprehension. Instead, the model shows significant gains on tasks such as embodied visual grounding and placement reasoning.

Wall-OSS Technical Principles

  1. Shared Attention + Mixture of Experts (MoE) architecture: Unlike traditional approaches that splice separate modules together, it embeds language, vision, and action information into the same representation space and achieves deep cross-modal interaction through a shared attention mechanism. At the same time, expert feed-forward networks (FFNs) route computation according to task requirements, which avoids knowledge forgetting and preserves the specialization of each modality.
  2. Gradient Bridging Co-training: The robot's continuous actions are discretized into special tokens, which are concatenated with text tokens into a single sequence and trained with the cross-entropy loss native to large language models. This lets the action supervision signal shape the backbone network directly through backpropagation, so the model learns the unity of “seeing, speaking, and moving” at the lowest level.
  3. Three-stage training paradigm: Training follows a “discrete first, then continuous, then joint” path. The inspiration phase establishes basic cognition through discrete actions, the integration phase focuses on continuous action modeling, and a final phase performs joint optimization. This approach is intended to transfer the vision-language model's cognitive ability to physical action without loss.
  4. Cross-level Chain of Thought (CoT): The model internalizes a unified chain-of-thought framework that switches seamlessly from high-level semantic decision-making to low-level action control, and it can plan autonomously and adjust its strategy dynamically when facing unknown environments or unexpected situations.
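The action-discretization step described in point 2 can be sketched as follows. This is a minimal illustration using uniform binning, which is one common way to turn continuous actions into tokens; it is not confirmed to be the scheme Wall-OSS uses. The bin count, the action range, and the `<act_k>` token naming are all hypothetical.

```python
# Hypothetical sketch: uniform binning of continuous actions into token strings.
# N_BINS, the [LOW, HIGH] range, and the "<act_k>" naming are assumptions.
N_BINS = 256
LOW, HIGH = -1.0, 1.0


def action_to_bins(action):
    """Map each continuous action dimension to an integer bin in [0, N_BINS - 1]."""
    bins = []
    for a in action:
        a = min(max(a, LOW), HIGH)               # clip to the supported range
        frac = (a - LOW) / (HIGH - LOW)          # normalize to [0, 1]
        bins.append(min(int(frac * N_BINS), N_BINS - 1))
    return bins


def bins_to_action(bins):
    """Recover the bin-center value for each bin (a lossy inverse)."""
    width = (HIGH - LOW) / N_BINS
    return [LOW + (b + 0.5) * width for b in bins]


def build_sequence(text_tokens, action):
    """Append action tokens to text tokens so one sequence carries both."""
    return list(text_tokens) + [f"<act_{b}>" for b in action_to_bins(action)]


seq = build_sequence(["pick", "up", "the", "cup"], [0.0, -1.0, 1.0, 0.25])
print(seq)
```

Because text and action tokens share one sequence, a single next-token cross-entropy loss can supervise both, which is the property the gradient-bridging description relies on. The round trip through `bins_to_action` loses at most half a bin width per dimension.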

Scenarios for using Wall-OSS

  • Home services: Elderly care, housekeeping (e.g., folding towels, clearing the table), and item delivery.
  • Industrial manufacturing: Precision assembly, parts sorting, and assembly line collaboration.
  • Business services: Hospitality services, logistics and warehouse sorting, and supermarket shelf stocking.
  • Research and education: A general-purpose embodied intelligence base for validating cutting-edge algorithms and for secondary development in universities and research institutes.

Wall-OSS project links

  • GitHub code repository: https://github.com/X-Square-Robot/wall-x
  • Hugging Face model weights: https://huggingface.co/x-square-robot/wall-oss-0.5
  • Official project home page: https://x2robot.com/oss#resources
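For readers who want to fetch the weights programmatically, here is a minimal sketch that derives the repository id from the Hugging Face URL above and downloads it with the `huggingface_hub` library's `snapshot_download` function. The helper names and the local directory are hypothetical, and the project's own README may recommend a different procedure.

```python
from urllib.parse import urlparse

MODEL_URL = "https://huggingface.co/x-square-robot/wall-oss-0.5"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo id from a Hugging Face model URL."""
    parts = urlparse(url).path.strip("/").split("/")
    return "/".join(parts[:2])


def download_weights(url: str = MODEL_URL, local_dir: str = "./wall-oss-0.5") -> str:
    """Download the whole model repository and return the local path.

    Requires `pip install huggingface_hub`; the default local_dir is an assumption.
    """
    from huggingface_hub import snapshot_download  # lazy import: optional dependency
    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


print(repo_id_from_url(MODEL_URL))
```

Calling `download_weights()` starts a multi-gigabyte download, so the sketch only prints the repo id by default.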

Reasons to recommend

  1. Truly “out of the box”: It is one of the very few open-source models that can be deployed zero-shot in the real world straight after pre-training, greatly reducing the time and compute developers spend on data collection and fine-tuning.
  2. Strong value with a low barrier to entry: With 4.2B parameters and the ability to run on consumer-grade graphics cards, it puts leading embodied intelligence technology within reach of small and medium-sized teams and even individual developers.
  3. Transparent, proven strength: Wall-OSS ranked second overall on RoboChallenge, a well-known real-robot benchmarking platform, surpassing many closed-source or overseas top models (e.g., π0). Its fully open-source strategy (including training code, weights, and optimizers) lets its technical strength be verified in the open, which adds credibility.
  4. Potential for ecosystem co-building: Its open-source release is regarded as the “Android moment” of embodied intelligence, offering the industry a powerful, general-purpose underlying platform. It suits developers and enterprises that want to build scenario-specific innovation on top of existing work.
