Robot foundation models keep hiding behind fine-tuning numbers. Wall-OSS-0.5 is trying a different approach

Most robot foundation model results are hard to interpret because the impressive number usually comes after task-specific fine-tuning. Wall-OSS-0.5, a new open-source VLA release from X Square Robot, is interesting because the report tries to measure what the pretrained checkpoint can do before that extra adaptation step. The model is a 4B vision-language-action model built around a 3B VLM backbone plus action-generation components. According to the report, the pretrained checkpoint was evaluated on a 17-task real-robot suite without any task-specific fine-tuning.