Zhuoyang Liu | 刘卓洋

Hi there! I'm a third-year undergraduate student at Peking University, majoring in Computer Science. I am currently a visiting student at BAIR, UC Berkeley, advised by Prof. Trevor Darrell. Before that, I was a research intern at the HMI Lab at Peking University, advised by Prof. Shanghang Zhang. My research interests lie in applying large multimodal models, particularly vision-language-action (VLA) models, to robot manipulation.

Email  /  Scholar  /  Twitter  /  GitHub  /  WeChat

profile photo

Research

I'm interested in Computer Vision, Robot Learning, and Embodied Large Multimodal Models. My research focuses on enhancing large multimodal models to reason about the physical world and to plan tasks effectively. Some papers are highlighted.

TwinRL: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation
Qinwen Xu*, Jiaming Liu*, Rui Zhou*, Shaojun Shi*, Nuowei Han*, Zhuoyang Liu,
Chenyang Gu, Shuo Gu, Yang Yue, Gao Huang, Wenzhao Zheng, Sirui Han, Peng Jia,
Shanghang Zhang
arXiv, 2026
project page / arXiv

TwinRL is a collaborative reinforcement learning framework that couples a digital twin with the real world to scale and guide exploration for VLA models.

LaST0: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model
Zhuoyang Liu*, Jiaming Liu*, Hao Chen*, Jiale Yu, Ziyu Guo, Chengkai Hou,
Chenyang Gu, Xiangju Mi, Renrui Zhang, Kun Wu, Zhengping Che, Jian Tang,
Pheng-Ann Heng, Shanghang Zhang
arXiv, 2026
project page / arXiv

A VLA model that performs efficient reasoning before acting via a Latent Spatio-Temporal Chain-of-Thought (CoT), capturing fine-grained physical and robotic dynamics that are often difficult to verbalize.

ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation
Chenyang Gu*, Jiaming Liu*, Hao Chen*, Runzhong Huang*, Qingpo Wuwu, Zhuoyang Liu,
Xiaoqi Li, Ying Li, Renrui Zhang, Peng Jia, Pheng-Ann Heng, Shanghang Zhang
CVPR, 2026
project page / arXiv / video

A unified VLA framework built upon a Mixture-of-Transformers (MoT) architecture, enabling coherent collaboration between multimodal manual generation and action execution.

DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
Zhen Fang*, Zhuoyang Liu*, Jiaming Liu, Hao Chen, Yu Zeng, Shiting Huang,
Zehui Chen, Lin Chen, Shanghang Zhang, Feng Zhao
arXiv, 2025
project page / arXiv

DualVLA improves action performance through carefully designed post-training while preserving the model's reasoning ability.

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Zhuoyang Liu*, Jiaming Liu*, Jiadong Xu, Nuowei Han, Chenyang Gu, Hao Chen,
Kaichen Zhou, Renrui Zhang, Kai Chin Hsieh, Kun Wu, Zhengping Che, Jian Tang,
Shanghang Zhang
ICRA, 2026
project page / arXiv / code

A multisensory language-action (MLA) model that collaboratively perceives heterogeneous sensory modalities and predicts future multisensory objectives to facilitate physical world modeling.

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence
Chengkai Hou*, Kun Wu*, Jiaming Liu*, Zhengping Che*, Di Wu*, Fei Liao*, Guangrun Li*,
Jingyang He*, Qiuxuan Feng*, Zhao Jin*, Chenyang Gu, Zhuoyang Liu, Nuowei Han,
Xiangju Mi, Yaoxu Lyu, Yankai Fu, Gaole Dai, Langzhe Gu, Tao Li, Yuheng Zhang,
Yixue Zhang, Xinhua Wang, Shichao Fan, Meng Li, Zhen Zhao, Ning Liu, Zhiyuan Xu,
Pei Ren, Junjie Ji, Haonan Liu, Kuan Cheng, Shanghang Zhang, Jian Tang
arXiv, 2025
project page / arXiv / dataset

RoboMIND 2.0 is a comprehensive real-world dataset comprising over 310K dual-arm manipulation trajectories collected across six distinct robot embodiments and 739 complex tasks.

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Sixiang Chen*, Jiaming Liu*, Siyuan Qian*, Han Jiang, Xiaoqi Li, Renrui Zhang,
Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
NeurIPS, 2025
project page / arXiv / code

Adaptive Coordination Diffusion Transformer (AC-DiT) enhances mobile base and manipulator coordination for end-to-end mobile manipulation.

Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
Hao Chen*, Jiaming Liu*, Chenyang Gu*, Zhuoyang Liu*, Renrui Zhang, Xiaoqi Li,
Xiao He, Yandong Guo, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng
NeurIPS, 2025
project page / arXiv / code

Unlike previous dual-system VLA methods that attach a separate policy head as System 1, FiS-VLA repurposes the final transformer blocks of an intact VLM as System 1, while retaining the full model for System 2 reasoning.

3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
Xiaoqi Li*, Liang Heng*, Jiaming Liu, Yan Shen, Chenyang Gu, Zhuoyang Liu,
Hao Chen, Nuowei Han, Renrui Zhang, Hao Tang, Shanghang Zhang, Hao Dong
CoRL, 2025
paper

3DS-VLA enhances pretrained 2D vision-language models (VLMs) with comprehensive 3D awareness, enabling the prediction of robust end-effector poses.

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu*, Hao Chen*, Zhuoyang Liu*, Pengju An*, Renrui Zhang, Chenyang Gu,
Xiaoqi Li, Ziyu Guo, Sixiang Chen, Mengzhen Liu, Chengkai Hou, Mengdi Zhao,
Kaichen Zhou, Pheng-Ann Heng, Shanghang Zhang
ICLR, 2026
project page / arXiv / code

HybridVLA integrates diffusion and autoregressive action prediction within a single LLM, fully leveraging the continuity and probabilistic nature of diffusion alongside the reasoning capabilities of autoregressive modeling.

H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
Guangrun Li*, Yaoxu Lyu*, Zhuoyang Liu*, Chengkai Hou*, Jieyu Zhang, Shanghang Zhang
CVPR workshop, 2025
project page / arXiv / dataset

H2R is a simple data augmentation technique for robot pre-training: it extracts human hands from first-person videos and replaces them with those of different robots, generating new video data for pre-training.

Education

University of California, Berkeley
Visiting Student, Berkeley Global Access (BGA) Program
2026.01 - Present

Peking University
B.S. in Computer Science, Yuanpei College
2023.08 - Present

Research Experience

Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Research Intern
2025.08 - 2026.01

Research on Embodied AI and Robot Manipulation

AI2Robotics
Research Intern, X-Lab
2025.06 - 2026.01

Focused on Vision-Language-Action (VLA) models

Peking University
Research Intern, HMI (Human Machine Intelligence) Lab
2024.07 - Present

Research on Embodied AI
Research Advisor: Prof. Shanghang Zhang

Honors & Awards

2025  Academic Rising Star Award Nomination (Undergraduate Program), Peking University
2025  Third Prize, Round 1 of the RoboTwin Dual-Arm Collaboration Challenge, CVPR 2025
2024  Top 16, Mahjong AI Competition Finals, IJCAI 2024
2023  Qin-Jin Scholarship, Peking University

This homepage is based on Jon Barron's website and deployed on GitHub Pages.