Dense future frames add redundant appearance modeling before the robot can act.
Robot learning project
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
LaWAM gives robot policies physical foresight by predicting compact latent visual subgoals, not future pixels.
Why Latent Futures
Video imagination slows robot control.
Pixel-space world-action models can provide foresight, but they spend latency and model capacity on reconstructing visual detail. LaWAM moves future prediction into a frozen visual feature space, where a single latent subgoal captures the scene change needed for the next action chunk.
A compact future feature directly conditions action generation in one forward pass.
Method
Predict a latent subgoal, then act toward it.
LaWAM repurposes a latent action model decoder as a Latent World Model (LaWM), then inserts the predicted future feature into a VLA action expert.
Learn LaWM
Encode current and horizon observations with a frozen visual encoder, infer latent actions, and train a decoder to predict future features.
Distill subgoals
Train the policy prior to predict latent actions that drive LaWM toward teacher latent subgoals extracted from robot trajectories.
Generate actions
At test time, one latent world-model pass produces the subgoal used by the action expert for chunk-level control.
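The three stages above can be sketched end to end on toy data. This is a minimal illustration of the data flow only, not the LaWAM implementation: the two-number "encoder", the scalar-gain world model, the affine policy prior, the toy environment, and every name below (`frozen_encoder`, `ToyLaWM`, `ToyPolicyPrior`, `toy_action_expert`, `toy_env_step`) are assumptions made for this example.

```python
import random

random.seed(0)

def frozen_encoder(obs):
    """Stand-in for the frozen visual encoder: a fixed map with no trainable weights."""
    return [sum(obs) / len(obs), max(obs) - min(obs)]

def toy_env_step(obs):
    """Hypothetical scene change over one action-chunk horizon."""
    return [0.5 * x + 0.1 for x in obs]

def infer_latent_action(feat_now, feat_goal):
    """Toy latent action: the feature-space change between now and the horizon."""
    return [g - c for c, g in zip(feat_now, feat_goal)]

class ToyLaWM:
    """Toy latent world model: future feature = current + gain * latent action."""
    def __init__(self):
        self.gain = 0.0  # the only trainable parameter in this sketch

    def predict(self, feat_now, latent_action):
        return [c + self.gain * a for c, a in zip(feat_now, latent_action)]

    def train_step(self, feat_now, latent_action, feat_goal, lr=0.5):
        pred = self.predict(feat_now, latent_action)
        # Gradient of 0.5 * squared error with respect to gain.
        grad = sum((p - t) * a for p, t, a in zip(pred, feat_goal, latent_action))
        self.gain -= lr * grad

class ToyPolicyPrior:
    """Toy policy prior: per-dimension affine map from current feature to latent action."""
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = [0.0] * dim

    def predict(self, feat_now):
        return [w * c + b for w, c, b in zip(self.w, feat_now, self.b)]

    def distill_step(self, lawm, feat_now, teacher_subgoal, lr=0.5):
        student_subgoal = lawm.predict(feat_now, self.predict(feat_now))
        for i, (s, t) in enumerate(zip(student_subgoal, teacher_subgoal)):
            err = (s - t) * lawm.gain  # chain rule through the frozen LaWM
            self.w[i] -= lr * err * feat_now[i]
            self.b[i] -= lr * err

def toy_action_expert(feat_now, subgoal, chunk_len=4):
    """Hypothetical action expert: equal steps that move the feature toward the subgoal."""
    step = [(s - c) / chunk_len for c, s in zip(feat_now, subgoal)]
    return [step for _ in range(chunk_len)]

def sample_obs():
    return [random.random() for _ in range(4)]

# Stage 1: learn LaWM from (current, horizon) observation pairs.
lawm = ToyLaWM()
for _ in range(1000):
    obs = sample_obs()
    now, goal = frozen_encoder(obs), frozen_encoder(toy_env_step(obs))
    lawm.train_step(now, infer_latent_action(now, goal), goal)

# Stage 2: distill teacher subgoals into the policy prior; LaWM stays frozen.
prior = ToyPolicyPrior(dim=2)
for _ in range(5000):
    obs = sample_obs()
    now, teacher = frozen_encoder(obs), frozen_encoder(toy_env_step(obs))
    prior.distill_step(lawm, now, teacher)

# Stage 3: at test time, one LaWM pass yields the subgoal for the action expert.
test_obs = [0.2, 0.9, 0.4, 0.7]
feat = frozen_encoder(test_obs)
subgoal = lawm.predict(feat, prior.predict(feat))
action_chunk = toy_action_expert(feat, subgoal)
```

In this toy setting the horizon observation is only needed during training. At test time the policy prior replaces the inferred latent action, so the subgoal comes from the current observation alone.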
Three-Minute Overview
Watch the LaWAM project video.
Results
High success with low-latency latent prediction.
Across simulated and physical manipulation tasks, LaWAM keeps the predictive benefits of world-action modeling while avoiding the cost of pixel-space rollouts.
| Method | Parameters | Latency | Avg. success rate (%) |
|---|---|---|---|
| pi0.5 | 3.5B | 220 ms | 96.9 |
| Cosmos-Policy | 2.1B | 1413 ms | 98.5 |
| LingBot-VA | 5.5B | 4482 ms | 98.5 |
| LaWAM | 2.3B | 187 ms | 98.6 |
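To put the latency column in relative terms, the short sketch below divides each method's latency by LaWAM's. The numbers are copied from the table; the helper name `slowdown_vs` is made up for this example, and the ratios say nothing about how latency was measured.

```python
# Latency values copied from the results table (milliseconds).
latency_ms = {"pi0.5": 220, "Cosmos-Policy": 1413, "LingBot-VA": 4482, "LaWAM": 187}

def slowdown_vs(reference, table):
    """Return each method's latency divided by the reference method's latency."""
    ref = table[reference]
    return {name: ms / ref for name, ms in table.items()}

ratios = slowdown_vs("LaWAM", latency_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x LaWAM latency")
```

By this arithmetic, the two pixel-space baselines in the table take roughly 7.6 and 24 times as long per inference as LaWAM, while pi0.5 takes about 1.2 times as long.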
RoboTwin
Strong bimanual generalization over 50 manipulation tasks with 100 trials per task.
Real-World Transfer
Ranks first across pick-and-place, drawer opening, and towel folding, with 30 physical trials per task.
Dynamics Analysis
Shared latent transitions are grounded in each embodiment.
Applying the same latent action trajectory to different initial observations produces context-specific latent rollouts, suggesting that LaWM grounds abstract transitions in the current embodiment and scene.
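The analysis protocol can be illustrated with a toy rollout. The transition rule below is a hypothetical stand-in for LaWM, chosen only because its effect depends on the current feature. The names `toy_lawm_step` and `rollout` and the two starting features are assumptions for this sketch.

```python
def toy_lawm_step(feat, latent_action):
    """Hypothetical context-dependent transition: each action closes a fraction of the gap to 1.0."""
    return [c + a * (1.0 - c) for c, a in zip(feat, latent_action)]

def rollout(initial_feat, latent_actions):
    """Apply a latent action trajectory step by step and return every visited feature."""
    feats = [initial_feat]
    for action in latent_actions:
        feats.append(toy_lawm_step(feats[-1], action))
    return feats

# One shared latent action trajectory, two different initial observations (as features).
shared_actions = [[0.5, 0.5], [0.5, 0.5]]
rollout_a = rollout([0.0, 0.2], shared_actions)
rollout_b = rollout([0.8, 0.4], shared_actions)

# Per-step feature change in the first step of each rollout.
first_delta_a = [n - c for c, n in zip(rollout_a[0], rollout_a[1])]
first_delta_b = [n - c for c, n in zip(rollout_b[0], rollout_b[1])]
```

The same latent actions produce a larger feature change from the first starting point than from the second. That is the kind of context-specific behavior the analysis looks for in the learned model.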
Citation
Paper and citation.
@misc{chen2026lawam,
title = {LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies},
author = {Chen, Jialei and Wang, Kai and Chen, Kang and Chen, Shuaihang and Gao, Feng and Tang, Wenhao and Li, Zhiyuan and Liu, Weilin and Yao, Zhuyu and Li, Boxun and Xu, Yuanbo and Yu, Chao},
year = {2026},
note = {Manuscript in preparation}
}