Wednesday, April 29, 2026

World Action Models

Current state-of-the-art Vision-Language-Action (VLA) models understand and generalize well across semantic concepts, but they tend to fall short on unfamiliar physical motions in new environments. This paper presents DreamZero, a new type of model called a World Action Model (WAM), built on top of a pre-trained video diffusion model. Unlike the VLA approach, DreamZero learns how the physical world works by predicting future world states and actions, treating video as a rich signal of how the world evolves over time. By learning from video and action data jointly, the model picks up a wide variety of skills from diverse, heterogeneous robot datasets without relying on large numbers of repetitive demonstrations. In real-world robot experiments, this yields an improvement of more than 2x on new tasks and environments over leading VLA methods. The team also made substantial algorithmic and system-level optimizations that let this 14-billion-parameter model run real-time robot control at 7 frames per second. Beyond that, the paper demonstrates two forms of transfer across robot embodiments. First, video-only demonstrations from other robots or from humans give a relative improvement of more than 42% on unseen tasks with only 10 to 20 minutes of data. Second, the model adapts to an entirely new robot embodiment with just 30 minutes of free-play data while retaining strong zero-shot generalization on tasks it was never explicitly trained on.
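To make two of the figures above concrete, here is a minimal arithmetic sketch. It assumes one model inference per control frame, which the summary does not state, and it uses a made-up baseline score to show what a 42% relative improvement means. The function names and the baseline value are hypothetical and do not come from the paper.

```python
def control_period_ms(rate_hz: float) -> float:
    """Time available per control step, in milliseconds.

    Assumption: one inference per control frame.
    """
    return 1000.0 / rate_hz


def apply_relative_gain(baseline: float, relative_gain: float) -> float:
    """Score after a relative improvement (0.42 means +42% of the baseline)."""
    return baseline * (1.0 + relative_gain)


if __name__ == "__main__":
    # At 7 frames per second, each step has roughly 143 ms available.
    print(f"{control_period_ms(7):.1f} ms per step")  # 142.9 ms per step

    # Hypothetical baseline success rate, for illustration only.
    hypothetical_baseline = 0.30
    improved = apply_relative_gain(hypothetical_baseline, 0.42)
    print(f"{hypothetical_baseline:.3f} -> {improved:.3f}")  # 0.300 -> 0.426
```

A relative gain scales with the baseline, so a 42% relative improvement is not 42 percentage points: in this made-up case it moves a 30% success rate to about 43%.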


