Embodied Agents & World Models · June 29, 2026

Daily Paper Digest · Embodied Agents & World Models


2026-06-29 09:14:00 · 8 paper entries

arXiv:2606.27251 arXiv:2606.28182 arXiv:2606.27268 arXiv:2606.27146 arXiv:2606.28276 arXiv:2606.28128 arXiv:2606.27677 arXiv:2606.27355


Date: 2026-06-29

1. Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy

🔗 https://arxiv.org/abs/2606.27251

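💡 In one sentence: OmniAct combines cyber actions, IoT control, robot navigation and manipulation, memory compression, and asynchronous visual verification in a single hierarchical agent architecture, aiming at long-horizon autonomous execution in real environments.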

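🎯 Relevance: The top pick today. It is not a single-point robot policy but a complete planning / memory / verification / execution loop, which lines up closely with InternOS's "organizational coordination + execution feedback".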

2. LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

🔗 https://arxiv.org/abs/2606.28182

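💡 In one sentence: Embodied multi-agent systems distill "laws of cooperation" from their failures, then inject those rules into the reasoning process to make multi-agent cooperation more efficient in partially observable environments.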

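🎯 Relevance: Instructive for InternOS: go beyond task decomposition and distill past failures into reusable coordination laws, much like "learning a cooperation protocol" inside an organizational system.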
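The idea of turning failures into reusable rules and injecting them into reasoning can be sketched as follows. This is a rough illustration, not LLawCo's method: the `Law` type, `distill_laws`, `build_prompt`, the tag-overlap retrieval, and the hand-written lessons are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Law:
    text: str
    tags: FrozenSet[str]  # situations in which the rule applies


def distill_laws(failures: List[Dict]) -> List[Law]:
    """Collapse failure records into deduplicated rules.

    Each record carries a hand-written "lesson"; a real system would have to
    generate that text, which this sketch does not attempt.
    """
    merged: Dict[str, Set[str]] = {}
    for record in failures:
        merged.setdefault(record["lesson"], set()).update(record["tags"])
    return [Law(text, frozenset(tags)) for text, tags in merged.items()]


def build_prompt(task: str, situation: Set[str], laws: List[Law]) -> str:
    """Prepend only the rules whose tags overlap the current situation."""
    relevant = [law.text for law in laws if law.tags & situation]
    lines: List[str] = []
    if relevant:
        lines.append("Cooperation rules:")
        lines.extend(f"- {text}" for text in relevant)
    lines.append(f"Task: {task}")
    return "\n".join(lines)


# Hypothetical failure log from earlier episodes.
failures = [
    {"lesson": "Announce which object you are carrying before moving.",
     "tags": {"shared_object"}},
    {"lesson": "Announce which object you are carrying before moving.",
     "tags": {"narrow_corridor"}},
    {"lesson": "Do not search a room a teammate has already reported.",
     "tags": {"search"}},
]
laws = distill_laws(failures)
prompt = build_prompt("Carry the box to the kitchen", {"narrow_corridor"}, laws)
print(prompt)
```

Filtering by situation keeps the injected rule list short, so the prompt does not grow with the full failure history.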

3. E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

🔗 https://arxiv.org/abs/2606.27268

💡 In one sentence: Combines reasoning-action joint sampling, a history buffer, a vision-language verifier, and feedback refinement to raise VLA execution success rates at inference time.

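🎯 Relevance: This is the embodied version of the generator + verifier + self-improvement loop, and it matters for Anna's work on the agent execution layer: rather than letting the agent "think one step, act one step", sample, score, and revise during execution.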
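The sample, score, revise pattern can be sketched in a few lines. This is a minimal illustration and not the E-TTS implementation: `propose`, `score`, and `refine` are hypothetical stand-ins for a policy sampler, a verifier, and a feedback step, and the toy numeric task exists only to make the loop runnable.

```python
import random
from typing import Callable, List, Tuple


def sample_score_refine(
    propose: Callable[[random.Random], float],
    score: Callable[[float], float],
    refine: Callable[[float], float],
    n_samples: int = 8,
    n_refine_steps: int = 3,
    seed: int = 0,
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Sample candidates, keep the best-scoring one, then refine it greedily.

    All three callables are hypothetical placeholders for this sketch.
    """
    rng = random.Random(seed)
    history: List[Tuple[float, float]] = []  # (candidate, score) buffer

    # 1) Sample several candidates and score each one.
    for _ in range(n_samples):
        candidate = propose(rng)
        history.append((candidate, score(candidate)))

    best, best_score = max(history, key=lambda pair: pair[1])

    # 2) Refine the best candidate; accept a revision only if the score improves.
    for _ in range(n_refine_steps):
        revised = refine(best)
        revised_score = score(revised)
        history.append((revised, revised_score))
        if revised_score > best_score:
            best, best_score = revised, revised_score

    return best, best_score, history


# Toy task: the "action" is a number and the verifier prefers values near 1.0.
TARGET = 1.0
best, best_score, history = sample_score_refine(
    propose=lambda rng: rng.uniform(-2.0, 2.0),
    score=lambda a: -abs(a - TARGET),
    refine=lambda a: a + 0.5 * (TARGET - a),  # move halfway toward the target
)
print(f"best action {best:.3f} with score {best_score:.3f}")
```

Keeping every scored candidate in `history` means later steps can inspect earlier attempts, which is the role a history buffer would play in a fuller system.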

4. PhysReflect-VLA: Physical Feasibility and Self-Reflective Regulation for Reliable Vision-Language-Action Policies

🔗 https://arxiv.org/abs/2606.27146

💡 In one sentence: Adds a physical feasibility evaluator, an action explanation operator, and an LLM reflection module to a VLA so the robot can detect unreasonable actions during execution and correct itself.

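🎯 Relevance: The value here is not "yet another VLA" but the verifier design: check physical feasibility before execution, explain deviations after execution, and feed that back into the next action. The pattern transfers directly to a sandbox / agent runtime.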
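The check-before, explain-after, feed-back pattern can be shown on a toy one-dimensional actuator. This is a sketch under stated assumptions, not the PhysReflect-VLA design: the range limit stands in for a feasibility evaluator, the gain estimate stands in for the reflection step, and `guarded_rollout`, `StepLog`, and all parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StepLog:
    command: float    # move actually sent to the actuator
    feasible: bool    # did the original command pass the pre-execution check?
    deviation: float  # predicted position minus observed position


def guarded_rollout(start: float, goal: float, lo: float, hi: float,
                    true_gain: float = 0.8, steps: int = 10,
                    tol: float = 1e-6) -> Tuple[float, List[StepLog]]:
    position = start
    gain_estimate = 1.0  # the controller's belief about the actuator
    logs: List[StepLog] = []

    for _ in range(steps):
        if abs(goal - position) < tol:
            break
        command = (goal - position) / gain_estimate

        # Pre-execution check: the commanded setpoint must stay inside [lo, hi].
        feasible = lo <= position + command <= hi
        if not feasible:
            command = min(max(position + command, lo), hi) - position

        predicted = position + gain_estimate * command
        observed = position + true_gain * command  # toy actuator that undershoots
        deviation = predicted - observed

        # Post-execution reflection: attribute the gap to a gain error and
        # carry the corrected estimate into the next step.
        if abs(command) > 1e-9:
            gain_estimate = (observed - position) / command

        logs.append(StepLog(command, feasible, deviation))
        position = observed

    return position, logs


final_position, logs = guarded_rollout(start=0.0, goal=0.9, lo=0.0, hi=0.93)
print(final_position, [log.feasible for log in logs])
```

In this run the second command is rejected by the pre-check and clamped, because the corrected command would push the setpoint past the upper limit; the same structure applies when the "actuator" is a tool call in an agent runtime.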

5. SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation

🔗 https://arxiv.org/abs/2606.28276

💡 In one sentence: Automatically builds sim-ready digital twins from video and generates affordance-preserving "digital cousins" for training and evaluating robot policies.

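🎯 Relevance: Highly relevant to the AI sandbox / hardware infra track: future agent evaluation environments cannot rest on a few hand-written toy benchmarks; they need automatically generated environment families that support variants, replay, and controlled comparison.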

6. PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

🔗 https://arxiv.org/abs/2606.28128

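💡 In one sentence: Targets the tendency of video world models to generate physically discontinuous motion and implausible contact relations, using trajectory alignment and semantic relation alignment to strengthen physical consistency.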

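🎯 Relevance: A world model that cannot reliably predict what the environment looks like after an action cannot support serious planning. The paper indicates that the core bottleneck for embodied world models has shifted from visual realism to physical consistency.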

7. DIM-WAM: World-Action Modeling with Diverse Historical Event Memory

🔗 https://arxiv.org/abs/2606.27677

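💡 In one sentence: Adds long-term event memory and task-progress supervision to a world-action model, addressing the forgetting that arises in long tasks when only a short history is visible.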

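🎯 Relevance: Closely resembles InternOS's memory / state tracking: an execution system cannot keep only the most recent context; it must structure completed stages, key events, and current progress into readable memory.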
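One way to picture "completed stages, key events, current progress" as readable memory is a small record type with a text rendering. This is only an illustrative sketch: `TaskMemory`, its fields, and the rendering format are assumptions for this example and are not taken from DIM-WAM or InternOS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskMemory:
    stages: List[str]                                   # planned stages, in order
    completed: List[str] = field(default_factory=list)
    key_events: List[str] = field(default_factory=list)
    max_events: int = 5                                 # cap on retained key events

    def complete_stage(self, stage: str) -> None:
        if stage in self.stages and stage not in self.completed:
            self.completed.append(stage)

    def record_event(self, event: str) -> None:
        self.key_events.append(event)
        overflow = len(self.key_events) - self.max_events
        if overflow > 0:
            self.key_events = self.key_events[overflow:]  # drop the oldest events

    def render(self) -> str:
        """Produce a compact, human-readable summary of the task state."""
        remaining = [s for s in self.stages if s not in self.completed]
        current = remaining[0] if remaining else "none (task finished)"
        return "\n".join([
            f"Completed stages: {', '.join(self.completed) or 'none'}",
            f"Current stage: {current}",
            f"Progress: {len(self.completed)}/{len(self.stages)}",
            "Key events: " + ("; ".join(self.key_events) or "none"),
        ])


memory = TaskMemory(stages=["open drawer", "pick cup", "place cup"])
memory.complete_stage("open drawer")
for i in range(7):
    memory.record_event(f"event {i}")
summary = memory.render()
print(summary)
```

The summary stays the same size no matter how long the episode runs, which is the property a raw sliding window of recent context lacks.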

8. RouterVLA: Turning Smoke Tests into Supervision for Heterogeneous VLA Selection

🔗 https://arxiv.org/abs/2606.27355

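💡 In one sentence: Uses pre-deployment smoke test results to select the VLA expert best suited to the current task, and stresses that the evaluation ledger must enforce outcome separation to avoid inflated gains.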

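🎯 Relevance: Engineering-heavy but important: future agent platforms will not run on one do-everything model but on a router + expert pool + evaluation ledger, which lines up closely with the model selection and pre-execution validation in Anna's agent infra work.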
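A router over an expert pool with a ledger can be sketched as below. This is a toy illustration, not RouterVLA's algorithm. In particular, "outcome separation" is interpreted here as keeping the outcomes used for selection (`smoke`) apart from the outcomes used for reporting (`eval`); the paper's own definition may differ. The ledger layout, expert names, and function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical ledger row: (expert, task_type, split, success)
LedgerRow = Tuple[str, str, str, bool]


def success_rates(ledger: List[LedgerRow], split: str) -> Dict[Tuple[str, str], float]:
    """Success rate per (expert, task_type), using rows from one split only."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for expert, task, row_split, success in ledger:
        if row_split == split:
            counts[(expert, task)][0] += int(success)
            counts[(expert, task)][1] += 1
    return {key: wins / trials for key, (wins, trials) in counts.items()}


def route(ledger: List[LedgerRow], task: str, default: str) -> str:
    """Pick the expert with the best smoke-test rate on this task type."""
    rates = success_rates(ledger, "smoke")
    candidates = {expert: rate for (expert, t), rate in rates.items() if t == task}
    if not candidates:
        return default  # no smoke-test evidence for this task type
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(candidates), key=lambda expert: candidates[expert])


ledger: List[LedgerRow] = [
    ("expert_a", "pick", "smoke", True),
    ("expert_a", "pick", "smoke", True),
    ("expert_b", "pick", "smoke", True),
    ("expert_b", "pick", "smoke", False),
    ("expert_a", "pick", "eval", True),
    ("expert_a", "pick", "eval", False),
    ("expert_b", "pour", "smoke", True),
]

chosen = route(ledger, "pick", default="expert_a")
smoke_rate = success_rates(ledger, "smoke")[(chosen, "pick")]
eval_rate = success_rates(ledger, "eval")[(chosen, "pick")]
print(chosen, smoke_rate, eval_rate)
```

In this made-up ledger the chosen expert scores 1.0 on the rows used to select it and 0.5 on the held-out rows, so reporting the selection-split number would overstate the gain.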

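Today's Take

Today's trend is clear: embodied AI is shifting from "train a stronger policy" to "build an execution system that can run over the long term". Planning, memory, verifier, world model, policy routing, and sim sandbox modules are starting to converge. To be blunt, pure VLA scaling is no longer the exciting part; the real value lies in whether an agent can be placed in an environment and left to execute, check its work, recover from failures, and accumulate experience.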