Embodied Agents & World Models · June 26, 2026

Daily Paper Digest · Embodied Agents & World Models


2026-06-26 09:13:21 · 8 paper entries

arXiv:2606.27251 arXiv:2606.27330 arXiv:2606.27146 arXiv:2606.27268 arXiv:2606.27326 arXiv:2606.26025 arXiv:2606.27375 arXiv:2606.26057


Date: 2026-06-26

1. Omnimodal embodied agents: from isolated skills to everyday physical autonomy

Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy

🔗 https://arxiv.org/abs/2606.27251v1

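💡 In one line: This paper decomposes the embodied agent into a unified execution architecture spanning cyber actions (APIs, IoT) and physical actions (manipulation, navigation), and emphasizes failure detection, recovery, and context management during long-running operation.

🎯 Relevance: Today's must-read. Its framing is very close to InternOS's execution layer: the question is not "a bigger model" but how to organize the planner, tool use, VLA policy, and failure recovery into a system that can run sustainably.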
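A minimal sketch of the kind of execution loop this entry describes, assuming a hypothetical `Runtime` whose `execute` callback dispatches a step to either a tool/API or a VLA policy, and whose `recover` callback rewrites a failed step. It only illustrates failure detection, bounded retries, and a capped context log; it is not the paper's architecture.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    note: str = ""  # failure description passed to the recovery hook


@dataclass
class Runtime:
    """Hypothetical executor: runs plan steps, detects failures, and retries via a recovery hook."""
    execute: Callable[[str], StepResult]        # dispatches a step to a tool/API or a VLA policy
    recover: Callable[[str, StepResult], str]   # rewrites a failed step into a recovery step
    max_retries: int = 2
    max_context: int = 50                       # cap on the rolling execution log
    context: list = field(default_factory=list)

    def run(self, plan: list) -> bool:
        for step in plan:
            attempt = step
            for _ in range(self.max_retries + 1):
                result = self.execute(attempt)
                self.context.append((attempt, result.ok, result.note))
                self.context = self.context[-self.max_context:]  # keep only recent entries
                if result.ok:
                    break
                attempt = self.recover(attempt, result)
            else:
                return False  # retries exhausted: escalate to the planner
        return True
```

Keeping `execute` and `recover` as injected callbacks is one way to let the same loop drive both cyber and physical actions; the capped `context` stands in for long-run context management.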

2. Strengthening GUI agent task planning with autonomous exploration and hindsight experience

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

🔗 https://arxiv.org/abs/2606.27330v1

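💡 In one line: Lets a GUI agent explore web and interface environments on its own, turns both failed and successful experience into training data, and improves the cross-website planning ability of small open-source MLLMs.

🎯 Relevance: This one bears directly on Anna's AI Agent platform: an agent should not rely on prompt planning alone; it needs a closed loop of environment exploration → experience synthesis → planning improvement.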
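The experience-synthesis step can be illustrated with classic hindsight relabeling (the idea behind Hindsight Experience Replay): a failed trajectory is still a correct demonstration of whatever state it actually reached. A minimal sketch, assuming a hypothetical `Trajectory` record that carries a text description of the reached state; the paper's actual synthesis pipeline may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    goal: str        # instruction the agent was asked to accomplish
    actions: tuple   # GUI actions taken, e.g. ("click #login", "type user")
    reached: str     # description of the final screen state actually reached ("" if unknown)
    success: bool


def hindsight_relabel(trajs):
    """Turn trajectories into (goal, actions) planning examples.

    Successes keep their original goal; failures are relabeled with the
    state they actually reached, so exploration is not wasted. Failures
    with no usable description of the reached state are dropped.
    """
    examples = []
    for t in trajs:
        if t.success:
            examples.append((t.goal, t.actions))
        elif t.reached:
            examples.append((t.reached, t.actions))
    return examples
```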

3. Physical feasibility checks and self-reflective regulation for VLA policies

PhysReflect-VLA: Physical Feasibility and Self-Reflective Regulation for Reliable Vision-Language-Action Policies

🔗 https://arxiv.org/abs/2606.27146v1

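💡 In one line: Attaches an external feasibility evaluator and a self-reflection loop to VLA execution, checking physical feasibility as actions are executed and correcting errors online.

🎯 Relevance: This matches Anna's "generator + verifier + self-improvement loop" preference well. The core takeaway: the execution layer cannot just generate actions; a verifier/diagnoser has to step in at every step.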
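A toy version of the per-step generate → verify → reflect loop, assuming a hypothetical 2-D end-effector with a box-shaped workspace (`WORKSPACE`) and a per-step displacement limit (`MAX_STEP`). The hand-written `feasible` and `reflect` functions stand in for the paper's evaluator and reflection module; the sketch only shows where the verifier sits in the loop.

```python
import math

WORKSPACE = (-0.5, 0.5)  # hypothetical reachable range for each coordinate, in metres
MAX_STEP = 0.1           # hypothetical max end-effector displacement per control step
TOL = 1e-9               # float tolerance for the step-size check


def feasible(pos, target):
    """Verifier: return (ok, reason) for a proposed end-effector target."""
    lo, hi = WORKSPACE
    if not all(lo <= c <= hi for c in target):
        return False, "out_of_workspace"
    if math.dist(pos, target) > MAX_STEP + TOL:
        return False, "step_too_large"
    return True, ""


def reflect(pos, target, reason):
    """Reflection: revise an infeasible target according to the diagnosed reason."""
    lo, hi = WORKSPACE
    if reason == "out_of_workspace":
        target = tuple(min(max(c, lo), hi) for c in target)
    d = math.dist(pos, target)
    if d > MAX_STEP:
        scale = MAX_STEP / d  # shrink the move toward pos until it fits the step limit
        target = tuple(p + (t - p) * scale for p, t in zip(pos, target))
    return target


def regulated_step(policy, pos, max_reflections=2):
    """Generator proposes, verifier checks, reflection corrects; hold still if still infeasible."""
    target = policy(pos)
    for _ in range(max_reflections):
        ok, reason = feasible(pos, target)
        if ok:
            return target
        target = reflect(pos, target, reason)
    return target if feasible(pos, target)[0] else pos
```

Falling back to the current position when reflection runs out is a conservative choice: an unverified action is never sent to the robot.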

4. A test-time scaling framework for embodied tasks

E-TTS: A New Embodied Test-Time Scaling Framework for Robotic Manipulation

🔗 https://arxiv.org/abs/2606.27268v1

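💡 In one line: Brings test-time scaling to robotic manipulation, using historical information to drive both reasoning and action scaling instead of looking only at the current observation.

🎯 Relevance: The lesson for InternOS: agent execution is not a one-shot plan but history-aware iterative refinement; in long-horizon tasks, memory/history is not an add-on but a prerequisite for scaling.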
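History-aware test-time scaling can be sketched as best-of-N sampling in which the scorer reads the execution history. Assumptions: `propose` is a hypothetical stochastic policy sampler, and `history_score` is a hypothetical heuristic that rewards actions that worked before and penalizes ones that failed. E-TTS itself may scale reasoning and actions quite differently.

```python
import random
from typing import Callable, Sequence, Tuple

History = Sequence[Tuple[str, bool]]  # (action, succeeded) pairs from earlier steps


def history_score(action: str, history: History) -> float:
    """Hypothetical scorer: reward actions that worked before, penalise ones that failed."""
    score = 0.0
    for past, ok in history:
        if past == action:
            score += 1.0 if ok else -2.0
    return score


def best_of_n(
    propose: Callable[[random.Random], str],
    score: Callable[[str, History], float],
    history: History,
    n: int = 8,
    seed: int = 0,
) -> str:
    """Spend more test-time compute: sample n candidates, keep the best under the history-aware score."""
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(a, history))
```

Raising `n` is the scaling knob; without `history`, the scorer would treat a previously failed action the same as an untried one.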

5. World-model hallucination is predictable and preventable

Hallucination in World Models is Predictable and Preventable

🔗 https://arxiv.org/abs/2606.27326v1

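💡 In one line: Studies why action-controllable world models hallucinate in low-coverage state-action regions, and uses data-centric signals to detect and mitigate it.

🎯 Relevance: Useful input for Anna's thinking on sandboxes and simulated execution: a world model cannot just "generate plausible-looking futures"; it also has to expose uncertainty and failure modes, or the agent will learn bad habits in a fake environment.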
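One simple data-centric signal is state-action visitation count: if the training data barely covers a region, the world model's prediction there is extrapolation. A minimal sketch, assuming a uniform grid discretization (`res`) and a hypothetical `min_count` threshold; the paper's actual signals are not reproduced here.

```python
from collections import Counter


def bin_key(state, action, res=0.25):
    """Quantise a continuous (state, action) pair into a grid cell."""
    return tuple(round(x / res) for x in (*state, *action))


class CoverageGuard:
    """Hypothetical guard: counts how often each state-action cell appears in the training data."""

    def __init__(self, dataset, res=0.25, min_count=5):
        self.res, self.min_count = res, min_count
        self.counts = Counter(bin_key(s, a, res) for s, a in dataset)

    def coverage(self, state, action):
        return self.counts[bin_key(state, action, self.res)]  # 0 for unseen cells

    def trusted(self, state, action):
        """False means the world model is extrapolating: treat its rollout as unreliable."""
        return self.coverage(state, action) >= self.min_count
```

An agent training inside the simulator could stop or down-weight a rollout as soon as `trusted` returns False, instead of learning from a hallucinated future.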

6. In-context world modeling for robot control

In-Context World Modeling for Robotic Control

🔗 https://arxiv.org/abs/2606.26025v2

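💡 In one line: Lets a robot policy infer system variables such as camera viewpoint and robot morphology from a short history, so it can adapt to new configurations.

🎯 Relevance: The value here is "system identification as in-context adaptation." For future agent infra, the execution environment is not a fixed constant but something the agent must keep inferring.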
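The "system identification as context" idea can be shown with a deliberately tiny stand-in: instead of camera pose or morphology, assume a single unknown actuator gain k with delta = k * action, estimated by least squares (k = Σ a·d / Σ a²) from a short history window. The function names are hypothetical, and the paper learns this inference inside the policy rather than computing it in closed form.

```python
def infer_gain(history):
    """Least-squares estimate of k in delta = k * action from a short context window.

    history: list of (action, observed_state_delta) pairs.
    """
    num = sum(a * d for a, d in history)
    den = sum(a * a for a, _ in history)
    return num / den if den > 0 else 1.0  # no informative actions yet: assume the nominal gain


def adapted_action(desired_delta, history, eps=1e-6):
    """Pick the command expected to produce desired_delta on this particular robot."""
    k = infer_gain(history)
    if abs(k) < eps:
        return desired_delta  # degenerate estimate: fall back to the uncompensated command
    return desired_delta / k
```

Each new (action, delta) pair appended to the window updates the estimate, which is what makes the adaptation in-context rather than a retraining step.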

7. A scalable behavior cloning stack with open data, training, and evaluation

Scalable Behavior Cloning with Open Data, Training, and Evaluation

🔗 https://arxiv.org/abs/2606.27375v1

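💡 In one line: Releases large-scale open-source manipulation data, hardware, training infrastructure, and a sim-to-real evaluation pipeline for systematically comparing DiT/VLA training recipes.

🎯 Relevance: Not the most "agentic" paper in the batch, but its infrastructure value is high. A good reference for Anna's discussions with friends about AI sandboxes and hardware infra: a truly useful platform is not a demo but a fully connected data, sim, eval, and hardware loop.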

8. An execution-time AI safety kernel: a non-bypassable authorization layer for agents

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

🔗 https://arxiv.org/abs/2606.26057v1

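💡 In one line: Argues for moving agent safety controls out of prompts and guardrails and into an external safety kernel built on process isolation, pre-action enforcement, and fail-closed behavior.

🎯 Relevance: Not an embodied paper, but critical for Anna's agent OS / sandbox track: as long as an agent can operate tools, APIs, files, or hardware, the safety boundary cannot live inside the agent's own context.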
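The default-deny, fail-closed logic of a pre-action gate fits in a few lines. This is an in-process illustration only: the paper's point is that the kernel runs outside the agent (process isolation), which a single Python module cannot provide. `SafetyKernel`, `Action`, and the prefix-based allow-list are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    target: str


class SafetyKernel:
    """Hypothetical pre-action gate: every action is checked before it runs; anything unknown is denied."""

    def __init__(self, allow):
        # allow: {tool_name: [allowed target prefixes]}
        self._allow = {tool: tuple(prefixes) for tool, prefixes in allow.items()}

    def authorize(self, action):
        try:
            prefixes = self._allow.get(action.tool)
            if prefixes is None:
                return False  # default deny: tool is not on the allow-list
            return any(action.target.startswith(p) for p in prefixes)
        except Exception:
            return False  # fail closed: a broken check never grants access


def execute(kernel, action, run):
    """Enforce before acting: the agent's own code path cannot skip the check."""
    if not kernel.authorize(action):
        raise PermissionError(f"denied: {action.tool} -> {action.target}")
    return run(action)
```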

Today's take

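Today's trend is clear: embodied agents are shifting from "can the model perform actions" to "can the execution system run autonomously over the long term." What deserves the closest watch is not any single VLA backbone but the runtime architecture of planner + policy + verifier + world model + safety kernel.

For Anna, the genuinely useful signal in this batch: the moat for future agent platforms will not be model calls alone, but environment modeling, execution feedback, failure recovery, permission isolation, and experience accumulation.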