Agent & LLM · June 13, 2026

Daily Paper Digest · Agent & LLM

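Today's summary: Over the past two days, arXiv has seen a batch of high-quality work on agents. Two major trends: (1) orchestration-layer optimization is shifting from rule-driven to reward-driven (OrchRM); (2) memory systems are moving from "store and retrieve" toward "evolution tracking" (EvoArena, Infini Memory). For Anna, OrchRM's approach to evaluating orchestration quality and EvoMem's patch-based memory evolution are the two ideas most worth digging into.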

2026-06-13 09:05:02 · 7 paper entries

arXiv:2606.13598 arXiv:2606.13681 arXiv:2606.10677 arXiv:2606.13662 arXiv:2606.11869 arXiv:2606.13220 arXiv:2606.10749

Date: 2026-06-13

1. Reward Modeling for Multi-Agent Orchestration

🔗 https://arxiv.org/abs/2606.13598

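💡 In one sentence: Proposes OrchRM, a self-supervised framework that uses a Bradley-Terry reward model to evaluate the quality of multi-agent orchestration without human annotation, with a reported 10x gain in training efficiency and an 8% gain in accuracy.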

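🎯 Relevance: extremely high. The core challenge in Anna's InternOS work is coordinating and scheduling multiple agents. OrchRM directly addresses the question of how to tell whether an orchestrator is any good, and it operates at the orchestration level rather than the single-agent level, so the idea carries over directly to evaluating scheduling quality in InternOS.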
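
For readers who have not worked with Bradley-Terry models, here is a minimal sketch of the pairwise preference probability and training loss such a reward model is typically built on. It is an illustration only, not OrchRM's implementation: the function names and the idea of scoring two orchestration traces with scalar rewards are assumptions made for this example.

```python
import math


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that item A is preferred over item B.

    The scores stand in for scalar rewards assigned to two orchestration
    traces (hypothetical inputs for this sketch).
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def bt_pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of the observed preference (chosen beats rejected)."""
    return -math.log(bt_preference_prob(score_chosen, score_rejected))
```

The loss shrinks as the reward model scores the preferred orchestration higher than the rejected one, which is what lets a model be trained from preference pairs alone.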

2. EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

🔗 https://arxiv.org/abs/2606.13681

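💡 In one sentence: Proposes EvoMem, a patch-based memory paradigm that records the evolution of an agent's memory as a structured update history, so the agent can understand how the environment changed rather than only remembering its current state.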

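🎯 Relevance: very high. The organizational coordination scenarios that InternOS manages are inherently dynamic: staffing changes, task priorities shift, statuses get updated. EvoMem's abstraction of "memory evolution = structured patch history" fits closely with the commitment-tracking mechanism Anna discussed earlier.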
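
To make the "structured patch history" idea concrete, here is a minimal sketch of a memory that logs every change as a patch instead of overwriting state silently. `MemoryPatch` and `PatchMemory` are hypothetical names invented for this example; the paper's actual data model may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class MemoryPatch:
    """One recorded change: what a key held before and after a given step."""
    step: int
    key: str
    old: Optional[Any]
    new: Optional[Any]  # None means the key was removed


@dataclass
class PatchMemory:
    """Current state plus the ordered list of patches that produced it."""
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, step: int, key: str, new: Optional[Any]) -> None:
        old = self.state.get(key)
        if old == new:
            return  # nothing changed, so no patch is recorded
        self.history.append(MemoryPatch(step, key, old, new))
        if new is None:
            self.state.pop(key, None)
        else:
            self.state[key] = new

    def changes_for(self, key: str) -> list:
        """How did this key evolve over time?"""
        return [p for p in self.history if p.key == key]

    def state_at(self, step: int) -> dict:
        """Replay patches up to `step` (assumes patches were added in step order)."""
        snapshot: dict = {}
        for p in self.history:
            if p.step > step:
                break
            if p.new is None:
                snapshot.pop(p.key, None)
            else:
                snapshot[p.key] = p.new
        return snapshot
```

Because the history is kept, an agent can answer "who owned this task before the handover?" as well as "who owns it now?".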

3. Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

🔗 https://arxiv.org/abs/2606.10677

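💡 In one sentence: Organizes agent memory into topic-structured documents; new observations are buffered first and consolidated periodically, and lookup uses agentic retrieval (iterative tool calls against memory rather than one-shot retrieval). Reaches 64.7% on MemoryAgentBench.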

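🎯 Relevance: very high. The InternOS memory kernel needs exactly this kind of maintainable, revisable, topic-organized memory architecture. The three-layer design of topic document + buffer + consolidation is worth borrowing from.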
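
The buffer-then-consolidate flow can be sketched as follows. This is a toy under stated assumptions, not the paper's system: `TopicMemory`, `buffer_limit`, and the de-duplicating merge are all invented for illustration, and a real consolidation step would presumably have an LLM revise the topic document rather than append to it. `list_topics` and `read_topic` stand in for the tools an agent might call iteratively during agentic retrieval.

```python
from collections import defaultdict


class TopicMemory:
    """Toy topic-document memory with a write buffer (hypothetical design)."""

    def __init__(self, buffer_limit: int = 3):
        self.buffer_limit = buffer_limit
        self.buffer: list = []                # pending (topic, note) pairs
        self.documents = defaultdict(list)    # topic -> consolidated notes

    def observe(self, topic: str, note: str) -> None:
        """Buffer a new observation; consolidate once the buffer is full."""
        self.buffer.append((topic, note))
        if len(self.buffer) >= self.buffer_limit:
            self.consolidate()

    def consolidate(self) -> None:
        """Fold buffered notes into their topic documents, skipping duplicates."""
        for topic, note in self.buffer:
            if note not in self.documents[topic]:
                self.documents[topic].append(note)
        self.buffer.clear()

    # The two methods below play the role of retrieval tools.
    def list_topics(self) -> list:
        return sorted(self.documents)

    def read_topic(self, topic: str) -> str:
        return "\n".join(self.documents.get(topic, []))
```

An agent would typically call `list_topics()` first, then `read_topic()` on whichever topics look relevant, repeating until it has what it needs.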

4. EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

🔗 https://arxiv.org/abs/2606.13662

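💡 In one sentence: Introduces the concept of "environment engineering": rather than optimizing only the agent workflow, design the environment the agent runs in (permissions, artifact management, budget, human-agent interaction). Sets new SOTA on math and ML tasks, and found a new 26-circle packing result for only $11.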

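🎯 Relevance: high. The core of Anna's agent platform work is designing the environment agents run in. EurekAgent breaks environment engineering into four dimensions (permissions engineering, artifact engineering, budget engineering, and human-agent interaction), and this taxonomy maps directly onto the InternOS system design.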

5. Agents All the Way Down: A Methodology for Building Custom AI Agents from Substrate to Production

🔗 https://arxiv.org/abs/2606.11869

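💡 In one sentence: Turns agent-building practices scattered across blogs and podcasts into a methodology: two prerequisites (LLM-as-substrate plus function calling/MCP/CLI) and three recurring practices (prototype → harvest → agent-tests-agent). Core thesis: multi-agent orchestration is just CLI composition.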

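🎯 Relevance: high. The view that "multi-agent orchestration is just CLI composition" is fully in tune with Anna's InternOS philosophy (7 Kernel, managing agents through a POSIX metaphor). The methodology's Turtle pattern (prototype → harvest → ship as CLI) is also an engineering pattern that can be borrowed directly.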

6. LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis

🔗 https://arxiv.org/abs/2606.13220

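💡 In one sentence: Tackles "user-driven sycophancy" in LLMs, where the user offers a specious hypothesis and the LLM simply goes along with it. Proposes an evidence-first method: generate candidate hypotheses first, gather evidence by asking questions, update probabilities, and draw a conclusion only once the evidence is sufficient.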

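🎯 Relevance: medium-high. In InternOS scenarios, agents often have to handle ambiguous user intent and incomplete information. The evidence-first reasoning paradigm is a useful reference for an agent's diagnostic ability, helping it avoid committing prematurely to the wrong execution path.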
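
The "update probabilities, conclude only when the evidence is sufficient" loop can be illustrated with a plain Bayesian update over candidate hypotheses. This is an assumption made for the example: the digest does not say how the paper updates its probabilities, and the function names, the likelihood values, and the 0.8 threshold below are all hypothetical.

```python
def update_posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule over a fixed hypothesis set.

    prior[h] is the current belief in hypothesis h; likelihood[h] is how
    probable the newly observed evidence is if h were true.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: v / total for h, v in unnormalized.items()}


def conclude(posterior: dict, threshold: float = 0.8):
    """Return the leading hypothesis only if it clears the threshold, else None."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None
```

A user-suggested hypothesis would enter this loop as just one candidate in the prior; it gets accepted only if the answers to follow-up questions push its posterior past the threshold.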

7. Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

🔗 https://arxiv.org/abs/2606.10749

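💡 In one sentence: A systematic survey synthesizing 247 papers that models agent security along three dimensions: information flow, delegated authority, and persistent state. Key findings: prompt injection and tool control-flow hijacking remain the main threats, while persistent-state corruption and multi-agent propagation are emerging risks.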

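🎯 Relevance: medium-high. Anna cannot avoid security when building an agent platform. The survey's "lifecycle-based, systems-oriented" analytical framework, especially the dimensions of trust boundary, privilege control, and provenance-aware state management, is exactly what the InternOS security architecture design should draw on.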
