

【Weekend Special】Hottest AI papers, week 1 of July | Orca: learning world models from video; agentic abstention: learning when to stop
【Sponsor】 OpenClaw快报 (OpenClaw Brief): five minutes a day with OpenClaw Brief to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43
【Contents】 The 5 papers in this episode:
[00:43] TOP1(🔥230) | 🌍 Orca: The World is in Your Mind
[03:18] TOP2(🔥141) | 🤔 Agentic Abstention: Do Agents Know When to Stop Instead of Act?
[05:32] TOP3(🔥103) | 🧪 Dockerless: Environment-Free Program Verifier for Coding Agents
[07:58] TOP4(🔥93) | 🎭 DOPD: Dual On-policy Distillation
[10:13] TOP5(🔥86) | 🧠 Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
【Follow us】 You can also find us on the following platform for more beyond the podcast: Xiaohongshu (小红书): AI速递

2026.07.03 | Small local models beat large models; autonomous policy evolution centers on structure synthesis
【Contents】 The 15 papers in this episode:
[00:31] 🧩 Program-as-Weights: A Programming Paradigm for Fuzzy Functions
[01:24] 🧠 EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments
[02:24] 🧠 AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents
[03:18] 🔍 Morphing into Hybrid Attention Models
[04:07] 📊 AgenticDataBench: A Comprehensive Benchmark for Data Agents
[05:12] ⚡ Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling
[05:52] 🎬 WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory
[06:49] 🏥 Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning
[07:37] 🎨 Optimizing Visual Generative Models via Distribution-wise Rewards
[08:32] 🎯 SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use
[09:21] 🖐 AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition
[10:16] 🔬 From SRA to Self-Flow: Data Augmentation or Self-Supervision?
[11:10] 🧠 Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
[11:58] 🎯 AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models
[12:46] 🤔 When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

2026.07.02 | PerceptionRubrics reveals an AI reliability gap; TurboServe cuts cost and boosts efficiency for streaming video generation
【Contents】 The 15 papers in this episode:
[00:33] 🔍 PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception
[01:34] 🎬 TurboServe: Serving Streaming Video Generation Efficiently and Economically
[02:30] 🧠 MemSyco-Bench: Benchmarking Sycophancy in Agent Memory
[03:26] 🚀 ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving
[04:25] 🧮 Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts
[05:15] 🧠 Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning
[06:05] 🔬 CausalMix: Data Mixture as Causal Inference for Language Model Training
[07:04] 🔍 Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning
[07:58] 🤖 ABot-M0.5: Unified Mobility-and-Manipulation World Action Model
[08:49] 🌱 Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity
[09:41] 🤖 ASPIRE: Agentic /Skills Discovery for Robotics
[10:28] 🧬 BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery
[11:31] 🔀 The State-Prediction Separation Hypothesis
[12:24] 🤖 AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
[13:20] 🌍 Valdi: Value Diffusion World Models

2026.07.01 | Orca world model validates the state-prediction paradigm; Dockerless enables environment-free code verification
【Contents】 The 15 papers in this episode:
[00:33] 🌍 Orca: The World is in Your Mind
[01:30] 🧪 Dockerless: Environment-Free Program Verifier for Coding Agents
[02:22] 🎭 DOPD: Dual On-policy Distillation
[03:20] 🚀 BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
[04:09] 🧩 Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views
[05:00] 🎨 GEAR: Guided End-to-End AutoRegression for Image Synthesis
[05:59] 🧩 Multi-Block Diffusion Language Models
[06:42] 🧬 Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks
[07:33] 🔧 SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History
[08:22] 🎥 MemLearner: Learning to Query Context memory for Video World Models
[09:13] 🧠 Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation
[10:08] 🧠 DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation
[11:10] 🔊 RedVox: Safety and Fairness Gaps in Speech Models Across Languages
[12:00] 🧠 Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
[12:56] 🧠 Little Brains, Big Feats: Exploring Compact Language Models

2026.06.30 | Real-time editing of video streams; a well-used horizon beats parameter count
【Contents】 The 15 papers in this episode:
[00:32] 🎬 LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing
[01:20] 🧠 Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
[02:15] 🤔 Agentic Abstention: Do Agents Know When to Stop Instead of Act?
[03:09] 💻 TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents
[04:12] 🗿 Trimming the Long-Tail of Visual World Modeling Evaluation
[05:06] 🧠 Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning
[05:57] 📊 Beyond IID: How General Are Tabular Foundation Models, Really?
[06:40] 🏭 AsyncOPD: How Stale Can On-Policy Distillation Be?
[07:38] 🧠 ReFreeKV: Towards Threshold-Free KV Cache Compression
[08:38] 📱 Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting
[09:39] 🔧 TACO: Tool-Augmented Credit Optimization for Agentic Tool Use
[10:33] 🎥 Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction
[11:36] 🔍 Interleaved Speech Language Models Latently Work In Text
[12:27] 🤖 OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks
[13:15] 🌍 DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

2026.06.29 | Physics reinforcement breaks the simulation bottleneck; wrist translation bridges humans and robots
【Contents】 The 15 papers in this episode:
[00:31] 🤖 PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation
[01:28] 🤖 Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots
[02:13] 🎨 Qwen-Image-2.0-RL Technical Report
[03:01] 🔑 MultiHashFormer: Hash-based Generative Language Models
[03:51] 🧠 Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs
[04:34] 🛡 SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning
[05:30] 🔍 ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering
[06:36] 🛡 The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar
[07:30] 🤖 SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation
[08:30] 🧠 GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems
[09:31] 👕 Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)
[10:27] 🔍 Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents
[11:20] 🗣 Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
[12:14] 🎨 Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation
[13:06] 🛡 NormGuard: Reward-Preserving Norm Constraints in Flow-Matching Reinforcement Learning

【Weekend Special】Hottest AI papers, week 5 of June | Hierarchical memory drives AI slide-making; language world models rehearse agent decisions
【Contents】 The 5 papers in this episode:
[00:53] TOP1(🔥159) | 🎨 MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
[02:58] TOP2(🔥136) | 🌍 Qwen-AgentWorld: Language World Models for General Agents
[05:13] TOP3(🔥104) | 🧠 Are We Ready For An Agent-Native Memory System?
[08:02] TOP4(🔥95) | 🧩 PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems
[10:18] TOP5(🔥91) | ⚡ Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

2026.06.26 | DanceOPD fuses on-policy distillation; ViQ pushes the limits of quantized representations
【Contents】 The 15 papers in this episode:
[00:32] 💃 DanceOPD: On-Policy Generative Field Distillation
[01:34] 🔍 ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
[02:32] 🎨 Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
[03:25] 🤖 OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
[04:21] 🔍 The Verification Horizon: No Silver Bullet for Coding Agent Rewards
[05:04] 🚀 JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
[06:09] 🛠 Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
[06:57] 🧩 Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments
[07:50] 🔍 Confidence-Aware Tool Orchestration for Robust Video Understanding
[09:10] 🖥 GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
[10:11] 🤖 In-Context World Modeling for Robotic Control
[10:59] 🚀 Fast LeWorldModel
[11:47] ☕ CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
[12:41] 🧊 PhysiFormer: Learning to Simulate Mechanics in World Space
[13:34] 🎲 Discretizing Reward Models

2026.06.25 | Agent memory still has room to mature; subjects cross domains, decoupled from their scenes
【Contents】 The 15 papers in this episode:
[00:33] 🧠 Are We Ready For An Agent-Native Memory System?
[01:25] 🎥 DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation
[02:14] 📸 ShutterMuse: Capture-Time Photography Guidance with MLLMs
[03:13] ⚡ Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
[04:13] 🧠 Improved Large Language Diffusion Models
[05:17] 🧑 Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
[06:21] 🎥 MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
[07:14] 🔍 V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
[08:27] 🎬 UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
[09:34] 🧠 IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
[10:40] 🔧 EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
[11:39] 🎥 Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models
[12:38] 🤖 The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
[13:31] 🤖 Autodata: An agentic data scientist to create high quality synthetic data
[14:22] 🧠 Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

2026.06.24 | Qwen-AgentWorld surpasses GPT-5.4; NatureBench reveals AI's innovation bottleneck
【Contents】 The 15 papers in this episode:
[00:31] 🌍 Qwen-AgentWorld: Language World Models for General Agents
[01:28] 🧪 NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
[02:24] 🤖 AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction
[03:16] 📱 MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization
[04:04] 🤖 MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management
[04:57] 🧠 LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
[06:01] 🔒 FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs
[06:57] 🧠 OpenThoughts-Agent: Data Recipes for Agentic Models
[08:03] 🤖 Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
[09:02] 🔺 FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation
[09:58] 🦃 Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning
[11:03] 🧪 DiffusionBench: On Holistic Evaluation of Diffusion Transformers
[11:49] 🚗 FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
[12:53] 🔍 DREAM: Dense Retrieval Embeddings via Autoregressive Modeling
[13:49] 🔍 ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

2026.06.23 | Tool ecosystems expose planning weaknesses; session-centered design builds auditable systems
【Contents】 The 15 papers in this episode:
[00:32] 🧩 PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems
[01:31] 🧠 OpenRath: Session-Centered Runtime State for Agent Systems
[02:28] 🧩 DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams
[03:19] 💼 EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions
[04:10] 🧠 Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention
[05:09] ⚡ KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking
[06:05] 🌍 World Action Models: A Survey
[07:10] 🧪 CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents
[08:07] 🧬 EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
[09:06] 🧬 BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language
[10:07] 🧠 HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization
[10:57] 🎯 Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation
[11:53] 🛡 SkillHarness: Harnessing Safe Skills for Computer-Use Agents
[12:42] 🧠 Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding
[13:41] 🔬 Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark

2026.06.22 | Perception diffusion model runs three times faster; memory-driven framework revises slides precisely
【Contents】 The 14 papers in this episode:
[00:33] 🚀 PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models
[01:22] 🎨 MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
[02:24] 🧠 GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
[03:24] 🧭 MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval
[04:18] 🔄 Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models
[05:22] 🌳 SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG
[06:18] 🧠 BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation
[07:18] 🌍 WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
[08:16] 🤖 GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning
[09:07] 🧑 SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction
[10:15] 💬 Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations
[11:20] 👁 StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
[12:08] 📖 Characterizing Narrative Content in Web-scale LLM Pretraining Data
[13:09] 📊 When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

【Weekend Special】Hottest AI papers, week 4 of June | Looped world models resolve a dilemma; JoyAI-VL delivers proactive real-time interaction
【Contents】 The 5 papers in this episode:
[00:47] TOP1(🔥204) | 🔄 Looped World Models
[02:56] TOP2(🔥192) | 👁 JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence
[05:16] TOP3(🔥139) | 🔄 LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
[07:22] TOP4(🔥118) | 📊 Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
[09:36] TOP5(🔥108) | 🤖 Geometric Action Model for Robot Policy Learning

2026.06.19 | RATs lets robots learn skills through autonomous play; Moebius reaches 10B-level inpainting performance with 0.2B parameters
【Contents】 The 15 papers in this episode:
[00:33] 🤖 Playful Agentic Robot Learning
[01:22] 🎨 Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
[02:10] 🧠 S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
[03:10] 📊 Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
[04:05] 🎨 FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
[05:06] 🪄 JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
[05:58] 🤖 ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
[06:57] 👁 Thinking with Visual Grounding
[07:41] 🔍 Understanding the Behaviors of Environment-aware Information Retrieval
[08:37] 🤖 FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
[09:28] 🧊 Adaptive Volumetric Mechanical Property Fields Invariant to Resolution
[10:23] 📸 DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
[11:16] 🌍 Holo-World: Unified Camera, Object and Weather Control for Video World Model
[12:12] 🎨 ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?
[13:07] 🎯 Selective Synergistic Learning for Video Object-Centric Learning

2026.06.18 | Memory is a weak spot for multimodal large models; language instructions drive 3D trajectory prediction
【Contents】 The 15 papers in this episode:
[00:32] 🧠 Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
[01:29] 🎯 MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
[02:15] 🌍 Kairos: A Native World Model Stack for Physical AI
[03:05] 🛠 Guava: An Effective and Universal Harness for Embodied Manipulation
[03:58] ⚡ EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
[04:45] 🎯 The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
[05:51] 🔍 SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
[06:36] 🤖 From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
[07:27] 🧠 Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
[08:25] 🎯 Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
[09:25] 👁 Native Active Perception as Reasoning for Omni-Modal Understanding
[10:15] 🐱 MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
[11:08] 🖌 Sumi: Open Uniform Diffusion Language Model from Scratch
[11:51] 🎯 STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
[12:46] 🌍 Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems