The 15 papers in this episode:
00:21 🦢 RWKV-7 "Goose" with Expressive Dynamic State Evolution
00:55 🤯 Impossible Videos
01:38 🎨 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
02:17 🤖 DAPO: An Open-Source LLM Reinforcement Learning System at Scale
02:58 🧠 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
03:39 🖼 CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
04:25 🤖 Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation
05:13 🧠 Frac-Connections: Fractional Extension of Hyper-Connections
05:52 🌍 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
06:30 🧐 MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
07:13 🤖 Aligning Multimodal LLM with Human Preference: A Survey
07:51 ⏱ Measuring AI Ability to Complete Long Tasks
08:38 🎭 Concat-ID: Towards Universal Identity-Preserving Video Synthesis
09:13 🖼 FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis
09:50 🤔 Temporal Consistency for LLM Reasoning Process Error Identification

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast
Xiaohongshu: AI速递