The 15 papers in this episode:
00:22 🧠 MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
01:09 🤖 MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
01:55 🤖 RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
02:38 🧮 When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
03:21 🌉 Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
03:55 🧠 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
04:37 ✍ Modifying Large Language Model Post-Training for Diverse Creative Writing
05:21 🧮 MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
06:05 🎬 Enabling Versatile Controls for Video Diffusion Models
06:48 🎬 ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
07:27 🖼 Single Image Iterative Subject-driven Generation and Editing
08:12 🎨 When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
08:56 ⚖ From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
09:37 🚀 FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
10:13 🗣 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

【Follow Us】
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu: AI速递