The 15 papers in this episode are:
00:21 🎥 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
01:10 🤖 Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
01:49 🖼 DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
02:38 🖼 Edit Transfer: Learning Image Editing via Vision In-Context Relations
03:12 🖼 Personalize Anything for Free with Diffusion Transformer
03:53 🎬 WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
04:30 🎨 BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
05:14 🛡 reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
05:54 🔬 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
06:31 🧠 Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
07:09 🤖 Free-form language-based robotic reasoning and grasping
07:45 🧠 R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
08:35 🤔 V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
09:18 🎬 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
09:51 🖼 Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation

[Follow Us]
You can also find us on the following platforms for more than what the podcast covers
Xiaohongshu: AI速递