2025.03.18 | A New Method for Video Generation, a New Framework for Humanoid Robots

11 minutes

The 15 papers in this episode are:

00:21 🎥 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

01:10 🤖 Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

01:49 🖼 DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

02:38 🖼 Edit Transfer: Learning Image Editing via Vision In-Context Relations

03:12 🖼 Personalize Anything for Free with Diffusion Transformer

03:53 🎬 WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

04:30 🎨 BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

05:14 🛡 reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs

05:54 🔬 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

06:31 🧠 Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

07:09 🤖 Free-form language-based robotic reasoning and grasping

07:45 🧠 R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

08:35 🤔 V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

09:18 🎬 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

09:51 🖼 Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu: AI速递