The 14 papers in this episode are:
00:23 🎥 SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
01:07 🌐 LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
01:51 🌐 POINTS1.5: Building a Vision-Language Model towards Real World Applications
02:28 🎨 Learning Flow Fields in Attention for Controllable Person Image Generation
03:11 🎥 StyleMaster: Stylize Your Video with Artistic Generation and Translation
04:00 🔍 Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
04:46 🎥 StreamChat: Chatting with Streaming Video
05:28 🧠 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
06:12 🏃 Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation
07:01 🧠 KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
07:40 🖼 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
08:17 🎨 StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
09:03 🌍 MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
09:50 🚀 Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递