2025.03.17 | New camera trajectory generation, sparsity improves image quality - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are as follows:

00:25 🎥 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

01:11 💡 PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

01:50 🤖 Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

02:38 📊 Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

03:25 🤖 API Agents vs. GUI Agents: Divergence and Convergence

03:57 🛡 Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

04:47 🎬 Large-scale Pre-training for Grounded Video Caption Generation

05:31 🌉 FlowTok: Flowing Seamlessly Across Text and Image Tokens

06:08 ⚕ TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

06:47 🤔 Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?

07:27 📸 VGGT: Visual Geometry Grounded Transformer

08:14 🦜 Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

08:52 🖼 Neighboring Autoregressive Modeling for Efficient Visual Generation

09:26 🔬 ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

10:02 🖼 ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递