2025.03.20 | Adaptive Foresight Sampling Optimizes Inference; Reinforcement Learning Improves 3D Mesh Quality

11 minutes

The 15 papers in this episode are:

00:23 🔍 $\phi$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation

01:08 🎨 DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

01:51 🌷 TULIP: Towards Unified Language-Image Pretraining

02:26 🤖 Cube: A Roblox View of 3D Intelligence

03:06 📱 Efficient Personalization of Quantized Diffusion Model without Backpropagation

03:48 🎬 Temporal Regularization Makes Your Video Generator Stronger

04:21 🤖 STEVE: A Step Verification Pipeline for Computer-use Agent Training

04:59 🖼 LEGION: Learning to Ground and Explain for Synthetic Image Detection

05:41 🎶 MusicInfuser: Making Video Diffusion Listen and Dance

06:24 👋 ViSpeak: Visual Instruction Feedback in Streaming Videos

07:03 🧠 GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction

07:46 👁 Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning

08:32 🗣 Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait

09:09 🤖 ELTEX: A Framework for Domain-Driven Synthetic Data Generation

09:52 🧪 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递