2024.12.13 Daily AI Papers | A multimodal system improves long-term interaction; Phi-4 improves STEM question answering - HuggingFace Daily AI Paper Express

The 23 papers in this episode:

00:23 🎥 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

01:03 🧠 Phi-4 Technical Report

01:43 🧠 Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

02:27 🌐 Multimodal Latent Language Modeling with Next-Token Diffusion

03:10 🌐 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

03:57 🌐 AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

04:43 🌟 Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

05:24 📱 SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

06:02 🔬 PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations

06:49 📊 Learned Compression for Compressed Learning

07:32 🎙 Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

08:20 📊 RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

09:08 👀 Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

10:02 🧠 JuStRank: Benchmarking LLM Judges for System Ranking

10:43 🧠 OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

11:34 📚 The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

12:16 🔗 Word Sense Linking: Disambiguating Outside the Sandbox

12:58 🌐 FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

13:42 🎥 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

14:26 🖼 LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

15:21 🧭 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

16:05 🌟 Arbitrary-steps Image Super-resolution via Diffusion Inversion

16:46 📚 Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递