2024.12.06 Daily AI Papers | Visual compression boosts efficiency; code-as-monitor improves robot reliability.

16 min · 82 plays · 0 comments

This episode covers the following 23 papers:

00:23 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models

01:03 🤖 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

01:43 🖥 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

02:27 🔊 A Noise is Worth Diffusion Guidance

03:04 📊 Evaluating Language Models as Synthetic Data Generators

03:48 🌐 Structured 3D Latents for Scalable and Versatile 3D Generation

04:26 🌐 MV-Adapter: Multi-view Consistent Image Generation Made Easy

05:05 🖼 Negative Token Merging: Image-based Adversarial Feature Guidance

05:41 🌐 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

06:18 📈 Densing Law of LLMs

06:59 🌌 Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

07:37 ⚽ Towards Universal Soccer Video Understanding

08:15 🎨 HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

08:53 👗 AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

09:35 🌍 Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

10:11 🌐 Personalized Multimodal Large Language Models: A Survey

10:55 ⚡ ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

11:36 🧠 MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

12:14 🧠 Discriminative Fine-tuning of LVLMs

12:48 🧠 Monet: Mixture of Monosemantic Experts for Transformers

13:24 🌊 OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

13:59 🧠 KV Shifting Attention Enhances Language Modeling

14:40 🌍 Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

【Follow Us】

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递