The 23 papers in this episode are as follows:
00:23 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models
01:03 🤖 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
01:43 🖥 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
02:27 🔊 A Noise is Worth Diffusion Guidance
03:04 📊 Evaluating Language Models as Synthetic Data Generators
03:48 🌐 Structured 3D Latents for Scalable and Versatile 3D Generation
04:26 🌐 MV-Adapter: Multi-view Consistent Image Generation Made Easy
05:05 🖼 Negative Token Merging: Image-based Adversarial Feature Guidance
05:41 🌐 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
06:18 📈 Densing Law of LLMs
06:59 🌌 Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
07:37 ⚽ Towards Universal Soccer Video Understanding
08:15 🎨 HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
08:53 👗 AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
09:35 🌍 Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
10:11 🌐 Personalized Multimodal Large Language Models: A Survey
10:55 ⚡ ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
11:36 🧠 MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
12:14 🧠 Discriminative Fine-tuning of LVLMs
12:48 🧠 Monet: Mixture of Monosemantic Experts for Transformers
13:24 🌊 OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
13:59 🧠 KV Shifting Attention Enhances Language Modeling
14:40 🌍 Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement
【Follow Us】
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu: AI速递