The 15 papers in this episode are:
00:22 🎬 Long-Context Autoregressive Video Modeling with Next-Frame Prediction
01:01 🖼 CoMP: Continual Multimodal Pre-training for Vision Foundation Models
01:42 🎬 Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
02:28 📈 Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
03:14 🖼 Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
03:54 🖼 Scaling Vision Pre-Training to 4K Resolution
04:33 🤔 Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
05:15 🖼 CoLLM: A Large Language Model for Composed Image Retrieval
05:53 🤖 MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
06:35 🖼 Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
07:13 🔍 ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
07:54 🛡 LookAhead Tuning: Safer Language Models via Partial Answer Previews
08:38 💡 Frequency Dynamic Convolution for Dense Image Prediction
09:18 🖼 LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
09:51 🧬 Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu (小红书): AI速递