The 15 papers in this episode are:
00:24 🎥 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
01:04 🧠 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
01:45 🔄 Free Process Rewards without Process Labels
02:30 🎧 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
03:04 🤖 MALT: Improving Reasoning with Multi-Agent LLM Training
03:45 🎥 OmniCreator: Self-Supervised Unified Generation with Universal Editing
04:23 🌴 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis
05:08 📚 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
05:51 📊 Scaling Image Tokenizers with Grouped Spherical Quantization
06:27 🌐 LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
07:09 ⚙ A dynamic parallel method for performance optimization on hybrid CPUs
08:00 🌐 MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation
08:46 🎥 Motion Prompting: Controlling Video Generation with Motion Trajectories
09:27 🎥 VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
10:01 🤖 Generating a Low-code Complete Workflow via Task Decomposition and RAG
[Follow Us]
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu: AI速递