2025.03.28 | Better Video Reasoning, Improved GUI Action Prediction

11 minutes

This episode covers the following 15 papers:

00:22 🧠 Video-R1: Reinforcing Video Reasoning in MLLMs

01:02 📱 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

01:41 🤯 Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

02:25 🎬 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

03:05 🖼 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

03:38 🤖 Large Language Model Agent: A Survey on Methodology, Applications and Challenges

04:23 🧠 ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation

05:01 🖼 Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

05:48 🤖 Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

06:27 💡 ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition

07:12 🚀 Optimal Stepsize for Diffusion Sampling

07:46 🤔 Exploring the Evolution of Physics Cognition in Video Generation: A Survey

08:24 🎤 FinAudio: A Benchmark for Audio Large Language Models in Financial Applications

09:01 🗣 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model

09:40 🧠 ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging

[Follow Us]

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu: AI速递