NLP Arxiv Daily

NLP Arxiv DailyDaily-refreshed NLP arxiv paper digesthttps://monologg.kr/nlp-arxiv-daily/The Scientific Contribution Graph: Automated Literature-based Technological Roadmapping at Scalehttp://arxiv.org/abs/2605.15011v1http://arxiv.org/abs/2605.15011v1Peter A. Jansen et al. — arxiv:2605.15011 — NLPThu, 14 May 2026 00:00:00 GMTNLPConversion of Lexicon-Grammar tables to LMF. Application to Frenchhttp://arxiv.org/abs/2605.14816v1http://arxiv.org/abs/2605.14816v1Eric Laporte et al. — arxiv:2605.14816 — NLPThu, 14 May 2026 00:00:00 GMTNLPGraphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generationhttp://arxiv.org/abs/2605.14790v1http://arxiv.org/abs/2605.14790v1Songyang Gao et al. — arxiv:2605.14790 — NLPThu, 14 May 2026 00:00:00 GMTNLPAre Candidate Models Really Needed for Active Learning?http://arxiv.org/abs/2605.14689v1http://arxiv.org/abs/2605.14689v1Harshini Mridula Mohan et al. — arxiv:2605.14689 — NLPThu, 14 May 2026 00:00:00 GMTNLPSciPaths: Forecasting Pathways to Scientific Discoveryhttp://arxiv.org/abs/2605.14600v1http://arxiv.org/abs/2605.14600v1Eric Chamoun et al. — arxiv:2605.14600 — NLPThu, 14 May 2026 00:00:00 GMTNLPA Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoringhttp://arxiv.org/abs/2605.14360v1http://arxiv.org/abs/2605.14360v1Tamunotonye Harry et al. — arxiv:2605.14360 — NLPThu, 14 May 2026 00:00:00 GMTNLPMetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unificationhttp://arxiv.org/abs/2605.14289v1http://arxiv.org/abs/2605.14289v1Weisen Jiang et al. — arxiv:2605.14289 — NLPThu, 14 May 2026 00:00:00 GMTNLPWhat Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Predictionhttp://arxiv.org/abs/2605.14257v1http://arxiv.org/abs/2605.14257v1Adam Nohejl et al. 
— arxiv:2605.14257 — NLPThu, 14 May 2026 00:00:00 GMTNLPArticraft: An Agentic System for Scalable Articulated 3D Asset Generationhttp://arxiv.org/abs/2605.15187v1http://arxiv.org/abs/2605.15187v1Matt Zhou et al. — arxiv:2605.15187 — LLMThu, 14 May 2026 00:00:00 GMTLLMIs Grep All You Need? How Agent Harnesses Reshape Agentic Searchhttp://arxiv.org/abs/2605.15184v1http://arxiv.org/abs/2605.15184v1Sahil Sen et al. — arxiv:2605.15184 — LLMThu, 14 May 2026 00:00:00 GMTLLMOpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregationhttp://arxiv.org/abs/2605.15177v1http://arxiv.org/abs/2605.15177v1Shang Zhou et al. — arxiv:2605.15177 — LLMThu, 14 May 2026 00:00:00 GMTLLMMetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMshttp://arxiv.org/abs/2605.15172v1http://arxiv.org/abs/2605.15172v1Rui Wen et al. — arxiv:2605.15172 — LLMThu, 14 May 2026 00:00:00 GMTLLMText Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignmenthttp://arxiv.org/abs/2605.15168v1http://arxiv.org/abs/2605.15168v1Sayantan Kumar et al. — arxiv:2605.15168 — LLMThu, 14 May 2026 00:00:00 GMTLLMMeMo: Memory as a Modelhttp://arxiv.org/abs/2605.15156v1http://arxiv.org/abs/2605.15156v1Ryan Wei Heng Quek et al. — arxiv:2605.15156 — LLMThu, 14 May 2026 00:00:00 GMTLLMSelf-Distilled Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.15155v1http://arxiv.org/abs/2605.15155v1Zhengxi Lu et al. — arxiv:2605.15155 — LLMThu, 14 May 2026 00:00:00 GMTLLMWidening the Gap: Exploiting LLM Quantization via Outlier Injectionhttp://arxiv.org/abs/2605.15152v1http://arxiv.org/abs/2605.15152v1Xiaohua Zhan et al. — arxiv:2605.15152 — LLMThu, 14 May 2026 00:00:00 GMTLLMAPWA: A Distributed Architecture for Parallelizable Agentic Workflowshttp://arxiv.org/abs/2605.15132v1http://arxiv.org/abs/2605.15132v1Evan Rose et al. 
— arxiv:2605.15132 — LLMThu, 14 May 2026 00:00:00 GMTLLMTalk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attackshttp://arxiv.org/abs/2605.15118v1http://arxiv.org/abs/2605.15118v1Karthik Raghu Iyer et al. — arxiv:2605.15118 — LLMThu, 14 May 2026 00:00:00 GMTLLMATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Bothhttp://arxiv.org/abs/2605.15198v1http://arxiv.org/abs/2605.15198v1Ziyu Guo et al. — arxiv:2605.15198 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentFutureSim: Replaying World Events to Evaluate Adaptive Agentshttp://arxiv.org/abs/2605.15188v1http://arxiv.org/abs/2605.15188v1Shashwat Goel et al. — arxiv:2605.15188 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentArticraft: An Agentic System for Scalable Articulated 3D Asset Generationhttp://arxiv.org/abs/2605.15187v1http://arxiv.org/abs/2605.15187v1Matt Zhou et al. — arxiv:2605.15187 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentIs Grep All You Need? How Agent Harnesses Reshape Agentic Searchhttp://arxiv.org/abs/2605.15184v1http://arxiv.org/abs/2605.15184v1Sahil Sen et al. — arxiv:2605.15184 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentFrom Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editinghttp://arxiv.org/abs/2605.15181v1http://arxiv.org/abs/2605.15181v1Anirudh Sundara Rajan et al. — arxiv:2605.15181 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentPosition: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demandshttp://arxiv.org/abs/2605.15164v1http://arxiv.org/abs/2605.15164v1Pratinav Seth et al. — arxiv:2605.15164 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentSelf-Distilled Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.15155v1http://arxiv.org/abs/2605.15155v1Zhengxi Lu et al. — arxiv:2605.15155 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentGuises and Perspectives: An Intentional and Hyperintensional Sketchhttp://arxiv.org/abs/2605.15144v1http://arxiv.org/abs/2605.15144v1Juan J. 
Colomina-Alminana et al. — arxiv:2605.15144 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentAPWA: A Distributed Architecture for Parallelizable Agentic Workflowshttp://arxiv.org/abs/2605.15132v1http://arxiv.org/abs/2605.15132v1Evan Rose et al. — arxiv:2605.15132 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentMemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memoryhttp://arxiv.org/abs/2605.15128v1http://arxiv.org/abs/2605.15128v1Minghao Guo et al. — arxiv:2605.15128 — LLM AgentThu, 14 May 2026 00:00:00 GMTLLM AgentFrom Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editinghttp://arxiv.org/abs/2605.15181v1http://arxiv.org/abs/2605.15181v1Anirudh Sundara Rajan et al. — arxiv:2605.15181 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentSelf-Distilled Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.15155v1http://arxiv.org/abs/2605.15155v1Zhengxi Lu et al. — arxiv:2605.15155 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentAPWA: A Distributed Architecture for Parallelizable Agentic Workflowshttp://arxiv.org/abs/2605.15132v1http://arxiv.org/abs/2605.15132v1Evan Rose et al. — arxiv:2605.15132 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentVeritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binarieshttp://arxiv.org/abs/2605.15097v1http://arxiv.org/abs/2605.15097v1Xinran Zheng et al. — arxiv:2605.15097 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentOrchard: An Open-Source Agentic Modeling Frameworkhttp://arxiv.org/abs/2605.15040v1http://arxiv.org/abs/2605.15040v1Baolin Peng et al. — arxiv:2605.15040 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentAI Knows When It's Being Watched: Functional Strategic Action and Contextual Register Modulation in Large Language Modelshttp://arxiv.org/abs/2605.15034v1http://arxiv.org/abs/2605.15034v1Vinicius Covas et al. 
— arxiv:2605.15034 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentMulti-Agentic Approach for History Matching of Oil Reservoirshttp://arxiv.org/abs/2605.15028v1http://arxiv.org/abs/2605.15028v1Linar Samigullin et al. — arxiv:2605.15028 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentCOTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completionhttp://arxiv.org/abs/2605.15016v1http://arxiv.org/abs/2605.15016v1Zihan Deng et al. — arxiv:2605.15016 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentGraphFlow: An Architecture for Formally Verifiable Visual Workflows Enabling Reliable Agentic AI Automationhttp://arxiv.org/abs/2605.14968v1http://arxiv.org/abs/2605.14968v1Drewry H. Morris et al. — arxiv:2605.14968 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentChrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Frameworkhttp://arxiv.org/abs/2605.14911v1http://arxiv.org/abs/2605.14911v1Bocheng Zou et al. — arxiv:2605.14911 — Multi-AgentThu, 14 May 2026 00:00:00 GMTMulti-AgentIs Grep All You Need? How Agent Harnesses Reshape Agentic Searchhttp://arxiv.org/abs/2605.15184v1http://arxiv.org/abs/2605.15184v1Sahil Sen et al. — arxiv:2605.15184 — RAGThu, 14 May 2026 00:00:00 GMTRAGWhy Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAGhttp://arxiv.org/abs/2605.15109v1http://arxiv.org/abs/2605.15109v1Riccardo Terrenzi et al. — arxiv:2605.15109 — RAGThu, 14 May 2026 00:00:00 GMTRAGFrom Scenes to Elements: Multi-Granularity Evidence Retrieval for Verifiable Multimodal RAGhttp://arxiv.org/abs/2605.15019v1http://arxiv.org/abs/2605.15019v1Guanhua Chen et al. — arxiv:2605.15019 — RAGThu, 14 May 2026 00:00:00 GMTRAGEmotion-Attended Stateful Memory (EASM):The Architecture for Hyper-Personalization at Scalehttp://arxiv.org/abs/2605.14833v1http://arxiv.org/abs/2605.14833v1Vineet Kotecha et al. 
— arxiv:2605.14833 — RAGThu, 14 May 2026 00:00:00 GMTRAGAI-assisted cultural heritage dissemination: Comparing NMT and glossary-augmented LLM translation in rock art documentshttp://arxiv.org/abs/2605.14679v1http://arxiv.org/abs/2605.14679v1Vicent Briva-Iglesias et al. — arxiv:2605.14679 — RAGThu, 14 May 2026 00:00:00 GMTRAGFalkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AIhttp://arxiv.org/abs/2605.14665v1http://arxiv.org/abs/2605.14665v1Joy Bose et al. — arxiv:2605.14665 — RAGThu, 14 May 2026 00:00:00 GMTRAGA Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrievalhttp://arxiv.org/abs/2605.14581v1http://arxiv.org/abs/2605.14581v1Ho Hung Lim et al. — arxiv:2605.14581 — RAGThu, 14 May 2026 00:00:00 GMTRAGNot All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Taskshttp://arxiv.org/abs/2605.14503v1http://arxiv.org/abs/2605.14503v1Qiang Ke et al. — arxiv:2605.14503 — RAGThu, 14 May 2026 00:00:00 GMTRAGDeepchecks: Evaluating Retrieval-Augmented Generation (RAG)http://arxiv.org/abs/2605.14488v1http://arxiv.org/abs/2605.14488v1Assaf Gerner et al. — arxiv:2605.14488 — RAGThu, 14 May 2026 00:00:00 GMTRAGWhen Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Contexthttp://arxiv.org/abs/2605.14478v1http://arxiv.org/abs/2605.14478v1Haojun Weng et al. — arxiv:2605.14478 — RAGThu, 14 May 2026 00:00:00 GMTRAGOpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregationhttp://arxiv.org/abs/2605.15177v1http://arxiv.org/abs/2605.15177v1Shang Zhou et al. — arxiv:2605.15177 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningPelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Actionhttp://arxiv.org/abs/2605.15153v1http://arxiv.org/abs/2605.15153v1Yi Zhang et al. 
— arxiv:2605.15153 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningNatural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Modelshttp://arxiv.org/abs/2605.15131v1http://arxiv.org/abs/2605.15131v1Frederik Schmitt et al. — arxiv:2605.15131 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningCOTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completionhttp://arxiv.org/abs/2605.15016v1http://arxiv.org/abs/2605.15016v1Zihan Deng et al. — arxiv:2605.15016 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningBoosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidancehttp://arxiv.org/abs/2605.15012v1http://arxiv.org/abs/2605.15012v1Kai Yan et al. — arxiv:2605.15012 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningInfoSFT: Learn More and Forget Less with Information-Aware Token Weightinghttp://arxiv.org/abs/2605.14967v1http://arxiv.org/abs/2605.14967v1Mahdi Sabbaghi et al. — arxiv:2605.14967 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningSteerSeg: Attention Steering for Reasoning Video Segmentationhttp://arxiv.org/abs/2605.14908v1http://arxiv.org/abs/2605.14908v1Ali Cheraghian et al. — arxiv:2605.14908 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningExploring Vision-Language Models for Online Signature Verification: A Zero-Shot Capability Studyhttp://arxiv.org/abs/2605.14845v1http://arxiv.org/abs/2605.14845v1Marta Robledo-Moreno et al. — arxiv:2605.14845 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningCOAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Trackinghttp://arxiv.org/abs/2605.14795v1http://arxiv.org/abs/2605.14795v1Shukun Jia et al. — arxiv:2605.14795 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningVideo-Zero: Self-Evolution Video Understandinghttp://arxiv.org/abs/2605.14733v1http://arxiv.org/abs/2605.14733v1Ruixu Zhang et al. 
— arxiv:2605.14733 — ReasoningThu, 14 May 2026 00:00:00 GMTReasoningHand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correctionhttp://arxiv.org/abs/2605.15157v1http://arxiv.org/abs/2605.15157v1Zhuohang Li et al. — arxiv:2605.15157 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseUnderstanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptationhttp://arxiv.org/abs/2605.15127v1http://arxiv.org/abs/2605.15127v1Laleh Nourian et al. — arxiv:2605.15127 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseFrom Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agentshttp://arxiv.org/abs/2605.15104v1http://arxiv.org/abs/2605.15104v1Md Tahmid Rahman Laskar et al. — arxiv:2605.15104 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseConcurrency without Model Changes: Future-based Asynchronous Function Calling for LLMshttp://arxiv.org/abs/2605.15077v1http://arxiv.org/abs/2605.15077v1Guangyu Feng et al. — arxiv:2605.15077 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseCase-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Usehttp://arxiv.org/abs/2605.15041v1http://arxiv.org/abs/2605.15041v1Renning Pang et al. — arxiv:2605.15041 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseOrchard: An Open-Source Agentic Modeling Frameworkhttp://arxiv.org/abs/2605.15040v1http://arxiv.org/abs/2605.15040v1Baolin Peng et al. — arxiv:2605.15040 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseToward Securing AI Agents Like Operating Systemshttp://arxiv.org/abs/2605.14932v1http://arxiv.org/abs/2605.14932v1Lukas Pirch et al. — arxiv:2605.14932 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseBeyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systemshttp://arxiv.org/abs/2605.14892v1http://arxiv.org/abs/2605.14892v1Shihao Qi et al. 
— arxiv:2605.14892 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseSmartWalkCoach: An AI Companion for End-to-End Walking Guidance, Motivation, and Reflectionhttp://arxiv.org/abs/2605.14628v1http://arxiv.org/abs/2605.14628v1Xianzhe Zhang et al. — arxiv:2605.14628 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UsePrompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experiencehttp://arxiv.org/abs/2605.14443v1http://arxiv.org/abs/2605.14443v1Krishna Sayana et al. — arxiv:2605.14443 — Tool UseThu, 14 May 2026 00:00:00 GMTTool UseDoes Synthetic Layered Design Data Benefit Layered Design Decomposition?http://arxiv.org/abs/2605.15167v1http://arxiv.org/abs/2605.15167v1Kam Man Wu et al. — arxiv:2605.15167 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMPelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Actionhttp://arxiv.org/abs/2605.15153v1http://arxiv.org/abs/2605.15153v1Yi Zhang et al. — arxiv:2605.15153 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMMemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memoryhttp://arxiv.org/abs/2605.15128v1http://arxiv.org/abs/2605.15128v1Minghao Guo et al. — arxiv:2605.15128 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMOn the Cultural Anachronism and Temporal Reasoning in Vision Language Modelshttp://arxiv.org/abs/2605.15071v1http://arxiv.org/abs/2605.15071v1Mukul Ranjan et al. — arxiv:2605.15071 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMLATERN: Test-Time Context-Aware Explainable Video Anomaly Detectionhttp://arxiv.org/abs/2605.15054v1http://arxiv.org/abs/2605.15054v1Mitchell Piehl et al. — arxiv:2605.15054 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMCompositional Video Generation via Inference-Time Guidancehttp://arxiv.org/abs/2605.14988v1http://arxiv.org/abs/2605.14988v1Ariel Shaulov et al. 
— arxiv:2605.14988 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMMHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMshttp://arxiv.org/abs/2605.14966v1http://arxiv.org/abs/2605.14966v1Wei Ding et al. — arxiv:2605.14966 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMOctopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Modelshttp://arxiv.org/abs/2605.14938v1http://arxiv.org/abs/2605.14938v1Yuehao Liu et al. — arxiv:2605.14938 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMChain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QAhttp://arxiv.org/abs/2605.14928v1http://arxiv.org/abs/2605.14928v1Guanhua Chen et al. — arxiv:2605.14928 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMSceneParser: Hierarchical Scene Parsing for Visual Semantics Understandinghttp://arxiv.org/abs/2605.14923v1http://arxiv.org/abs/2605.14923v1Pengxin Xu et al. — arxiv:2605.14923 — Multimodal LLMThu, 14 May 2026 00:00:00 GMTMultimodal LLMSANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformerhttp://arxiv.org/abs/2605.15178v1http://arxiv.org/abs/2605.15178v1Haoyi Zhu et al. — arxiv:2605.15178 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextSelf-Distilled Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.15155v1http://arxiv.org/abs/2605.15155v1Zhengxi Lu et al. — arxiv:2605.15155 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextSignificant or Not? The Impact of Randomisation During Data Reduction on Confirming a New Pulsating Ultraluminous X-ray Source Candidate in Centaurus Ahttp://arxiv.org/abs/2605.15137v1http://arxiv.org/abs/2605.15137v1Amy H. Knight et al. — arxiv:2605.15137 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextImproving Multi-turn Dialogue Consistency with Self-Recall Thinkinghttp://arxiv.org/abs/2605.15102v1http://arxiv.org/abs/2605.15102v1Renning Pang et al. 
— arxiv:2605.15102 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextSophie Germain, mathématicienne extraordinaire: A story stranger than fictionhttp://arxiv.org/abs/2605.15046v1http://arxiv.org/abs/2605.15046v1David Pengelley et al. — arxiv:2605.15046 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextEverAnimate: Minute-Scale Human Animation via Latent Flow Restorationhttp://arxiv.org/abs/2605.15042v1http://arxiv.org/abs/2605.15042v1Wuyang Li et al. — arxiv:2605.15042 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextSemaTune: Semantic-Aware Online OS Tuning with Large Language Modelshttp://arxiv.org/abs/2605.15026v1http://arxiv.org/abs/2605.15026v1Georgios Liargkovas et al. — arxiv:2605.15026 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextImaging without visibilities: FAST-Effelsberg scintillometry of PSR B1508+55http://arxiv.org/abs/2605.15004v1http://arxiv.org/abs/2605.15004v1Tim Sprenger et al. — arxiv:2605.15004 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextMemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Modelshttp://arxiv.org/abs/2605.14906v1http://arxiv.org/abs/2605.14906v1Xiyu Ren et al. — arxiv:2605.14906 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextSurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognitionhttp://arxiv.org/abs/2605.14889v1http://arxiv.org/abs/2605.14889v1Sukju Oh et al. — arxiv:2605.14889 — Long ContextThu, 14 May 2026 00:00:00 GMTLong ContextWarp-as-History: Generalizable Camera-Controlled Video Generation from One Training Videohttp://arxiv.org/abs/2605.15182v1http://arxiv.org/abs/2605.15182v1Yifan Wang et al. — arxiv:2605.15182 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencySANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformerhttp://arxiv.org/abs/2605.15178v1http://arxiv.org/abs/2605.15178v1Haoyi Zhu et al. 
— arxiv:2605.15178 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyWidening the Gap: Exploiting LLM Quantization via Outlier Injectionhttp://arxiv.org/abs/2605.15152v1http://arxiv.org/abs/2605.15152v1Xiaohua Zhan et al. — arxiv:2605.15152 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyExtensive long-range magic in non-Abelian topological ordershttp://arxiv.org/abs/2605.15150v1http://arxiv.org/abs/2605.15150v1Yuzhen Zhang et al. — arxiv:2605.15150 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyForgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attributionhttp://arxiv.org/abs/2605.15138v1http://arxiv.org/abs/2605.15138v1Saisab Sadhu et al. — arxiv:2605.15138 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyEverAnimate: Minute-Scale Human Animation via Latent Flow Restorationhttp://arxiv.org/abs/2605.15042v1http://arxiv.org/abs/2605.15042v1Wuyang Li et al. — arxiv:2605.15042 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyImpurity-induced geometric correlations and fractional quantization in quantum Hall systemshttp://arxiv.org/abs/2605.15022v1http://arxiv.org/abs/2605.15022v1M. A. Hidalgo et al. — arxiv:2605.15022 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editinghttp://arxiv.org/abs/2605.14948v1http://arxiv.org/abs/2605.14948v1Yuehao Liu et al. — arxiv:2605.14948 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyNot All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communicationhttp://arxiv.org/abs/2605.14940v1http://arxiv.org/abs/2605.14940v1Albert Shaju et al. — arxiv:2605.14940 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyA Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Modelshttp://arxiv.org/abs/2605.14929v1http://arxiv.org/abs/2605.14929v1Earl Killian et al. 
— arxiv:2605.14929 — LLM EfficiencyThu, 14 May 2026 00:00:00 GMTLLM EfficiencyFrom Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreementhttp://arxiv.org/abs/2605.14912v1http://arxiv.org/abs/2605.14912v1Varad Vishwarupe et al. — arxiv:2605.14912 — AlignmentThu, 14 May 2026 00:00:00 GMTAlignmentHierarchical Image Tokenization for Multi-Scale Image Super Resolutionhttp://arxiv.org/abs/2605.14891v1http://arxiv.org/abs/2605.14891v1Isma Hadji et al. — arxiv:2605.14891 — AlignmentThu, 14 May 2026 00:00:00 GMTAlignmentWhen Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretabilityhttp://arxiv.org/abs/2605.15183v1http://arxiv.org/abs/2605.15183v1ML Nissen Gonzalez et al. — arxiv:2605.15183 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationText Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignmenthttp://arxiv.org/abs/2605.15168v1http://arxiv.org/abs/2605.15168v1Sayantan Kumar et al. — arxiv:2605.15168 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationWhy Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAGhttp://arxiv.org/abs/2605.15109v1http://arxiv.org/abs/2605.15109v1Riccardo Terrenzi et al. — arxiv:2605.15109 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationDual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scalinghttp://arxiv.org/abs/2605.15100v1http://arxiv.org/abs/2605.15100v1Rongman Xu et al. — arxiv:2605.15100 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationCase-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Usehttp://arxiv.org/abs/2605.15041v1http://arxiv.org/abs/2605.15041v1Renning Pang et al. — arxiv:2605.15041 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationCOTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completionhttp://arxiv.org/abs/2605.15016v1http://arxiv.org/abs/2605.15016v1Zihan Deng et al. 
— arxiv:2605.15016 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationKadison's problem for trace-vector orthonormal bases in $\mathrm{II}_1$ factors with separable predualhttp://arxiv.org/abs/2605.15006v1http://arxiv.org/abs/2605.15006v1Yixin He et al. — arxiv:2605.15006 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationCompositional Video Generation via Inference-Time Guidancehttp://arxiv.org/abs/2605.14988v1http://arxiv.org/abs/2605.14988v1Ariel Shaulov et al. — arxiv:2605.14988 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationMHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMshttp://arxiv.org/abs/2605.14966v1http://arxiv.org/abs/2605.14966v1Wei Ding et al. — arxiv:2605.14966 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationUnlocking Complex Visual Generation via Closed-Loop Verified Reasoninghttp://arxiv.org/abs/2605.14876v1http://arxiv.org/abs/2605.14876v1Hanbo Cheng et al. — arxiv:2605.14876 — HallucinationThu, 14 May 2026 00:00:00 GMTHallucinationPosition: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demandshttp://arxiv.org/abs/2605.15164v1http://arxiv.org/abs/2605.15164v1Pratinav Seth et al. — arxiv:2605.15164 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyWARD: Adversarially Robust Defense of Web Agents Against Prompt Injectionshttp://arxiv.org/abs/2605.15030v1http://arxiv.org/abs/2605.15030v1Tri Cao et al. — arxiv:2605.15030 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyFast Adversarial Attacks with Gradient Predictionhttp://arxiv.org/abs/2605.14868v1http://arxiv.org/abs/2605.14868v1Kamil Ciosek et al. — arxiv:2605.14868 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyEVA: Editing for Versatile Alignment against Jailbreakshttp://arxiv.org/abs/2605.14750v1http://arxiv.org/abs/2605.14750v1Yi Wang et al. 
— arxiv:2605.14750 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyThe Great Pretender: A Stochasticity Problem in LLM Jailbreakhttp://arxiv.org/abs/2605.14418v1http://arxiv.org/abs/2605.14418v1Jean-Philippe Monteuuis et al. — arxiv:2605.14418 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyGuided Diffusion Sampling for Precipitation Forecast Interventionshttp://arxiv.org/abs/2605.14317v1http://arxiv.org/abs/2605.14317v1Ayumu Ueyama et al. — arxiv:2605.14317 — LLM SafetyThu, 14 May 2026 00:00:00 GMTLLM SafetyOpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregationhttp://arxiv.org/abs/2605.15177v1http://arxiv.org/abs/2605.15177v1Shang Zhou et al. — arxiv:2605.15177 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationFrom Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agentshttp://arxiv.org/abs/2605.15104v1http://arxiv.org/abs/2605.15104v1Md Tahmid Rahman Laskar et al. — arxiv:2605.15104 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationSmall, Private Language Models as Teammates for Educational Assessment Designhttp://arxiv.org/abs/2605.15015v1http://arxiv.org/abs/2605.15015v1Chris Davis Jaldi et al. — arxiv:2605.15015 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationGraphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generationhttp://arxiv.org/abs/2605.14790v1http://arxiv.org/abs/2605.14790v1Songyang Gao et al. — arxiv:2605.14790 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationTeaching Large Language Models When Not to Know: Learning Temporal Critique for Ex-Ante Reasoninghttp://arxiv.org/abs/2605.14636v1http://arxiv.org/abs/2605.14636v1Chenlu Ding et al. — arxiv:2605.14636 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationMultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Modelshttp://arxiv.org/abs/2605.14635v1http://arxiv.org/abs/2605.14635v1Tianwei Chen et al. 
— arxiv:2605.14635 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationSycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarkshttp://arxiv.org/abs/2605.14604v1http://arxiv.org/abs/2605.14604v1Enkelejda Kasneci et al. — arxiv:2605.14604 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationMining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselineshttp://arxiv.org/abs/2605.14568v1http://arxiv.org/abs/2605.14568v1Ali Hassaan Mughal et al. — arxiv:2605.14568 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationThe Great Pretender: A Stochasticity Problem in LLM Jailbreakhttp://arxiv.org/abs/2605.14418v1http://arxiv.org/abs/2605.14418v1Jean-Philippe Monteuuis et al. — arxiv:2605.14418 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationLatency-Quality Routing for Functionally Equivalent Tools in LLM Agentshttp://arxiv.org/abs/2605.14241v1http://arxiv.org/abs/2605.14241v1Kexin Chu et al. — arxiv:2605.14241 — LLM EvaluationThu, 14 May 2026 00:00:00 GMTLLM EvaluationLearning from Language Feedback via Variational Policy Distillationhttp://arxiv.org/abs/2605.15113v1http://arxiv.org/abs/2605.15113v1Yang Li et al. — arxiv:2605.15113 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMAdapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUshttp://arxiv.org/abs/2605.14718v1http://arxiv.org/abs/2605.14718v1Shruthi Gorantala et al. — arxiv:2605.14718 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMLearning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewardshttp://arxiv.org/abs/2605.14539v1http://arxiv.org/abs/2605.14539v1Mengjie Ren et al. — arxiv:2605.14539 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMNot All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Taskshttp://arxiv.org/abs/2605.14503v1http://arxiv.org/abs/2605.14503v1Qiang Ke et al. 
— arxiv:2605.14503 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMWhen Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Contexthttp://arxiv.org/abs/2605.14478v1http://arxiv.org/abs/2605.14478v1Haojun Weng et al. — arxiv:2605.14478 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMTest-Time Learning with an Evolving Libraryhttp://arxiv.org/abs/2605.14477v1http://arxiv.org/abs/2605.14477v1Weijia Xu et al. — arxiv:2605.14477 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMFuzzAgent: Multi-Agent System for Evolutionary Library Fuzzinghttp://arxiv.org/abs/2605.14431v1http://arxiv.org/abs/2605.14431v1Yunlong Lyu et al. — arxiv:2605.14431 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMCoding Agent Is Good As World Simulatorhttp://arxiv.org/abs/2605.14398v1http://arxiv.org/abs/2605.14398v1Hongyu Wang et al. — arxiv:2605.14398 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMGenCircuit-RL: Reinforcement Learning from Hierarchical Verification for Genetic Circuit Designhttp://arxiv.org/abs/2605.14215v1http://arxiv.org/abs/2605.14215v1Noah Flynn et al. — arxiv:2605.14215 — Code LLMThu, 14 May 2026 00:00:00 GMTCode LLMPosition: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demandshttp://arxiv.org/abs/2605.15164v1http://arxiv.org/abs/2605.15164v1Pratinav Seth et al. — arxiv:2605.15164 — Legal NLPThu, 14 May 2026 00:00:00 GMTLegal NLPTokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Studyhttp://arxiv.org/abs/2605.14890v1http://arxiv.org/abs/2605.14890v1Volodymyr Ovcharov et al. — arxiv:2605.14890 — Legal NLPThu, 14 May 2026 00:00:00 GMTLegal NLPQuantifying and Mitigating Premature Closure in Frontier LLMshttp://arxiv.org/abs/2605.15000v1http://arxiv.org/abs/2605.15000v1Rebecca Handler et al. 
— arxiv:2605.15000 — Medical NLP
Thu, 14 May 2026 00:00:00 GMT

ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World
http://arxiv.org/abs/2605.15081v1
Ziyin Zhang et al. — arxiv:2605.15081 — Multilingual NLP
Thu, 14 May 2026 00:00:00 GMT

StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
http://arxiv.org/abs/2605.14708v1
Zeyu Chen et al. — arxiv:2605.14708 — Multilingual NLP
Thu, 14 May 2026 00:00:00 GMT

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax
http://arxiv.org/abs/2605.14366v1
Zeli Su et al. — arxiv:2605.14366 — Multilingual NLP
Thu, 14 May 2026 00:00:00 GMT

A Climate-Constrained Bayesian Inverse Method for JWST Rocky Exoplanet Eclipse Spectra: A Case Study of LTT 1445A b
http://arxiv.org/abs/2605.14997v1
Nicholas Wogan et al. — arxiv:2605.14997 — Information Extraction
Thu, 14 May 2026 00:00:00 GMT

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
http://arxiv.org/abs/2605.14906v1
Xiyu Ren et al. — arxiv:2605.14906 — Information Extraction
Thu, 14 May 2026 00:00:00 GMT

DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
http://arxiv.org/abs/2605.14227v1
Yunying Zhu et al. — arxiv:2605.14227 — Text Classification
Thu, 14 May 2026 00:00:00 GMT

Self-Distilled Agentic Reinforcement Learning
http://arxiv.org/abs/2605.15155v1
Zhengxi Lu et al. — arxiv:2605.15155 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Quantifying and Mitigating Premature Closure in Frontier LLMs
http://arxiv.org/abs/2605.15000v1
Rebecca Handler et al.
— arxiv:2605.15000 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA
http://arxiv.org/abs/2605.14928v1
Guanhua Chen et al. — arxiv:2605.14928 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs
http://arxiv.org/abs/2605.14900v1
Sohel Aman Khan et al. — arxiv:2605.14900 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

A Heterogeneous Temporal Memory Governance Framework for Long-Term LLM Persona Consistency
http://arxiv.org/abs/2605.14802v1
Zhao Yang et al. — arxiv:2605.14802 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Video-Zero: Self-Evolution Video Understanding
http://arxiv.org/abs/2605.14733v1
Ruixu Zhang et al. — arxiv:2605.14733 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

From Table to Cell: Attention for Better Reasoning with TABALIGN
http://arxiv.org/abs/2605.14465v1
Tung Sum Thomas Kwok et al. — arxiv:2605.14465 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
http://arxiv.org/abs/2605.14449v1
Siyang Yao et al. — arxiv:2605.14449 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
http://arxiv.org/abs/2605.14358v1
Sanjoy Chowdhury et al. — arxiv:2605.14358 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Herculean: An Agentic Benchmark for Financial Intelligence
http://arxiv.org/abs/2605.14355v1
Xueqing Peng et al.
— arxiv:2605.14355 — Question Answering
Thu, 14 May 2026 00:00:00 GMT

Precise Verification of Transformers through ReLU-Catalyzed Abstraction Refinement
http://arxiv.org/abs/2605.14294v1
Hengjie Liu et al. — arxiv:2605.14294 — Sentiment Analysis
Thu, 14 May 2026 00:00:00 GMT

Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG
http://arxiv.org/abs/2605.15109v1
Riccardo Terrenzi et al. — arxiv:2605.15109 — Knowledge Graph
Thu, 14 May 2026 00:00:00 GMT

KGPFN: Unlocking the Potential of Knowledge Graph Foundation Model via In-Context Learning
http://arxiv.org/abs/2605.14907v1
Yisen Gao et al. — arxiv:2605.14907 — Knowledge Graph
Thu, 14 May 2026 00:00:00 GMT

COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs
http://arxiv.org/abs/2605.14900v1
Sohel Aman Khan et al. — arxiv:2605.14900 — Knowledge Graph
Thu, 14 May 2026 00:00:00 GMT

Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI
http://arxiv.org/abs/2605.14665v1
Joy Bose et al. — arxiv:2605.14665 — Knowledge Graph
Thu, 14 May 2026 00:00:00 GMT

Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
http://arxiv.org/abs/2605.13709v1
Qian Shen et al. — arxiv:2605.13709 — NLP
Wed, 13 May 2026 00:00:00 GMT

Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP
http://arxiv.org/abs/2605.13436v1
Ruan Visser et al.
— arxiv:2605.13436 — NLP
Wed, 13 May 2026 00:00:00 GMT

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
http://arxiv.org/abs/2605.13412v1
Galadrielle Humblot-Renaux et al. — arxiv:2605.13412 — NLP
Wed, 13 May 2026 00:00:00 GMT

Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education
http://arxiv.org/abs/2605.12988v1
Mragisha Jain et al. — arxiv:2605.12988 — NLP
Wed, 13 May 2026 00:00:00 GMT

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
http://arxiv.org/abs/2605.14075v1
Cristian Hinostroza et al. — arxiv:2605.14075 — NLP
Wed, 13 May 2026 00:00:00 GMT

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data
http://arxiv.org/abs/2605.13846v1
Ziheng Zhang et al. — arxiv:2605.13846 — LLM
Wed, 13 May 2026 00:00:00 GMT

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
http://arxiv.org/abs/2605.13839v1
Wenrui Bao et al. — arxiv:2605.13839 — LLM
Wed, 13 May 2026 00:00:00 GMT

Negation Neglect: When models fail to learn negations in training
http://arxiv.org/abs/2605.13829v1
Harry Mayne et al. — arxiv:2605.13829 — LLM
Wed, 13 May 2026 00:00:00 GMT

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
http://arxiv.org/abs/2605.13825v1
Alberto G. Rodríguez Salgado et al. — arxiv:2605.13825 — LLM
Wed, 13 May 2026 00:00:00 GMT

Neurosymbolic Auditing of Natural-Language Software Requirements
http://arxiv.org/abs/2605.13817v1
Bethel Hall et al.
— arxiv:2605.13817 — LLM
Wed, 13 May 2026 00:00:00 GMT

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
http://arxiv.org/abs/2605.13801v1
Deepak Pandita et al. — arxiv:2605.13801 — LLM
Wed, 13 May 2026 00:00:00 GMT

An LLM-Based System for Argument Reconstruction
http://arxiv.org/abs/2605.13793v1
Paulo Pirozelli et al. — arxiv:2605.13793 — LLM
Wed, 13 May 2026 00:00:00 GMT

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers
http://arxiv.org/abs/2605.13784v1
Victor Norgren et al. — arxiv:2605.13784 — LLM
Wed, 13 May 2026 00:00:00 GMT

MinT: Managed Infrastructure for Training and Serving Millions of LLMs
http://arxiv.org/abs/2605.13779v1
Mind Lab et al. — arxiv:2605.13779 — LLM
Wed, 13 May 2026 00:00:00 GMT

"Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming
http://arxiv.org/abs/2605.13776v1
Zeinabsadat Saghi et al. — arxiv:2605.13776 — LLM
Wed, 13 May 2026 00:00:00 GMT

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
http://arxiv.org/abs/2605.13841v1
Tara Bogavelli et al. — arxiv:2605.13841 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
http://arxiv.org/abs/2605.13839v1
Wenrui Bao et al. — arxiv:2605.13839 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
http://arxiv.org/abs/2605.13831v1
Zhaowei Wang et al.
— arxiv:2605.13831 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
http://arxiv.org/abs/2605.13825v1
Alberto G. Rodríguez Salgado et al. — arxiv:2605.13825 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

Harnessing Agentic Evolution
http://arxiv.org/abs/2605.13821v1
Jiayi Zhang et al. — arxiv:2605.13821 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
http://arxiv.org/abs/2605.13803v1
Minjoon Jung et al. — arxiv:2605.13803 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
http://arxiv.org/abs/2605.13762v1
Annie Liu et al. — arxiv:2605.13762 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

The Co-evolution of Costly Signaling and Cooperation in Social Dilemmas
http://arxiv.org/abs/2605.13750v1
Mahdi Abolhasani et al. — arxiv:2605.13750 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

Learning POMDP World Models from Observations with Language-Model Priors
http://arxiv.org/abs/2605.13740v1
Valentin Six et al. — arxiv:2605.13740 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs
http://arxiv.org/abs/2605.13737v1
Trung Nguyen Quang et al. — arxiv:2605.13737 — LLM Agent
Wed, 13 May 2026 00:00:00 GMT

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
http://arxiv.org/abs/2605.13841v1
Tara Bogavelli et al.
— arxiv:2605.13841 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
http://arxiv.org/abs/2605.13839v1
Wenrui Bao et al. — arxiv:2605.13839 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
http://arxiv.org/abs/2605.13831v1
Zhaowei Wang et al. — arxiv:2605.13831 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles
http://arxiv.org/abs/2605.13725v1
Yitian Yang et al. — arxiv:2605.13725 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems
http://arxiv.org/abs/2605.13716v1
Hongji Pu et al. — arxiv:2605.13716 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

Learning Equilibria in Coordination Games via Minorization-Maximization
http://arxiv.org/abs/2605.13644v1
Ashok Krishnan K. S. et al. — arxiv:2605.13644 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research
http://arxiv.org/abs/2605.13618v1
Peng Kang et al. — arxiv:2605.13618 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
http://arxiv.org/abs/2605.13554v1
Asim Osman et al. — arxiv:2605.13554 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging
http://arxiv.org/abs/2605.13534v1
Jiabei Liu et al.
— arxiv:2605.13534 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

MMSkills: Towards Multimodal Skills for General Visual Agents
http://arxiv.org/abs/2605.13527v1
Kangning Zhang et al. — arxiv:2605.13527 — Multi-Agent
Wed, 13 May 2026 00:00:00 GMT

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense
http://arxiv.org/abs/2605.13764v1
Jascha Wanger et al. — arxiv:2605.13764 — RAG
Wed, 13 May 2026 00:00:00 GMT

OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research
http://arxiv.org/abs/2605.13618v1
Peng Kang et al. — arxiv:2605.13618 — RAG
Wed, 13 May 2026 00:00:00 GMT

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
http://arxiv.org/abs/2605.13481v1
Mikhail Menschikov et al. — arxiv:2605.13481 — RAG
Wed, 13 May 2026 00:00:00 GMT

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents
http://arxiv.org/abs/2605.13391v1
Liangtian Liu et al. — arxiv:2605.13391 — RAG
Wed, 13 May 2026 00:00:00 GMT

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution
http://arxiv.org/abs/2605.13295v1
Tom Zehle et al. — arxiv:2605.13295 — RAG
Wed, 13 May 2026 00:00:00 GMT

Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
http://arxiv.org/abs/2605.13277v1
Weiqing Luo et al. — arxiv:2605.13277 — RAG
Wed, 13 May 2026 00:00:00 GMT

An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing
http://arxiv.org/abs/2605.13221v1
Hanwen Zhang et al.
— arxiv:2605.13221 — RAG
Wed, 13 May 2026 00:00:00 GMT

Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation
http://arxiv.org/abs/2605.13111v1
Jiayu Chen et al. — arxiv:2605.13111 — RAG
Wed, 13 May 2026 00:00:00 GMT

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search
http://arxiv.org/abs/2605.13052v1
Tingyu Chen et al. — arxiv:2605.13052 — RAG
Wed, 13 May 2026 00:00:00 GMT

Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education
http://arxiv.org/abs/2605.12988v1
Mragisha Jain et al. — arxiv:2605.12988 — RAG
Wed, 13 May 2026 00:00:00 GMT

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning
http://arxiv.org/abs/2605.13687v1
Jason Gaitonde et al. — arxiv:2605.13687 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
http://arxiv.org/abs/2605.13632v1
Yiran Ling et al. — arxiv:2605.13632 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
http://arxiv.org/abs/2605.13511v1
Tsz Ting Chung et al. — arxiv:2605.13511 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
http://arxiv.org/abs/2605.13338v1
Shuqiang Wang et al. — arxiv:2605.13338 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
http://arxiv.org/abs/2605.13301v1
Yafu Li et al.
— arxiv:2605.13301 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

What properties of reasoning supervision are associated with improved downstream model quality?
http://arxiv.org/abs/2605.13290v1
Mikołaj Langner et al. — arxiv:2605.13290 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning
http://arxiv.org/abs/2605.13255v1
Junlong Ke et al. — arxiv:2605.13255 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
http://arxiv.org/abs/2605.13236v1
Rabindra Lamsal et al. — arxiv:2605.13236 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing
http://arxiv.org/abs/2605.13221v1
Hanwen Zhang et al. — arxiv:2605.13221 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes
http://arxiv.org/abs/2605.13165v1
Chenjun Xu et al. — arxiv:2605.13165 — Reasoning
Wed, 13 May 2026 00:00:00 GMT

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
http://arxiv.org/abs/2605.13831v1
Zhaowei Wang et al. — arxiv:2605.13831 — Tool Use
Wed, 13 May 2026 00:00:00 GMT

Porting the Nonlinear Optimization Library HiOp to Accelerator-Based Hardware Architectures
http://arxiv.org/abs/2605.13736v1
Slaven Peles et al. — arxiv:2605.13736 — Tool Use
Wed, 13 May 2026 00:00:00 GMT

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding
http://arxiv.org/abs/2605.13228v1
Xiao Liu et al.
— arxiv:2605.13228 — Tool Use
Wed, 13 May 2026 00:00:00 GMT

When Does Hierarchy Help? Benchmarking Agent Coordination in Event-Driven Industrial Scheduling
http://arxiv.org/abs/2605.13172v1
Ziqi Wang et al. — arxiv:2605.13172 — Tool Use
Wed, 13 May 2026 00:00:00 GMT

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning
http://arxiv.org/abs/2605.13835v1
Hao Sun et al. — arxiv:2605.13835 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
http://arxiv.org/abs/2605.13831v1
Zhaowei Wang et al. — arxiv:2605.13831 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
http://arxiv.org/abs/2605.13775v1
Harold Haodong Chen et al. — arxiv:2605.13775 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
http://arxiv.org/abs/2605.13667v1
Vladislav Makarov et al. — arxiv:2605.13667 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

ProjGuard: Safety Monitoring for Computer-Use Agents via Low-Dimensional Projections
http://arxiv.org/abs/2605.13631v1
Kebin Contreras et al. — arxiv:2605.13631 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

Towards Unified Surgical Scene Understanding:Bridging Reasoning and Grounding via MLLMs
http://arxiv.org/abs/2605.13530v1
Jincai Huang et al. — arxiv:2605.13530 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

RotVLA: Rotational Latent Action for Vision-Language-Action Model
http://arxiv.org/abs/2605.13403v1
Qiwei Li et al.
— arxiv:2605.13403 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents
http://arxiv.org/abs/2605.13391v1
Liangtian Liu et al. — arxiv:2605.13391 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models
http://arxiv.org/abs/2605.13375v1
Mingzhe Huang et al. — arxiv:2605.13375 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

GeoFlowVLM: Geometry-Aware Joint Uncertainty for Frozen Vision-Language Embedding
http://arxiv.org/abs/2605.13352v1
Mayank Nautiyal et al. — arxiv:2605.13352 — Multimodal LLM
Wed, 13 May 2026 00:00:00 GMT

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling
http://arxiv.org/abs/2605.13833v1
Hoang-Quan Nguyen et al. — arxiv:2605.13833 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
http://arxiv.org/abs/2605.13831v1
Zhaowei Wang et al. — arxiv:2605.13831 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Harnessing Agentic Evolution
http://arxiv.org/abs/2605.13821v1
Jiayi Zhang et al. — arxiv:2605.13821 — Long Context
Wed, 13 May 2026 00:00:00 GMT

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
http://arxiv.org/abs/2605.13667v1
Vladislav Makarov et al. — arxiv:2605.13667 — Long Context
Wed, 13 May 2026 00:00:00 GMT

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
http://arxiv.org/abs/2605.13542v1
Chengzhi Shen et al.
— arxiv:2605.13542 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Granite Embedding Multilingual R2 Models
http://arxiv.org/abs/2605.13521v1
Parul Awasthy et al. — arxiv:2605.13521 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
http://arxiv.org/abs/2605.13511v1
Tsz Ting Chung et al. — arxiv:2605.13511 — Long Context
Wed, 13 May 2026 00:00:00 GMT

LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
http://arxiv.org/abs/2605.13451v1
Adam Remaki et al. — arxiv:2605.13451 — Long Context
Wed, 13 May 2026 00:00:00 GMT

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents
http://arxiv.org/abs/2605.13391v1
Liangtian Liu et al. — arxiv:2605.13391 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory
http://arxiv.org/abs/2605.13370v1
Sungwoo Goo et al. — arxiv:2605.13370 — Long Context
Wed, 13 May 2026 00:00:00 GMT

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
http://arxiv.org/abs/2605.13839v1
Wenrui Bao et al. — arxiv:2605.13839 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

Provable Quantization with Randomized Hadamard Transform
http://arxiv.org/abs/2605.13810v1
Ying Feng et al. — arxiv:2605.13810 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

MinT: Managed Infrastructure for Training and Serving Millions of LLMs
http://arxiv.org/abs/2605.13779v1
Mind Lab et al.
— arxiv:2605.13779 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

High-Rate Quantized Matrix Multiplication II
http://arxiv.org/abs/2605.13768v1
Or Ordentlich et al. — arxiv:2605.13768 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

Decoherence of spatial superpositions along stationary worldlines
http://arxiv.org/abs/2605.13677v1
Clemens Jakubec et al. — arxiv:2605.13677 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models
http://arxiv.org/abs/2605.13538v1
Anuj Sadani et al. — arxiv:2605.13538 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin
http://arxiv.org/abs/2605.13517v1
Jaeyung Kim et al. — arxiv:2605.13517 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation
http://arxiv.org/abs/2605.13433v1
Huichao Chai et al. — arxiv:2605.13433 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction
http://arxiv.org/abs/2605.13407v1
Namhyoung Kim et al. — arxiv:2605.13407 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

RotVLA: Rotational Latent Action for Vision-Language-Action Model
http://arxiv.org/abs/2605.13403v1
Qiwei Li et al.
— arxiv:2605.13403 — LLM Efficiency
Wed, 13 May 2026 00:00:00 GMT

PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users
http://arxiv.org/abs/2605.13307v1
Hannah Rose Kirk et al. — arxiv:2605.13307 — Alignment
Wed, 13 May 2026 00:00:00 GMT

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
http://arxiv.org/abs/2605.12991v1
Adarsh Kumarappan et al. — arxiv:2605.12991 — Alignment
Wed, 13 May 2026 00:00:00 GMT

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
http://arxiv.org/abs/2605.13841v1
Tara Bogavelli et al. — arxiv:2605.13841 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

Negation Neglect: When models fail to learn negations in training
http://arxiv.org/abs/2605.13829v1
Harry Mayne et al. — arxiv:2605.13829 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
http://arxiv.org/abs/2605.13775v1
Harold Haodong Chen et al. — arxiv:2605.13775 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry
http://arxiv.org/abs/2605.13772v1
Tyler Alvarez et al. — arxiv:2605.13772 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
http://arxiv.org/abs/2605.13667v1
Vladislav Makarov et al. — arxiv:2605.13667 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
http://arxiv.org/abs/2605.13481v1
Mikhail Menschikov et al.
— arxiv:2605.13481 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages
http://arxiv.org/abs/2605.13330v1
Sarmistha Das et al. — arxiv:2605.13330 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

Twisted Alexander vanishing groups of knots
http://arxiv.org/abs/2605.13291v1
Katsumi Ishikawa et al. — arxiv:2605.13291 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction
http://arxiv.org/abs/2605.13262v1
Deepak Warrier et al. — arxiv:2605.13262 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language
http://arxiv.org/abs/2605.13167v1
Jinwoong Kim et al. — arxiv:2605.13167 — Hallucination
Wed, 13 May 2026 00:00:00 GMT

Quantitative Linear Logic for Neuro-Symbolic Learning and Verification
http://arxiv.org/abs/2605.13845v2
Thomas Flinkow et al. — arxiv:2605.13845 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution
http://arxiv.org/abs/2605.13411v1
Xiaozhe Zhang et al. — arxiv:2605.13411 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Backbone is All You Need: Assessing Vulnerabilities of Frozen Foundation Models in Synthetic Image Forensics
http://arxiv.org/abs/2605.13381v1
Chiara Musso et al. — arxiv:2605.13381 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
http://arxiv.org/abs/2605.13213v1
Hao Zhou et al.
— arxiv:2605.13213 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Finding the Weakest Link: Adversarial Attack against Multi-Agent Communications
http://arxiv.org/abs/2605.13170v1
Maxwell Standen et al. — arxiv:2605.13170 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models
http://arxiv.org/abs/2605.13043v1
Yejin Lee et al. — arxiv:2605.13043 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis
http://arxiv.org/abs/2605.12869v1
Zvi Topol et al. — arxiv:2605.12869 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
http://arxiv.org/abs/2605.13940v1
Haomin Zhuang et al. — arxiv:2605.13940 — LLM Safety
Wed, 13 May 2026 00:00:00 GMT

RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
http://arxiv.org/abs/2605.13695v1
Andrea Morandi et al. — arxiv:2605.13695 — LLM Evaluation
Wed, 13 May 2026 00:00:00 GMT

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
http://arxiv.org/abs/2605.13596v1
Kyo Gerrits et al. — arxiv:2605.13596 — LLM Evaluation
Wed, 13 May 2026 00:00:00 GMT

Discovery of Hidden Miscalibration Regimes
http://arxiv.org/abs/2605.13484v1
Katarzyna Kobalczyk et al.
— arxiv:2605.13484 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationPersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agentshttp://arxiv.org/abs/2605.13481v1http://arxiv.org/abs/2605.13481v1Mikhail Menschikov et al. — arxiv:2605.13481 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationFrom Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarkshttp://arxiv.org/abs/2605.13408v1http://arxiv.org/abs/2605.13408v1Neh Majmudar et al. — arxiv:2605.13408 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationLLM-Based Persuasion Enables Guardrail Override in Frontier LLMshttp://arxiv.org/abs/2605.13334v1http://arxiv.org/abs/2605.13334v1Rodrigo Nogueira et al. — arxiv:2605.13334 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationVERA-MH: Validation of Ethical and Responsible AI in Mental Healthhttp://arxiv.org/abs/2605.13318v1http://arxiv.org/abs/2605.13318v1Luca Belli et al. — arxiv:2605.13318 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationSWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cyclehttp://arxiv.org/abs/2605.13139v1http://arxiv.org/abs/2605.13139v1Hao Guan et al. — arxiv:2605.13139 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationMultimodal Hidden Markov Models for Persistent Emotional State Trackinghttp://arxiv.org/abs/2605.12838v1http://arxiv.org/abs/2605.12838v1Anamika Ragu et al. — arxiv:2605.12838 — LLM EvaluationWed, 13 May 2026 00:00:00 GMTLLM EvaluationHLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learninghttp://arxiv.org/abs/2605.13536v1http://arxiv.org/abs/2605.13536v1Qingyun Zou et al. — arxiv:2605.13536 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMTRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraintshttp://arxiv.org/abs/2605.13414v1http://arxiv.org/abs/2605.13414v1Zabir Al Nazi et al. 
— arxiv:2605.13414 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMAI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agentshttp://arxiv.org/abs/2605.13357v1http://arxiv.org/abs/2605.13357v1Hailin Zhong et al. — arxiv:2605.13357 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMThe Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Codehttp://arxiv.org/abs/2605.13280v1http://arxiv.org/abs/2605.13280v1Hengzhi Ye et al. — arxiv:2605.13280 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMUIBenchKit: A unified toolkit for design-to-code model evaluationhttp://arxiv.org/abs/2605.13141v1http://arxiv.org/abs/2605.13141v1Chinh T. Le et al. — arxiv:2605.13141 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMProtocol-Driven Development: Governing Generated Software Through Invariants and Evidencehttp://arxiv.org/abs/2605.12981v1http://arxiv.org/abs/2605.12981v1Jun He et al. — arxiv:2605.12981 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMRetrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.12975v1http://arxiv.org/abs/2605.12975v1Jiashuo Sun et al. — arxiv:2605.12975 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generationhttp://arxiv.org/abs/2605.12857v1http://arxiv.org/abs/2605.12857v1Zhongkai Yu et al. — arxiv:2605.12857 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMThe KnotMosaics Package for SageMathhttp://arxiv.org/abs/2605.14189v1http://arxiv.org/abs/2605.14189v1Mary Y. Deng et al. — arxiv:2605.14189 — Code LLMWed, 13 May 2026 00:00:00 GMTCode LLMLLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metricshttp://arxiv.org/abs/2605.13412v1http://arxiv.org/abs/2605.13412v1Galadrielle Humblot-Renaux et al. 
— arxiv:2605.13412 — Legal NLPWed, 13 May 2026 00:00:00 GMTLegal NLPGranite Embedding Multilingual R2 Modelshttp://arxiv.org/abs/2605.13521v1http://arxiv.org/abs/2605.13521v1Parul Awasthy et al. — arxiv:2605.13521 — Multilingual NLPWed, 13 May 2026 00:00:00 GMTMultilingual NLPImproving Code Translation with Syntax-Guided and Semantic-aware Preference Optimizationhttp://arxiv.org/abs/2605.13229v1http://arxiv.org/abs/2605.13229v1Yuhan Wu et al. — arxiv:2605.13229 — Multilingual NLPWed, 13 May 2026 00:00:00 GMTMultilingual NLPVividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognitionhttp://arxiv.org/abs/2605.13087v1http://arxiv.org/abs/2605.13087v1Kush Juvekar et al. — arxiv:2605.13087 — Multilingual NLPWed, 13 May 2026 00:00:00 GMTMultilingual NLPDiM\textsuperscript{3}: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merginghttp://arxiv.org/abs/2605.12960v1http://arxiv.org/abs/2605.12960v1Zijing Wang et al. — arxiv:2605.12960 — Multilingual NLPWed, 13 May 2026 00:00:00 GMTMultilingual NLPLocale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Modelshttp://arxiv.org/abs/2605.13538v1http://arxiv.org/abs/2605.13538v1Anuj Sadani et al. — arxiv:2605.13538 — Named Entity RecognitionWed, 13 May 2026 00:00:00 GMTNamed Entity RecognitionMultimodal Graph-based Classification of Esophageal Motility Disordershttp://arxiv.org/abs/2605.13623v1http://arxiv.org/abs/2605.13623v1Alexander Geiger et al. — arxiv:2605.13623 — Information ExtractionWed, 13 May 2026 00:00:00 GMTInformation ExtractionLLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metricshttp://arxiv.org/abs/2605.13412v1http://arxiv.org/abs/2605.13412v1Galadrielle Humblot-Renaux et al. 
— arxiv:2605.13412 — Text ClassificationWed, 13 May 2026 00:00:00 GMTText ClassificationNeurosymbolic Auditing of Natural-Language Software Requirementshttp://arxiv.org/abs/2605.13817v1http://arxiv.org/abs/2605.13817v1Bethel Hall et al. — arxiv:2605.13817 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringScaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merginghttp://arxiv.org/abs/2605.13534v1http://arxiv.org/abs/2605.13534v1Jiabei Liu et al. — arxiv:2605.13534 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringFIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languageshttp://arxiv.org/abs/2605.13330v1http://arxiv.org/abs/2605.13330v1Sarmistha Das et al. — arxiv:2605.13330 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringCANTANTE: Optimizing Agentic Systems via Contrastive Credit Attributionhttp://arxiv.org/abs/2605.13295v1http://arxiv.org/abs/2605.13295v1Tom Zehle et al. — arxiv:2605.13295 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringIndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languageshttp://arxiv.org/abs/2605.13292v1http://arxiv.org/abs/2605.13292v1Shubham Kumar Nigam et al. — arxiv:2605.13292 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Groundinghttp://arxiv.org/abs/2605.13228v1http://arxiv.org/abs/2605.13228v1Xiao Liu et al. — arxiv:2605.13228 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringSkill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generationhttp://arxiv.org/abs/2605.13223v1http://arxiv.org/abs/2605.13223v1Abdelrahman Eldesokey et al. 
— arxiv:2605.13223 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringAcquisitionSynthesis: Targeted Data Generation using Acquisition Functionshttp://arxiv.org/abs/2605.13149v1http://arxiv.org/abs/2605.13149v1Ishika Agarwal et al. — arxiv:2605.13149 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringF-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Rankinghttp://arxiv.org/abs/2605.12995v1http://arxiv.org/abs/2605.12995v1Rohan Surana et al. — arxiv:2605.12995 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringRetrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.12975v1http://arxiv.org/abs/2605.12975v1Jiashuo Sun et al. — arxiv:2605.12975 — Question AnsweringWed, 13 May 2026 00:00:00 GMTQuestion AnsweringPersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agentshttp://arxiv.org/abs/2605.13481v1http://arxiv.org/abs/2605.13481v1Mikhail Menschikov et al. — arxiv:2605.13481 — Knowledge GraphWed, 13 May 2026 00:00:00 GMTKnowledge GraphIdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generationhttp://arxiv.org/abs/2605.13311v1http://arxiv.org/abs/2605.13311v1Joy Bose et al. — arxiv:2605.13311 — Knowledge GraphWed, 13 May 2026 00:00:00 GMTKnowledge GraphSemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystemhttp://arxiv.org/abs/2605.13310v1http://arxiv.org/abs/2605.13310v1Abdul Rafay et al. — arxiv:2605.13310 — Knowledge GraphWed, 13 May 2026 00:00:00 GMTKnowledge GraphStrikingness-Aware Evaluation for Temporal Knowledge Graph Reasoninghttp://arxiv.org/abs/2605.13153v1http://arxiv.org/abs/2605.13153v1Rikui Huang et al. 
— arxiv:2605.13153 — Knowledge GraphWed, 13 May 2026 00:00:00 GMTKnowledge GraphCommonWhy: A Dataset for Evaluating Entity-Based Causal Commonsense Reasoning in Large Language Modelshttp://arxiv.org/abs/2605.12918v1http://arxiv.org/abs/2605.12918v1Armin Toroghi et al. — arxiv:2605.12918 — Knowledge GraphWed, 13 May 2026 00:00:00 GMTKnowledge GraphMind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMshttp://arxiv.org/abs/2605.12242v1http://arxiv.org/abs/2605.12242v1Deepak Kumar et al. — arxiv:2605.12242 — NLPTue, 12 May 2026 00:00:00 GMTNLPMechanistic Interpretability of ASR models using Sparse Autoencodershttp://arxiv.org/abs/2605.12225v1http://arxiv.org/abs/2605.12225v1Dan Pluth et al. — arxiv:2605.12225 — NLPTue, 12 May 2026 00:00:00 GMTNLPA microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alertinghttp://arxiv.org/abs/2605.11997v2http://arxiv.org/abs/2605.11997v2Darlan Noetzold et al. — arxiv:2605.11997 — NLPTue, 12 May 2026 00:00:00 GMTNLPRethinking Positional Encoding for Neural Vehicle Routinghttp://arxiv.org/abs/2605.11910v1http://arxiv.org/abs/2605.11910v1Chuanbo Hua et al. — arxiv:2605.11910 — NLPTue, 12 May 2026 00:00:00 GMTNLPGAR: Carbon-Aware Routing for LLM Inference via Constrained Optimizationhttp://arxiv.org/abs/2605.11603v1http://arxiv.org/abs/2605.11603v1Disha Sheshanarayana et al. — arxiv:2605.11603 — NLPTue, 12 May 2026 00:00:00 GMTNLPString Diagrams for Quantum Foundations, Computing and Natural Language Processinghttp://arxiv.org/abs/2605.11417v1http://arxiv.org/abs/2605.11417v1Muhammad Hamza Waseem et al. — arxiv:2605.11417 — NLPTue, 12 May 2026 00:00:00 GMTNLPCovering Human Action Space for Computer Use: Data Synthesis and Benchmarkhttp://arxiv.org/abs/2605.12501v1http://arxiv.org/abs/2605.12501v1Miaosen Zhang et al. 
— arxiv:2605.12501 — LLMTue, 12 May 2026 00:00:00 GMTLLMAlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Rewardhttp://arxiv.org/abs/2605.12495v1http://arxiv.org/abs/2605.12495v1Runhui Huang et al. — arxiv:2605.12495 — LLMTue, 12 May 2026 00:00:00 GMTLLMPion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformationhttp://arxiv.org/abs/2605.12492v1http://arxiv.org/abs/2605.12492v1Kexuan Shi et al. — arxiv:2605.12492 — LLMTue, 12 May 2026 00:00:00 GMTLLMTask-Adaptive Embedding Refinement via Test-time LLM Guidancehttp://arxiv.org/abs/2605.12487v1http://arxiv.org/abs/2605.12487v1Ariel Gera et al. — arxiv:2605.12487 — LLMTue, 12 May 2026 00:00:00 GMTLLMLearning, Fast and Slow: Towards LLMs That Adapt Continuallyhttp://arxiv.org/abs/2605.12484v1http://arxiv.org/abs/2605.12484v1Rishabh Tiwari et al. — arxiv:2605.12484 — LLMTue, 12 May 2026 00:00:00 GMTLLMMEME: Multi-entity & Evolving Memory Evaluationhttp://arxiv.org/abs/2605.12477v1http://arxiv.org/abs/2605.12477v1Seokwon Jung et al. — arxiv:2605.12477 — LLMTue, 12 May 2026 00:00:00 GMTLLMMulti-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputshttp://arxiv.org/abs/2605.12460v1http://arxiv.org/abs/2605.12460v1Guinan Su et al. — arxiv:2605.12460 — LLMTue, 12 May 2026 00:00:00 GMTLLMTextSeal: A Localized LLM Watermark for Provenance & Distillation Protectionhttp://arxiv.org/abs/2605.12456v1http://arxiv.org/abs/2605.12456v1Tom Sander et al. — arxiv:2605.12456 — LLMTue, 12 May 2026 00:00:00 GMTLLMThe Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Eventshttp://arxiv.org/abs/2605.12452v1http://arxiv.org/abs/2605.12452v1Gunjan et al. — arxiv:2605.12452 — LLMTue, 12 May 2026 00:00:00 GMTLLMLychSim: A Controllable and Interactive Simulation Framework for Vision Researchhttp://arxiv.org/abs/2605.12449v1http://arxiv.org/abs/2605.12449v1Wufei Ma et al. 
— arxiv:2605.12449 — LLMTue, 12 May 2026 00:00:00 GMTLLMCovering Human Action Space for Computer Use: Data Synthesis and Benchmarkhttp://arxiv.org/abs/2605.12501v1http://arxiv.org/abs/2605.12501v1Miaosen Zhang et al. — arxiv:2605.12501 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentSenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecturehttp://arxiv.org/abs/2605.12500v1http://arxiv.org/abs/2605.12500v1Haiwen Diao et al. — arxiv:2605.12500 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentFrom Web to Pixels: Bringing Agentic Search into Visual Perceptionhttp://arxiv.org/abs/2605.12497v1http://arxiv.org/abs/2605.12497v1Bokang Yang et al. — arxiv:2605.12497 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentLongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagueshttp://arxiv.org/abs/2605.12493v1http://arxiv.org/abs/2605.12493v1Di Wu et al. — arxiv:2605.12493 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentLetting the neural code speak: Automated characterization of monkey visual neurons through human languagehttp://arxiv.org/abs/2605.12485v1http://arxiv.org/abs/2605.12485v1Vedang Lad et al. — arxiv:2605.12485 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agentshttp://arxiv.org/abs/2605.12481v1http://arxiv.org/abs/2605.12481v1Xuhao Hu et al. — arxiv:2605.12481 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentMEME: Multi-entity & Evolving Memory Evaluationhttp://arxiv.org/abs/2605.12477v1http://arxiv.org/abs/2605.12477v1Seokwon Jung et al. — arxiv:2605.12477 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentKV-Fold: One-Step KV-Cache Recurrence for Long-Context Inferencehttp://arxiv.org/abs/2605.12471v1http://arxiv.org/abs/2605.12471v1Alireza Nadali et al. 
— arxiv:2605.12471 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentMulti-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputshttp://arxiv.org/abs/2605.12460v1http://arxiv.org/abs/2605.12460v1Guinan Su et al. — arxiv:2605.12460 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentLychSim: A Controllable and Interactive Simulation Framework for Vision Researchhttp://arxiv.org/abs/2605.12449v1http://arxiv.org/abs/2605.12449v1Wufei Ma et al. — arxiv:2605.12449 — LLM AgentTue, 12 May 2026 00:00:00 GMTLLM AgentFrom Web to Pixels: Bringing Agentic Search into Visual Perceptionhttp://arxiv.org/abs/2605.12497v1http://arxiv.org/abs/2605.12497v1Bokang Yang et al. — arxiv:2605.12497 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentMEME: Multi-entity & Evolving Memory Evaluationhttp://arxiv.org/abs/2605.12477v1http://arxiv.org/abs/2605.12477v1Seokwon Jung et al. — arxiv:2605.12477 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentKV-Fold: One-Step KV-Cache Recurrence for Long-Context Inferencehttp://arxiv.org/abs/2605.12471v1http://arxiv.org/abs/2605.12471v1Alireza Nadali et al. — arxiv:2605.12471 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentEvents as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.12388v1http://arxiv.org/abs/2605.12388v1Hannes Büchi et al. — arxiv:2605.12388 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflowshttp://arxiv.org/abs/2605.12376v1http://arxiv.org/abs/2605.12376v1Wei Liu et al. — arxiv:2605.12376 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentLISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Managementhttp://arxiv.org/abs/2605.12321v1http://arxiv.org/abs/2605.12321v1Abderrahmane Lakas et al. 
— arxiv:2605.12321 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentExecutable Agentic Memory for GUI Agenthttp://arxiv.org/abs/2605.12294v1http://arxiv.org/abs/2605.12294v1Zerui Qin et al. — arxiv:2605.12294 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentIterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurancehttp://arxiv.org/abs/2605.12280v1http://arxiv.org/abs/2605.12280v1Elias Calboreanu et al. — arxiv:2605.12280 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentNo Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agentshttp://arxiv.org/abs/2605.12240v1http://arxiv.org/abs/2605.12240v1Zixu Yang et al. — arxiv:2605.12240 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentGoal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systemshttp://arxiv.org/abs/2605.12213v1http://arxiv.org/abs/2605.12213v1Jiazhou Liang et al. — arxiv:2605.12213 — Multi-AgentTue, 12 May 2026 00:00:00 GMTMulti-AgentLongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagueshttp://arxiv.org/abs/2605.12493v1http://arxiv.org/abs/2605.12493v1Di Wu et al. — arxiv:2605.12493 — RAGTue, 12 May 2026 00:00:00 GMTRAGOverview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answeringhttp://arxiv.org/abs/2605.12313v1http://arxiv.org/abs/2605.12313v1Rezarta Islamaj et al. — arxiv:2605.12313 — RAGTue, 12 May 2026 00:00:00 GMTRAGGoal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systemshttp://arxiv.org/abs/2605.12213v1http://arxiv.org/abs/2605.12213v1Jiazhou Liang et al. — arxiv:2605.12213 — RAGTue, 12 May 2026 00:00:00 GMTRAGSAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memoryhttp://arxiv.org/abs/2605.12061v1http://arxiv.org/abs/2605.12061v1Juntong Wang et al. 
— arxiv:2605.12061 — RAGTue, 12 May 2026 00:00:00 GMTRAGLegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Lettershttp://arxiv.org/abs/2605.12012v1http://arxiv.org/abs/2605.12012v1Virgill van der Meer et al. — arxiv:2605.12012 — RAGTue, 12 May 2026 00:00:00 GMTRAGTowards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimizationhttp://arxiv.org/abs/2605.11974v1http://arxiv.org/abs/2605.11974v1Xu Chu et al. — arxiv:2605.11974 — RAGTue, 12 May 2026 00:00:00 GMTRAGVery Efficient Listwise Multimodal Reranking for Long Documentshttp://arxiv.org/abs/2605.11864v1http://arxiv.org/abs/2605.11864v1Yiqun Sun et al. — arxiv:2605.11864 — RAGTue, 12 May 2026 00:00:00 GMTRAGPersistent and Conversational Multi-Method Explainability for Trustworthy Financial AIhttp://arxiv.org/abs/2605.11687v1http://arxiv.org/abs/2605.11687v1Georgios Makridis et al. — arxiv:2605.11687 — RAGTue, 12 May 2026 00:00:00 GMTRAGCuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAGhttp://arxiv.org/abs/2605.11611v1http://arxiv.org/abs/2605.11611v1Jianghan Shen et al. — arxiv:2605.11611 — RAGTue, 12 May 2026 00:00:00 GMTRAGAgents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RANhttp://arxiv.org/abs/2605.11516v1http://arxiv.org/abs/2605.11516v1Pranshav Gajjar et al. — arxiv:2605.11516 — RAGTue, 12 May 2026 00:00:00 GMTRAGMulti-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputshttp://arxiv.org/abs/2605.12460v1http://arxiv.org/abs/2605.12460v1Guinan Su et al. — arxiv:2605.12460 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningGeometric Factual Recall in Transformershttp://arxiv.org/abs/2605.12426v1http://arxiv.org/abs/2605.12426v1Shauli Ravfogel et al. 
— arxiv:2605.12426 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningOGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoninghttp://arxiv.org/abs/2605.12400v1http://arxiv.org/abs/2605.12400v1Yuxiao Yang et al. — arxiv:2605.12400 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningScalable Token-Level Hallucination Detection in Large Language Modelshttp://arxiv.org/abs/2605.12384v1http://arxiv.org/abs/2605.12384v1Rui Min et al. — arxiv:2605.12384 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningSelf-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Modelhttp://arxiv.org/abs/2605.12163v1http://arxiv.org/abs/2605.12163v1Chenfeng Wang et al. — arxiv:2605.12163 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningTo Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demandshttp://arxiv.org/abs/2605.12120v1http://arxiv.org/abs/2605.12120v1Fangyi Yu et al. — arxiv:2605.12120 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningIntermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systemshttp://arxiv.org/abs/2605.12087v1http://arxiv.org/abs/2605.12087v1Josh Rosen et al. — arxiv:2605.12087 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningOn the Limitations of Large Language Models for Conceptual Database Modelinghttp://arxiv.org/abs/2605.11986v1http://arxiv.org/abs/2605.11986v1Arthur F. Siqueira et al. — arxiv:2605.11986 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningFrom Noise to Diversity: Random Embedding Injection in LLM Reasoninghttp://arxiv.org/abs/2605.11936v1http://arxiv.org/abs/2605.11936v1Heejun Kim et al. — arxiv:2605.11936 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningProcedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Modelshttp://arxiv.org/abs/2605.11907v1http://arxiv.org/abs/2605.11907v1Igor Strozzi et al. 
— arxiv:2605.11907 — ReasoningTue, 12 May 2026 00:00:00 GMTReasoningToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agentshttp://arxiv.org/abs/2605.12481v1http://arxiv.org/abs/2605.12481v1Xuhao Hu et al. — arxiv:2605.12481 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseRollout Cards: A Reproducibility Standard for Agent Researchhttp://arxiv.org/abs/2605.12131v1http://arxiv.org/abs/2605.12131v1Charlie Masters et al. — arxiv:2605.12131 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseProperty-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimeshttp://arxiv.org/abs/2605.12078v1http://arxiv.org/abs/2605.12078v1Oleg Solozobov et al. — arxiv:2605.12078 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseThe SiMPL Method for Multi-Material Topology Optimizationhttp://arxiv.org/abs/2605.11994v1http://arxiv.org/abs/2605.11994v1Peter Gangl et al. — arxiv:2605.11994 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseWhen Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agentshttp://arxiv.org/abs/2605.11928v1http://arxiv.org/abs/2605.11928v1Xiaolin Zhou et al. — arxiv:2605.11928 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseOn-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignmenthttp://arxiv.org/abs/2605.11882v1http://arxiv.org/abs/2605.11882v1Bo Yin et al. — arxiv:2605.11882 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseGEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillationhttp://arxiv.org/abs/2605.11853v1http://arxiv.org/abs/2605.11853v1Sijia Li et al. — arxiv:2605.11853 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseCan LLM Agents Respond to Disasters? Benchmarking Heterogeneous Geospatial Reasoning in Emergency Operationshttp://arxiv.org/abs/2605.11633v1http://arxiv.org/abs/2605.11633v1Junjue Wang et al. 
— arxiv:2605.11633 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseFrom Generic Correlation to Input-Specific Credit in On-Policy Self Distillationhttp://arxiv.org/abs/2605.11613v1http://arxiv.org/abs/2605.11613v1Guobin Shen et al. — arxiv:2605.11613 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseDecaf: Improving Neural Decompilation with Automatic Feedback and Searchhttp://arxiv.org/abs/2605.11501v1http://arxiv.org/abs/2605.11501v1Alexander Shypula et al. — arxiv:2605.11501 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseLarge Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safetyhttp://arxiv.org/abs/2605.12729v1http://arxiv.org/abs/2605.12729v1Muhammad Bilal et al. — arxiv:2605.12729 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseMulti-Rollout On-Policy Distillation via Peer Successes and Failureshttp://arxiv.org/abs/2605.12652v1http://arxiv.org/abs/2605.12652v1Weichen Yu et al. — arxiv:2605.12652 — Tool UseTue, 12 May 2026 00:00:00 GMTTool UseSenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecturehttp://arxiv.org/abs/2605.12500v1http://arxiv.org/abs/2605.12500v1Haiwen Diao et al. — arxiv:2605.12500 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMAlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Rewardhttp://arxiv.org/abs/2605.12495v1http://arxiv.org/abs/2605.12495v1Runhui Huang et al. — arxiv:2605.12495 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMBeyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Imageshttp://arxiv.org/abs/2605.12413v1http://arxiv.org/abs/2605.12413v1Yuangong Chen et al. — arxiv:2605.12413 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMFill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Modelshttp://arxiv.org/abs/2605.12374v1http://arxiv.org/abs/2605.12374v1Yanting Miao et al. 
— arxiv:2605.12374 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMGuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specializationhttp://arxiv.org/abs/2605.12369v1http://arxiv.org/abs/2605.12369v1Xiaosong Jia et al. — arxiv:2605.12369 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMReinforcing VLAs in Task-Agnostic World Modelshttp://arxiv.org/abs/2605.12334v1http://arxiv.org/abs/2605.12334v1Yucen Wang et al. — arxiv:2605.12334 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMTowards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Modelshttp://arxiv.org/abs/2605.12332v1http://arxiv.org/abs/2605.12332v1Torsten Darrell et al. — arxiv:2605.12332 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMG$^2$TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Modelshttp://arxiv.org/abs/2605.12309v1http://arxiv.org/abs/2605.12309v1Junxian Li et al. — arxiv:2605.12309 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMImages in Sentences: Scaling Interleaved Instructions for Unified Visual Generationhttp://arxiv.org/abs/2605.12305v1http://arxiv.org/abs/2605.12305v1Yabo Zhang et al. — arxiv:2605.12305 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMLarge-Small Model Collaboration for Farmland Semantic Change Detectionhttp://arxiv.org/abs/2605.12282v1http://arxiv.org/abs/2605.12282v1Xinjia Li et al. — arxiv:2605.12282 — Multimodal LLMTue, 12 May 2026 00:00:00 GMTMultimodal LLMCausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narrativeshttp://arxiv.org/abs/2605.12496v1http://arxiv.org/abs/2605.12496v1Yihao Meng et al. — arxiv:2605.12496 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextLongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagueshttp://arxiv.org/abs/2605.12493v1http://arxiv.org/abs/2605.12493v1Di Wu et al. 
— arxiv:2605.12493 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextKV-Fold: One-Step KV-Cache Recurrence for Long-Context Inferencehttp://arxiv.org/abs/2605.12471v1http://arxiv.org/abs/2605.12471v1Alireza Nadali et al. — arxiv:2605.12471 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextClassifier Context Rot: Monitor Performance Degrades with Context Lengthhttp://arxiv.org/abs/2605.12366v1http://arxiv.org/abs/2605.12366v1Sam Martin et al. — arxiv:2605.12366 — Long ContextTue, 12 May 2026 00:00:00 GMTLong Context$δ$-mem: Efficient Online Memory for Large Language Modelshttp://arxiv.org/abs/2605.12357v1http://arxiv.org/abs/2605.12357v1Jingdi Lei et al. — arxiv:2605.12357 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextEHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Recordshttp://arxiv.org/abs/2605.12335v1http://arxiv.org/abs/2605.12335v1Saeed Shurrab et al. — arxiv:2605.12335 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextPRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agentshttp://arxiv.org/abs/2605.12260v1http://arxiv.org/abs/2605.12260v1Jingyi Peng et al. — arxiv:2605.12260 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextNo Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agentshttp://arxiv.org/abs/2605.12240v1http://arxiv.org/abs/2605.12240v1Zixu Yang et al. — arxiv:2605.12240 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextCombining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Modelshttp://arxiv.org/abs/2605.12227v1http://arxiv.org/abs/2605.12227v1Miguel Moura Ramos et al. — arxiv:2605.12227 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextGoal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systemshttp://arxiv.org/abs/2605.12213v1http://arxiv.org/abs/2605.12213v1Jiazhou Liang et al. 
— arxiv:2605.12213 — Long ContextTue, 12 May 2026 00:00:00 GMTLong ContextSearch Your Block Floating Point Scales!http://arxiv.org/abs/2605.12464v1http://arxiv.org/abs/2605.12464v1Tanmaey Gupta et al. — arxiv:2605.12464 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyEquivariant Space Group and Hamiltonian for Collinear Magnetic Systemshttp://arxiv.org/abs/2605.12440v1http://arxiv.org/abs/2605.12440v1Chaoxi Cui et al. — arxiv:2605.12440 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyNCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Codinghttp://arxiv.org/abs/2605.12396v1http://arxiv.org/abs/2605.12396v1Jiamin Wang et al. — arxiv:2605.12396 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyEvents as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.12388v1http://arxiv.org/abs/2605.12388v1Hannes Büchi et al. — arxiv:2605.12388 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyOutput Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generationhttp://arxiv.org/abs/2605.12345v1http://arxiv.org/abs/2605.12345v1Michela Lorandi et al. — arxiv:2605.12345 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyGrid Games: The Power of Multiple Grids for Quantizing Large Language Modelshttp://arxiv.org/abs/2605.12327v1http://arxiv.org/abs/2605.12327v1Vage Egiazarian et al. — arxiv:2605.12327 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencySOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantizationhttp://arxiv.org/abs/2605.12245v1http://arxiv.org/abs/2605.12245v1Chengzhu Bao et al. — arxiv:2605.12245 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyNeural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimationhttp://arxiv.org/abs/2605.12230v1http://arxiv.org/abs/2605.12230v1Hendrik Schäfke et al. 
— arxiv:2605.12230 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyNot How Many, But Which: Parameter Placement in Low-Rank Adaptationhttp://arxiv.org/abs/2605.12207v1http://arxiv.org/abs/2605.12207v1Arijit Sehanobish et al. — arxiv:2605.12207 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencyCross-Modal-Domain Generalization Through Semantically Aligned Discrete Representationshttp://arxiv.org/abs/2605.12145v1http://arxiv.org/abs/2605.12145v1Souptik Sen et al. — arxiv:2605.12145 — LLM EfficiencyTue, 12 May 2026 00:00:00 GMTLLM EfficiencySemantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systemshttp://arxiv.org/abs/2605.12406v1http://arxiv.org/abs/2605.12406v1William Parris et al. — arxiv:2605.12406 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentTokenRatio: Principled Token-Level Preference Optimization via Ratio Matchinghttp://arxiv.org/abs/2605.12288v2http://arxiv.org/abs/2605.12288v2Truong Nguyen et al. — arxiv:2605.12288 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentSyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learninghttp://arxiv.org/abs/2605.12179v1http://arxiv.org/abs/2605.12179v1Xin Cheng et al. — arxiv:2605.12179 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentWhen Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropyhttp://arxiv.org/abs/2605.12112v1http://arxiv.org/abs/2605.12112v1Xiaofeng Tan et al. — arxiv:2605.12112 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentLearn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Traininghttp://arxiv.org/abs/2605.11931v1http://arxiv.org/abs/2605.11931v1Qihuang Zhong et al. — arxiv:2605.11931 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentYFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoninghttp://arxiv.org/abs/2605.11906v1http://arxiv.org/abs/2605.11906v1Yifan Le et al. 
— arxiv:2605.11906 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentVariance-aware Reward Modeling with Anchor Guidancehttp://arxiv.org/abs/2605.11865v1http://arxiv.org/abs/2605.11865v1Shuxing Fang et al. — arxiv:2605.11865 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentEnhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimizationhttp://arxiv.org/abs/2605.11632v1http://arxiv.org/abs/2605.11632v1Yilong Wang et al. — arxiv:2605.11632 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentDocAtlas: Multilingual Document Understanding Across 80+ Languageshttp://arxiv.org/abs/2605.12623v1http://arxiv.org/abs/2605.12623v1Ahmed Heakl et al. — arxiv:2605.12623 — AlignmentTue, 12 May 2026 00:00:00 GMTAlignmentLetting the neural code speak: Automated characterization of monkey visual neurons through human languagehttp://arxiv.org/abs/2605.12485v1http://arxiv.org/abs/2605.12485v1Vedang Lad et al. — arxiv:2605.12485 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationReward Hacking in Rubric-Based Reinforcement Learninghttp://arxiv.org/abs/2605.12474v1http://arxiv.org/abs/2605.12474v1Anas Mahmoud et al. — arxiv:2605.12474 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationCAAFC: Chronological Actionable Automated Fact-Checker for misinformation / non-factual hallucination detection and correctionhttp://arxiv.org/abs/2605.12436v1http://arxiv.org/abs/2605.12436v1Islam Eldifrawi et al. — arxiv:2605.12436 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationGeometric Factual Recall in Transformershttp://arxiv.org/abs/2605.12426v1http://arxiv.org/abs/2605.12426v1Shauli Ravfogel et al. — arxiv:2605.12426 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationSemantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systemshttp://arxiv.org/abs/2605.12406v1http://arxiv.org/abs/2605.12406v1William Parris et al. 
— arxiv:2605.12406 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationScalable Token-Level Hallucination Detection in Large Language Modelshttp://arxiv.org/abs/2605.12384v1http://arxiv.org/abs/2605.12384v1Rui Min et al. — arxiv:2605.12384 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationReinforcing VLAs in Task-Agnostic World Modelshttp://arxiv.org/abs/2605.12334v1http://arxiv.org/abs/2605.12334v1Yucen Wang et al. — arxiv:2605.12334 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationGKnow: Measuring the Entanglement of Gender Bias and Factual Genderhttp://arxiv.org/abs/2605.12299v1http://arxiv.org/abs/2605.12299v1Leonor Veloso et al. — arxiv:2605.12299 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationInstruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Modelshttp://arxiv.org/abs/2605.12258v1http://arxiv.org/abs/2605.12258v1Runhe Lai et al. — arxiv:2605.12258 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationWhy Conclusions Diverge from the Same Observations: Formalizing World-Model Non-Identifiability via an Inferencehttp://arxiv.org/abs/2605.12255v1http://arxiv.org/abs/2605.12255v1Toru Takahashi et al. — arxiv:2605.12255 — HallucinationTue, 12 May 2026 00:00:00 GMTHallucinationTargeted Neuron Modulation via Contrastive Pair Searchhttp://arxiv.org/abs/2605.12290v1http://arxiv.org/abs/2605.12290v1Sam Herring et al. — arxiv:2605.12290 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyMetaphor Is Not All Attention Needshttp://arxiv.org/abs/2605.12128v1http://arxiv.org/abs/2605.12128v1Olga Sorokoletova et al. — arxiv:2605.12128 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyProteus: A Self-Evolving Red Team for Agent Skill Ecosystemshttp://arxiv.org/abs/2605.11891v1http://arxiv.org/abs/2605.11891v1Zhaojiacheng Zhou et al. 
— arxiv:2605.11891 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyIPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injectionhttp://arxiv.org/abs/2605.11868v1http://arxiv.org/abs/2605.11868v1Chia-Pei et al. — arxiv:2605.11868 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyPersona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigationhttp://arxiv.org/abs/2605.11730v1http://arxiv.org/abs/2605.11730v1Cristian Morasso et al. — arxiv:2605.11730 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetySafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Modelshttp://arxiv.org/abs/2605.11716v1http://arxiv.org/abs/2605.11716v1Xinyi Zeng et al. — arxiv:2605.11716 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetySafety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysishttp://arxiv.org/abs/2605.11664v1http://arxiv.org/abs/2605.11664v1Zhenhao Xu et al. — arxiv:2605.11664 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyA Mimetic Detector for Adversarial Image Perturbationshttp://arxiv.org/abs/2605.11492v1http://arxiv.org/abs/2605.11492v1Johnny Corbino et al. — arxiv:2605.11492 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyREALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinationshttp://arxiv.org/abs/2605.12813v1http://arxiv.org/abs/2605.12813v1Buyun Liang et al. — arxiv:2605.12813 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyStill Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Drivinghttp://arxiv.org/abs/2605.12743v1http://arxiv.org/abs/2605.12743v1Shuo Ju et al. — arxiv:2605.12743 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyBefore the Last Token: Diagnosing Final-Token Safety Probe Failureshttp://arxiv.org/abs/2605.12726v1http://arxiv.org/abs/2605.12726v1Shravan Doda et al. 
— arxiv:2605.12726 — LLM SafetyTue, 12 May 2026 00:00:00 GMTLLM SafetyPredicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signalshttp://arxiv.org/abs/2605.12422v1http://arxiv.org/abs/2605.12422v1Yo Ehara et al. — arxiv:2605.12422 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationMedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answeringhttp://arxiv.org/abs/2605.12361v1http://arxiv.org/abs/2605.12361v1Rezarta Islamaj et al. — arxiv:2605.12361 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationPRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agentshttp://arxiv.org/abs/2605.12260v1http://arxiv.org/abs/2605.12260v1Jingyi Peng et al. — arxiv:2605.12260 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationProcedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Modelshttp://arxiv.org/abs/2605.11907v1http://arxiv.org/abs/2605.11907v1Igor Strozzi et al. — arxiv:2605.11907 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationAllegory of the Cave: Measurement-Grounded Vision-Language Learninghttp://arxiv.org/abs/2605.11727v1http://arxiv.org/abs/2605.11727v1Kepeng Xu et al. — arxiv:2605.11727 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationHuman-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Abilityhttp://arxiv.org/abs/2605.11663v1http://arxiv.org/abs/2605.11663v1Kyosuke Takami et al. — arxiv:2605.11663 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationRead, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideationhttp://arxiv.org/abs/2605.11532v1http://arxiv.org/abs/2605.11532v1Yunju Choi et al. 
— arxiv:2605.11532 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationNeurodata Without Boredom: Benchmarking Agentic AI for Data Reusehttp://arxiv.org/abs/2605.12808v1http://arxiv.org/abs/2605.12808v1Ling-Qi Zhang et al. — arxiv:2605.12808 — LLM EvaluationTue, 12 May 2026 00:00:00 GMTLLM EvaluationScalable Packed Layouts for Vector-Length-Agnostic ML Code Generationhttp://arxiv.org/abs/2605.12445v1http://arxiv.org/abs/2605.12445v1Ege Beysel et al. — arxiv:2605.12445 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMUncertainty Quantification for LLM-based Code Generationhttp://arxiv.org/abs/2605.12201v1http://arxiv.org/abs/2605.12201v1Senrong Xu et al. — arxiv:2605.12201 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMRollout Cards: A Reproducibility Standard for Agent Researchhttp://arxiv.org/abs/2605.12131v1http://arxiv.org/abs/2605.12131v1Charlie Masters et al. — arxiv:2605.12131 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMStepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learninghttp://arxiv.org/abs/2605.11922v1http://arxiv.org/abs/2605.11922v1Hao Wang et al. — arxiv:2605.11922 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMAgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agentshttp://arxiv.org/abs/2605.11732v1http://arxiv.org/abs/2605.11732v1Jiarui Jin et al. — arxiv:2605.11732 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMCoT-Guard: Small Models for Strong Monitoringhttp://arxiv.org/abs/2605.12746v1http://arxiv.org/abs/2605.12746v1Nirav Diwan et al. — arxiv:2605.12746 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLM3D Primitives are a Spatial Language for VLMshttp://arxiv.org/abs/2605.12586v1http://arxiv.org/abs/2605.12586v1Junze Liu et al. 
— arxiv:2605.12586 — Code LLMTue, 12 May 2026 00:00:00 GMTCode LLMTowards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Modelhttp://arxiv.org/abs/2605.12036v1http://arxiv.org/abs/2605.12036v1Guojian Li et al. — arxiv:2605.12036 — Speech LLMTue, 12 May 2026 00:00:00 GMTSpeech LLMPoly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modelinghttp://arxiv.org/abs/2605.12310v1http://arxiv.org/abs/2605.12310v1Chen Geng et al. — arxiv:2605.12310 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPMechanistic Interpretability of ASR models using Sparse Autoencodershttp://arxiv.org/abs/2605.12225v1http://arxiv.org/abs/2605.12225v1Dan Pluth et al. — arxiv:2605.12225 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPSign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forwardhttp://arxiv.org/abs/2605.12096v1http://arxiv.org/abs/2605.12096v1Nigar Alishzade et al. — arxiv:2605.12096 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPEnhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimizationhttp://arxiv.org/abs/2605.11632v1http://arxiv.org/abs/2605.11632v1Yilong Wang et al. — arxiv:2605.11632 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPScaling Laws for Mixture Pretraining Under Data Constraintshttp://arxiv.org/abs/2605.12715v1http://arxiv.org/abs/2605.12715v1Anastasiia Sedova et al. — arxiv:2605.12715 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPDocAtlas: Multilingual Document Understanding Across 80+ Languageshttp://arxiv.org/abs/2605.12623v1http://arxiv.org/abs/2605.12623v1Ahmed Heakl et al. — arxiv:2605.12623 — Multilingual NLPTue, 12 May 2026 00:00:00 GMTMultilingual NLPConcordance Comparison as a Means of Assembling Local Grammarshttp://arxiv.org/abs/2605.11862v1http://arxiv.org/abs/2605.11862v1Juliana Pirovani et al. 
— arxiv:2605.11862 — Named Entity RecognitionTue, 12 May 2026 00:00:00 GMTNamed Entity RecognitionConcordance Comparison as a Means of Assembling Local Grammarshttp://arxiv.org/abs/2605.11862v1http://arxiv.org/abs/2605.11862v1Juliana Pirovani et al. — arxiv:2605.11862 — Information ExtractionTue, 12 May 2026 00:00:00 GMTInformation ExtractionPurification of a monitored qubit: exact path-integral solutionhttp://arxiv.org/abs/2605.12783v1http://arxiv.org/abs/2605.12783v1Matheus M. R. Poltronieri Martins et al. — arxiv:2605.12783 — Information ExtractionTue, 12 May 2026 00:00:00 GMTInformation ExtractionHow Useful Is Cross-Domain Generalization for Training LLM Monitors?http://arxiv.org/abs/2605.12265v1http://arxiv.org/abs/2605.12265v1Sam Martin et al. — arxiv:2605.12265 — Text ClassificationTue, 12 May 2026 00:00:00 GMTText ClassificationA microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alertinghttp://arxiv.org/abs/2605.11997v2http://arxiv.org/abs/2605.11997v2Darlan Noetzold et al. — arxiv:2605.11997 — Text ClassificationTue, 12 May 2026 00:00:00 GMTText ClassificationFrom Web to Pixels: Bringing Agentic Search into Visual Perceptionhttp://arxiv.org/abs/2605.12497v1http://arxiv.org/abs/2605.12497v1Bokang Yang et al. — arxiv:2605.12497 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringLongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagueshttp://arxiv.org/abs/2605.12493v1http://arxiv.org/abs/2605.12493v1Di Wu et al. — arxiv:2605.12493 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Modelshttp://arxiv.org/abs/2605.12446v1http://arxiv.org/abs/2605.12446v1Chen Li et al. 
— arxiv:2605.12446 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringExtending QuAK with Nested Quantitative Automatahttp://arxiv.org/abs/2605.12418v1http://arxiv.org/abs/2605.12418v1Thomas A. Henzinger et al. — arxiv:2605.12418 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringBeyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Imageshttp://arxiv.org/abs/2605.12413v1http://arxiv.org/abs/2605.12413v1Yuangong Chen et al. — arxiv:2605.12413 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringQuestion Difficulty Estimation for Large Language Models via Answer Plausibility Scoringhttp://arxiv.org/abs/2605.12398v1http://arxiv.org/abs/2605.12398v1Jamshid Mozafari et al. — arxiv:2605.12398 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringContext Convergence Improves Answering Inferential Questionshttp://arxiv.org/abs/2605.12370v1http://arxiv.org/abs/2605.12370v1Jamshid Mozafari et al. — arxiv:2605.12370 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringMedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answeringhttp://arxiv.org/abs/2605.12361v1http://arxiv.org/abs/2605.12361v1Rezarta Islamaj et al. — arxiv:2605.12361 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringOverview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answeringhttp://arxiv.org/abs/2605.12313v1http://arxiv.org/abs/2605.12313v1Rezarta Islamaj et al. — arxiv:2605.12313 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringMitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decodinghttp://arxiv.org/abs/2605.12185v1http://arxiv.org/abs/2605.12185v1Yigeng Zhou et al. 
— arxiv:2605.12185 — Question AnsweringTue, 12 May 2026 00:00:00 GMTQuestion AnsweringPersistent and Conversational Multi-Method Explainability for Trustworthy Financial AIhttp://arxiv.org/abs/2605.11687v1http://arxiv.org/abs/2605.11687v1Georgios Makridis et al. — arxiv:2605.11687 — Sentiment AnalysisTue, 12 May 2026 00:00:00 GMTSentiment AnalysisExecutable Agentic Memory for GUI Agenthttp://arxiv.org/abs/2605.12294v1http://arxiv.org/abs/2605.12294v1Zerui Qin et al. — arxiv:2605.12294 — Knowledge GraphTue, 12 May 2026 00:00:00 GMTKnowledge GraphGraph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphshttp://arxiv.org/abs/2605.12204v2http://arxiv.org/abs/2605.12204v2Madhulatha Mandarapu et al. — arxiv:2605.12204 — Knowledge GraphTue, 12 May 2026 00:00:00 GMTKnowledge GraphBadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Promptshttp://arxiv.org/abs/2605.11996v1http://arxiv.org/abs/2605.11996v1Xiaoting Lyu et al. — arxiv:2605.11996 — Knowledge GraphTue, 12 May 2026 00:00:00 GMTKnowledge GraphQwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understandinghttp://arxiv.org/abs/2605.10296v1http://arxiv.org/abs/2605.10296v1Anton Bazdyrev et al. — arxiv:2605.10296 — NLPMon, 11 May 2026 00:00:00 GMTNLPBuilding Korean linguistic resource for NLU data generation of banking app CS dialog systemhttp://arxiv.org/abs/2605.10241v1http://arxiv.org/abs/2605.10241v1Jeongwoo Yoon et al. — arxiv:2605.10241 — NLPMon, 11 May 2026 00:00:00 GMTNLPGLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extractionhttp://arxiv.org/abs/2605.10108v1http://arxiv.org/abs/2605.10108v1Ihor Stepanov et al. — arxiv:2605.10108 — NLPMon, 11 May 2026 00:00:00 GMTNLPH-MAPS: Hierarchical Memory-Augmented Proactive Search Assistant for Scientific Literaturehttp://arxiv.org/abs/2605.10097v1http://arxiv.org/abs/2605.10097v1Koji Nishikawa et al. 
— arxiv:2605.10097 — NLPMon, 11 May 2026 00:00:00 GMTNLPBeyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Taskshttp://arxiv.org/abs/2605.09955v1http://arxiv.org/abs/2605.09955v1Tadesse Destaw Belay et al. — arxiv:2605.09955 — NLPMon, 11 May 2026 00:00:00 GMTNLPYield Curve Forecasting using Machine Learning and Econometrics: A Comparative Analysishttp://arxiv.org/abs/2605.09842v1http://arxiv.org/abs/2605.09842v1Aman Singh et al. — arxiv:2605.09842 — NLPMon, 11 May 2026 00:00:00 GMTNLPBeyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenizationhttp://arxiv.org/abs/2605.10780v2http://arxiv.org/abs/2605.10780v2Xuanyu Zhu et al. — arxiv:2605.10780 — NLPMon, 11 May 2026 00:00:00 GMTNLPWhy Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgishhttp://arxiv.org/abs/2605.10714v1http://arxiv.org/abs/2605.10714v1Fred Philippy et al. — arxiv:2605.10714 — NLPMon, 11 May 2026 00:00:00 GMTNLPMeasuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritingshttp://arxiv.org/abs/2605.10606v1http://arxiv.org/abs/2605.10606v1Benjamin Icard et al. — arxiv:2605.10606 — NLPMon, 11 May 2026 00:00:00 GMTNLPThreatCore: A Benchmark for Explicit and Implicit Threat Detectionhttp://arxiv.org/abs/2605.10563v1http://arxiv.org/abs/2605.10563v1Davide Bruni et al. — arxiv:2605.10563 — NLPMon, 11 May 2026 00:00:00 GMTNLPICT-NLP at SemEval-2026 Task 3: Less Is More -- Multilingual Encoder with Joint Training and Adaptive Ensemble for Dimensional Aspect Sentiment Regressionhttp://arxiv.org/abs/2605.10560v1http://arxiv.org/abs/2605.10560v1Liyuan Huang et al. — arxiv:2605.10560 — NLPMon, 11 May 2026 00:00:00 GMTNLPHEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Modelhttp://arxiv.org/abs/2605.11255v1http://arxiv.org/abs/2605.11255v1Noam Kayzer et al. 
— arxiv:2605.11255 — NLPMon, 11 May 2026 00:00:00 GMTNLPThe Scaling Law of Evaluation Failure: Why Simple Averaging Collapses Under Data Sparsity and Item Difficulty Gaps, and How Item Response Theory Recovers Ground Truth Across Domainshttp://arxiv.org/abs/2605.11205v1http://arxiv.org/abs/2605.11205v1Jung Min Kang et al. — arxiv:2605.11205 — NLPMon, 11 May 2026 00:00:00 GMTNLPClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IVhttp://arxiv.org/abs/2605.11143v1http://arxiv.org/abs/2605.11143v1Alex Stinard et al. — arxiv:2605.11143 — NLPMon, 11 May 2026 00:00:00 GMTNLPExtending Confidence-Based Text2Cypher with Grammar and Schema Aware Filteringhttp://arxiv.org/abs/2605.10318v1http://arxiv.org/abs/2605.10318v1Makbule Gulcin Ozsoy et al. — arxiv:2605.10318 — LLMMon, 11 May 2026 00:00:00 GMTLLMPositive Alignment: Artificial Intelligence for Human Flourishinghttp://arxiv.org/abs/2605.10310v1http://arxiv.org/abs/2605.10310v1Ruben Laukkonen et al. — arxiv:2605.10310 — LLMMon, 11 May 2026 00:00:00 GMTLLMAgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Taskshttp://arxiv.org/abs/2605.10286v1http://arxiv.org/abs/2605.10286v1Baraa Al Jorf et al. — arxiv:2605.10286 — LLMMon, 11 May 2026 00:00:00 GMTLLMDP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Modelshttp://arxiv.org/abs/2605.10272v1http://arxiv.org/abs/2605.10272v1Haaris Mehmood et al. — arxiv:2605.10272 — LLMMon, 11 May 2026 00:00:00 GMTLLMIndustryBench: Probing the Industrial Knowledge Boundaries of LLMshttp://arxiv.org/abs/2605.10267v1http://arxiv.org/abs/2605.10267v1Songlin Bai et al. — arxiv:2605.10267 — LLMMon, 11 May 2026 00:00:00 GMTLLMKnowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.10253v1http://arxiv.org/abs/2605.10253v1Peiru Yang et al. 
— arxiv:2605.10253 — LLMMon, 11 May 2026 00:00:00 GMTLLMTeaching LLMs to See Graphs: Unifying Text and Structural Reasoninghttp://arxiv.org/abs/2605.10247v1http://arxiv.org/abs/2605.10247v1Dario Vajda et al. — arxiv:2605.10247 — LLMMon, 11 May 2026 00:00:00 GMTLLMSciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systemshttp://arxiv.org/abs/2605.10246v1http://arxiv.org/abs/2605.10246v1Zonglin Yang et al. — arxiv:2605.10246 — LLMMon, 11 May 2026 00:00:00 GMTLLMRoute Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selectionhttp://arxiv.org/abs/2605.10235v1http://arxiv.org/abs/2605.10235v1Yiwen Chen et al. — arxiv:2605.10235 — LLMMon, 11 May 2026 00:00:00 GMTLLMSocial Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germanyhttp://arxiv.org/abs/2605.10234v1http://arxiv.org/abs/2605.10234v1Claudia Benavides Cantos et al. — arxiv:2605.10234 — LLMMon, 11 May 2026 00:00:00 GMTLLMEvaluating the False Trust engendered by LLM Explanationshttp://arxiv.org/abs/2605.10930v1http://arxiv.org/abs/2605.10930v1Vardhan Palod et al. — arxiv:2605.10930 — LLMMon, 11 May 2026 00:00:00 GMTLLMDynamic Skill Lifecycle Management for Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.10923v1http://arxiv.org/abs/2605.10923v1Junhao Shen et al. — arxiv:2605.10923 — LLMMon, 11 May 2026 00:00:00 GMTLLMWildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluationhttp://arxiv.org/abs/2605.10912v1http://arxiv.org/abs/2605.10912v1Shuangrui Ding et al. — arxiv:2605.10912 — LLMMon, 11 May 2026 00:00:00 GMTLLMBeyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiershttp://arxiv.org/abs/2605.10901v1http://arxiv.org/abs/2605.10901v1Nikita Kezins et al. 
— arxiv:2605.10901 — LLMMon, 11 May 2026 00:00:00 GMTLLMV4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Predictionhttp://arxiv.org/abs/2605.10896v1http://arxiv.org/abs/2605.10896v1Marcin Kostrzewa et al. — arxiv:2605.10896 — LLMMon, 11 May 2026 00:00:00 GMTLLMCppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commitshttp://arxiv.org/abs/2605.10890v1http://arxiv.org/abs/2605.10890v1Tommy Ho et al. — arxiv:2605.10890 — LLMMon, 11 May 2026 00:00:00 GMTLLMCount Anything at Any Granularityhttp://arxiv.org/abs/2605.10887v1http://arxiv.org/abs/2605.10887v1Chang Liu et al. — arxiv:2605.10887 — LLMMon, 11 May 2026 00:00:00 GMTLLMLoKA: Low-precision Kernel Applications for Recommendation Models At Scalehttp://arxiv.org/abs/2605.10886v1http://arxiv.org/abs/2605.10886v1Liang Luo et al. — arxiv:2605.10886 — LLMMon, 11 May 2026 00:00:00 GMTLLMAssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agentshttp://arxiv.org/abs/2605.10876v1http://arxiv.org/abs/2605.10876v1Edward De Brouwer et al. — arxiv:2605.10876 — LLMMon, 11 May 2026 00:00:00 GMTLLMCompute Where it Counts: Self Optimizing Language Modelshttp://arxiv.org/abs/2605.10875v1http://arxiv.org/abs/2605.10875v1Yash Akhauri et al. — arxiv:2605.10875 — LLMMon, 11 May 2026 00:00:00 GMTLLMPositive Alignment: Artificial Intelligence for Human Flourishinghttp://arxiv.org/abs/2605.10310v1http://arxiv.org/abs/2605.10310v1Ruben Laukkonen et al. — arxiv:2605.10310 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentAgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Taskshttp://arxiv.org/abs/2605.10286v1http://arxiv.org/abs/2605.10286v1Baraa Al Jorf et al. — arxiv:2605.10286 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentMemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereadinghttp://arxiv.org/abs/2605.10268v1http://arxiv.org/abs/2605.10268v1Baibei Ji et al. 
— arxiv:2605.10268 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentTowards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problemhttp://arxiv.org/abs/2605.10257v1http://arxiv.org/abs/2605.10257v1Alberto Castagna et al. — arxiv:2605.10257 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentBeyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Executionhttp://arxiv.org/abs/2605.10223v1http://arxiv.org/abs/2605.10223v1Kai Pan et al. — arxiv:2605.10223 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentV-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoninghttp://arxiv.org/abs/2605.10172v1http://arxiv.org/abs/2605.10172v1Zhiwei Ning et al. — arxiv:2605.10172 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentWhen Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviewshttp://arxiv.org/abs/2605.10171v1http://arxiv.org/abs/2605.10171v1Sandeep Kumar et al. — arxiv:2605.10171 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentBalancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learninghttp://arxiv.org/abs/2605.10170v1http://arxiv.org/abs/2605.10170v1Matteo Cederle et al. — arxiv:2605.10170 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentNyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.10155v1http://arxiv.org/abs/2605.10155v1Deepanshu et al. — arxiv:2605.10155 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentIs DRL-based MAC Ready for Underwater Acoustic Networks? Exploring Its Practicality in Real Field Experimentshttp://arxiv.org/abs/2605.10144v1http://arxiv.org/abs/2605.10144v1Jiani Guo et al. — arxiv:2605.10144 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentPersonal Visual Context Learning in Large Multimodal Modelshttp://arxiv.org/abs/2605.10936v1http://arxiv.org/abs/2605.10936v1Zihui Xue et al. 
— arxiv:2605.10936 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentDynamic Skill Lifecycle Management for Agentic Reinforcement Learninghttp://arxiv.org/abs/2605.10923v1http://arxiv.org/abs/2605.10923v1Junhao Shen et al. — arxiv:2605.10923 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentOptimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridgeshttp://arxiv.org/abs/2605.10917v1http://arxiv.org/abs/2605.10917v1Usman A. Khan et al. — arxiv:2605.10917 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentShepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Tracehttp://arxiv.org/abs/2605.10913v1http://arxiv.org/abs/2605.10913v1Simon Yu et al. — arxiv:2605.10913 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentWildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluationhttp://arxiv.org/abs/2605.10912v1http://arxiv.org/abs/2605.10912v1Shuangrui Ding et al. — arxiv:2605.10912 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentEquivariant Reinforcement Learning for Clifford Quantum Circuit Synthesishttp://arxiv.org/abs/2605.10910v1http://arxiv.org/abs/2605.10910v1Richie Yeung et al. — arxiv:2605.10910 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentRevisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradientshttp://arxiv.org/abs/2605.10909v1http://arxiv.org/abs/2605.10909v1Alex DeWeese et al. — arxiv:2605.10909 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentEngineering Robustness into Personal Agents with the AI Workflow Storehttp://arxiv.org/abs/2605.10907v1http://arxiv.org/abs/2605.10907v1Roxana Geambasu et al. — arxiv:2605.10907 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentDataMaster: Towards Autonomous Data Engineering for Machine Learninghttp://arxiv.org/abs/2605.10906v1http://arxiv.org/abs/2605.10906v1Yaxin Du et al. 
— arxiv:2605.10906 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentMDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systemshttp://arxiv.org/abs/2605.10904v1http://arxiv.org/abs/2605.10904v1Marco Coscoy et al. — arxiv:2605.10904 — LLM AgentMon, 11 May 2026 00:00:00 GMTLLM AgentAgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Taskshttp://arxiv.org/abs/2605.10286v1http://arxiv.org/abs/2605.10286v1Baraa Al Jorf et al. — arxiv:2605.10286 — Multi-AgentMon, 11 May 2026 00:00:00 GMTMulti-AgentTowards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problemhttp://arxiv.org/abs/2605.10257v1http://arxiv.org/abs/2605.10257v1Alberto Castagna et al. — arxiv:2605.10257 — Multi-AgentMon, 11 May 2026 00:00:00 GMTMulti-AgentBeyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Executionhttp://arxiv.org/abs/2605.10223v1http://arxiv.org/abs/2605.10223v1Kai Pan et al. — arxiv:2605.10223 — Multi-AgentMon, 11 May 2026 00:00:00 GMTMulti-AgentV-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoninghttp://arxiv.org/abs/2605.10172v1http://arxiv.org/abs/2605.10172v1Zhiwei Ning et al. — arxiv:2605.10172 — Multi-AgentMon, 11 May 2026 00:00:00 GMTMulti-AgentWhen Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviewshttp://arxiv.org/abs/2605.10171v1http://arxiv.org/abs/2605.10171v1Sandeep Kumar et al. — arxiv:2605.10171 — Multi-AgentMon, 11 May 2026 00:00:00 GMTMulti-AgentNyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.10155v1http://arxiv.org/abs/2605.10155v1Deepanshu et al. 
— arxiv:2605.10155 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution
http://arxiv.org/abs/2605.10114v1
Xiangcheng Meng et al. — arxiv:2605.10114 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
http://arxiv.org/abs/2605.10106v1
Tingshu Mou et al. — arxiv:2605.10106 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design
http://arxiv.org/abs/2605.10093v1
Hang Lu et al. — arxiv:2605.10093 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Agentic Fuzzing: Opportunities and Challenges
http://arxiv.org/abs/2605.10074v1
Junyoung Park et al. — arxiv:2605.10074 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges
http://arxiv.org/abs/2605.10917v1
Usman A. Khan et al. — arxiv:2605.10917 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients
http://arxiv.org/abs/2605.10909v1
Alex DeWeese et al. — arxiv:2605.10909 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems
http://arxiv.org/abs/2605.10904v1
Marco Coscoy et al. — arxiv:2605.10904 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
http://arxiv.org/abs/2605.10813v1
Jinhang Xu et al.
— arxiv:2605.10813 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
http://arxiv.org/abs/2605.10807v1
Johann Knechtel et al. — arxiv:2605.10807 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
http://arxiv.org/abs/2605.10779v1
Chiyu Zhang et al. — arxiv:2605.10779 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

MAGS-SLAM: Monocular Multi-Agent Gaussian Splatting SLAM for Geometrically and Photometrically Consistent Reconstruction
http://arxiv.org/abs/2605.10760v1
Zhihao Cao et al. — arxiv:2605.10760 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance
http://arxiv.org/abs/2605.10738v1
Max Studt et al. — arxiv:2605.10738 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Heteroscedastic Diffusion for Multi-Agent Trajectory Modeling
http://arxiv.org/abs/2605.10717v1
Guillem Capellera et al. — arxiv:2605.10717 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions
http://arxiv.org/abs/2605.10698v1
Dahlia Shehata et al. — arxiv:2605.10698 — Multi-Agent
Mon, 11 May 2026 00:00:00 GMT

Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
http://arxiv.org/abs/2605.10296v1
Anton Bazdyrev et al. — arxiv:2605.10296 — RAG
Mon, 11 May 2026 00:00:00 GMT

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation
http://arxiv.org/abs/2605.10253v1
Peiru Yang et al.
— arxiv:2605.10253 — RAG
Mon, 11 May 2026 00:00:00 GMT

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
http://arxiv.org/abs/2605.10235v1
Yiwen Chen et al. — arxiv:2605.10235 — RAG
Mon, 11 May 2026 00:00:00 GMT

ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
http://arxiv.org/abs/2605.10168v1
Shu Wang et al. — arxiv:2605.10168 — RAG
Mon, 11 May 2026 00:00:00 GMT

NyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generation
http://arxiv.org/abs/2605.10155v1
Deepanshu et al. — arxiv:2605.10155 — RAG
Mon, 11 May 2026 00:00:00 GMT

RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design
http://arxiv.org/abs/2605.10093v1
Hang Lu et al. — arxiv:2605.10093 — RAG
Mon, 11 May 2026 00:00:00 GMT

Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference
http://arxiv.org/abs/2605.09990v1
Sietse Schelpe et al. — arxiv:2605.09990 — RAG
Mon, 11 May 2026 00:00:00 GMT

Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
http://arxiv.org/abs/2605.09986v1
Prasanjit Dubey et al. — arxiv:2605.09986 — RAG
Mon, 11 May 2026 00:00:00 GMT

Grounded Satirical Generation with RAG
http://arxiv.org/abs/2605.10853v1
Oona Itkonen et al. — arxiv:2605.10853 — RAG
Mon, 11 May 2026 00:00:00 GMT

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
http://arxiv.org/abs/2605.10828v1
Muhan Gao et al.
— arxiv:2605.10828 — RAG
Mon, 11 May 2026 00:00:00 GMT

PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering
http://arxiv.org/abs/2605.10791v1
Shengxiang Gao et al. — arxiv:2605.10791 — RAG
Mon, 11 May 2026 00:00:00 GMT

ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
http://arxiv.org/abs/2605.10787v1
Yuanyang Li et al. — arxiv:2605.10787 — RAG
Mon, 11 May 2026 00:00:00 GMT

PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs
http://arxiv.org/abs/2605.10529v1
Yousef A. Radwan et al. — arxiv:2605.10529 — RAG
Mon, 11 May 2026 00:00:00 GMT

To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification
http://arxiv.org/abs/2605.10211v1
Maik Larooij et al. — arxiv:2605.10211 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
http://arxiv.org/abs/2605.10207v1
Yiwen Chen et al. — arxiv:2605.10207 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
http://arxiv.org/abs/2605.10195v1
Shuzhang Zhong et al. — arxiv:2605.10195 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications
http://arxiv.org/abs/2605.09929v1
Pranshav Gajjar et al. — arxiv:2605.09929 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
http://arxiv.org/abs/2605.09906v1
Xuanchen Li et al.
— arxiv:2605.09906 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Continuous Latent Contexts Enable Efficient Online Learning in Transformers
http://arxiv.org/abs/2605.09867v1
Emile Anand et al. — arxiv:2605.09867 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Exploration-Driven Optimization for Test-Time Large Language Model Reasoning
http://arxiv.org/abs/2605.09853v1
Changhao Li et al. — arxiv:2605.09853 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Evaluating the False Trust engendered by LLM Explanations
http://arxiv.org/abs/2605.10930v1
Vardhan Palod et al. — arxiv:2605.10930 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
http://arxiv.org/abs/2605.10889v1
Mohammadreza Armandpour et al. — arxiv:2605.10889 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies
http://arxiv.org/abs/2605.10799v1
Gabriel Garcia et al. — arxiv:2605.10799 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing
http://arxiv.org/abs/2605.10794v1
Ari Holtzman et al. — arxiv:2605.10794 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
http://arxiv.org/abs/2605.10761v1
Wenxuan Li et al. — arxiv:2605.10761 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
http://arxiv.org/abs/2605.10744v1
Kefei Tian et al.
— arxiv:2605.10744 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions
http://arxiv.org/abs/2605.10698v1
Dahlia Shehata et al. — arxiv:2605.10698 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
http://arxiv.org/abs/2605.10488v1
Haoyu Huang et al. — arxiv:2605.10488 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
http://arxiv.org/abs/2605.10426v1
Minqing Huang et al. — arxiv:2605.10426 — Reasoning
Mon, 11 May 2026 00:00:00 GMT

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning
http://arxiv.org/abs/2605.10172v1
Zhiwei Ning et al. — arxiv:2605.10172 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
http://arxiv.org/abs/2605.10038v1
Hangchen Liu et al. — arxiv:2605.10038 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

TRACER: Verifiable Generative Provenance for Multimodal Tool-Using Agents
http://arxiv.org/abs/2605.09934v1
Bihui Yu et al. — arxiv:2605.09934 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
http://arxiv.org/abs/2605.09932v1
Zehua Pei et al. — arxiv:2605.09932 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

The Association of Transformer-based Sentiment Analysis with Symptom Distress and Deterioration in Routine Psychotherapy Care
http://arxiv.org/abs/2605.09838v1
Douglas K. Faust et al.
— arxiv:2605.09838 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
http://arxiv.org/abs/2605.10848v1
Tz-Huan Hsu et al. — arxiv:2605.10848 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
http://arxiv.org/abs/2605.10832v1
Shijue Huang et al. — arxiv:2605.10832 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding
http://arxiv.org/abs/2605.10782v1
Lihuan Li et al. — arxiv:2605.10782 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification
http://arxiv.org/abs/2605.10712v1
Paschal C. Amusuo et al. — arxiv:2605.10712 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
http://arxiv.org/abs/2605.10481v1
Tianxiao Li et al. — arxiv:2605.10481 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

Coherency through formalisations of Structured Natural Language, A case study on FRETish
http://arxiv.org/abs/2605.10462v1
Joost J. Joosten et al. — arxiv:2605.10462 — Tool Use
Mon, 11 May 2026 00:00:00 GMT

FLARE: Full-Modality Long-Video Audiovisual Retrieval Benchmark with User-Simulated Queries
http://arxiv.org/abs/2605.10228v1
Qijie You et al. — arxiv:2605.10228 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
http://arxiv.org/abs/2605.10187v1
Longteng Guo et al.
— arxiv:2605.10187 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning
http://arxiv.org/abs/2605.10172v1
Zhiwei Ning et al. — arxiv:2605.10172 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

MicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graph
http://arxiv.org/abs/2605.10120v1
Manyu Li et al. — arxiv:2605.10120 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation
http://arxiv.org/abs/2605.10118v1
Zhixuan Shen et al. — arxiv:2605.10118 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
http://arxiv.org/abs/2605.10106v1
Tingshu Mou et al. — arxiv:2605.10106 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
http://arxiv.org/abs/2605.10079v1
Liangyang Ouyang et al. — arxiv:2605.10079 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Sketch-based Access Control: A Multimodal Interface for Translating User Preferences into Intent-Aligned Policies
http://arxiv.org/abs/2605.10012v1
Kyzyl Monteiro et al. — arxiv:2605.10012 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
http://arxiv.org/abs/2605.10002v1
Minh Khoi Nguyen et al.
— arxiv:2605.10002 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning
http://arxiv.org/abs/2605.09982v1
Yuna Lee et al. — arxiv:2605.09982 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
http://arxiv.org/abs/2605.10925v1
Xinyu Guo et al. — arxiv:2605.10925 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
http://arxiv.org/abs/2605.10921v1
Huashuo Lei et al. — arxiv:2605.10921 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
http://arxiv.org/abs/2605.10912v1
Shuangrui Ding et al. — arxiv:2605.10912 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking
http://arxiv.org/abs/2605.10893v1
Reza Khanmohammadi et al. — arxiv:2605.10893 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Count Anything at Any Granularity
http://arxiv.org/abs/2605.10887v1
Chang Liu et al. — arxiv:2605.10887 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation
http://arxiv.org/abs/2605.10873v1
Anna C. Doris et al. — arxiv:2605.10873 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD
http://arxiv.org/abs/2605.10865v1
Haozhe Zhang et al.
— arxiv:2605.10865 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
http://arxiv.org/abs/2605.10855v1
Jianzhu Bao et al. — arxiv:2605.10855 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA
http://arxiv.org/abs/2605.10850v1
Ruinan Jin et al. — arxiv:2605.10850 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation
http://arxiv.org/abs/2605.10845v1
Qi Yang et al. — arxiv:2605.10845 — Multimodal LLM
Mon, 11 May 2026 00:00:00 GMT

MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
http://arxiv.org/abs/2605.10268v1
Baibei Ji et al. — arxiv:2605.10268 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Efficient Hybrid CNN-GNN Architecture for Monocular Depth Estimation
http://arxiv.org/abs/2605.10251v1
Ishan Narayan et al. — arxiv:2605.10251 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
http://arxiv.org/abs/2605.10235v1
Yiwen Chen et al. — arxiv:2605.10235 — Long Context
Mon, 11 May 2026 00:00:00 GMT

TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment
http://arxiv.org/abs/2605.10194v1
Jiaxuan Wang et al. — arxiv:2605.10194 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Cyclotron Line Variability and Accretion Dynamics in Vela X-1
http://arxiv.org/abs/2605.10103v1
Mohammed Tobrej et al.
— arxiv:2605.10103 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN
http://arxiv.org/abs/2605.10036v1
Xijun Wang et al. — arxiv:2605.10036 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Continual Harness: Online Adaptation for Self-Improving Foundation Agents
http://arxiv.org/abs/2605.09998v1
Seth Karten et al. — arxiv:2605.09998 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Attention Drift: What Autoregressive Speculative Decoding Models Learn
http://arxiv.org/abs/2605.09992v1
Doğaç Eldenk et al. — arxiv:2605.09992 — Long Context
Mon, 11 May 2026 00:00:00 GMT

HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving
http://arxiv.org/abs/2605.09972v1
Zhongyu Xia et al. — arxiv:2605.09972 — Long Context
Mon, 11 May 2026 00:00:00 GMT

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
http://arxiv.org/abs/2605.09932v1
Zehua Pei et al. — arxiv:2605.09932 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
http://arxiv.org/abs/2605.10889v1
Mohammadreza Armandpour et al. — arxiv:2605.10889 — Long Context
Mon, 11 May 2026 00:00:00 GMT

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
http://arxiv.org/abs/2605.10828v1
Muhan Gao et al. — arxiv:2605.10828 — Long Context
Mon, 11 May 2026 00:00:00 GMT

CLEF: EEG Foundation Model for Learning Clinical Semantics
http://arxiv.org/abs/2605.10817v1
Peng Cao et al. — arxiv:2605.10817 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing
http://arxiv.org/abs/2605.10544v1
Jinchang Zhu et al. — arxiv:2605.10544 — Long Context
Mon, 11 May 2026 00:00:00 GMT

OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
http://arxiv.org/abs/2605.10484v1
Gang Chen et al. — arxiv:2605.10484 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
http://arxiv.org/abs/2605.10481v1
Tianxiao Li et al. — arxiv:2605.10481 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition
http://arxiv.org/abs/2605.10466v1
Haoren Xu et al. — arxiv:2605.10466 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI
http://arxiv.org/abs/2605.10425v1
Jiaqi W. Ma et al. — arxiv:2605.10425 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Remember to Forget: Gated Adaptive Positional Encoding
http://arxiv.org/abs/2605.10414v1
Riccardo Ali et al. — arxiv:2605.10414 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Phoenix-VL 1.5 Medium Technical Report
http://arxiv.org/abs/2605.10391v1
Team Phoenix et al. — arxiv:2605.10391 — Long Context
Mon, 11 May 2026 00:00:00 GMT

Low-Cost GNSS Anti-Jamming Through 2-Bit Phase Shift Beamforming with Machine Learning
http://arxiv.org/abs/2605.10264v1
Burak Soner et al. — arxiv:2605.10264 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Bulk-Edge Correspondence via Higher Gauge Theory
http://arxiv.org/abs/2605.10232v1
Hisham Sati et al.
— arxiv:2605.10232 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation
http://arxiv.org/abs/2605.10210v1
Federico Pizzolato et al. — arxiv:2605.10210 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Measurement-Adapted Eigentask Representations for Photon-Limited Optical Readout
http://arxiv.org/abs/2605.10008v1
Tianyang Chen et al. — arxiv:2605.10008 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
http://arxiv.org/abs/2605.09986v1
Prasanjit Dubey et al. — arxiv:2605.09986 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation
http://arxiv.org/abs/2605.09981v1
Nabin Giri et al. — arxiv:2605.09981 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Frequency Adapter with SAM for Generalized Medical Image Segmentation
http://arxiv.org/abs/2605.09925v1
Phuoc-Nguyen Bui et al. — arxiv:2605.09925 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Concordia: Self-Improving Synthetic Tables for Federated LLMs
http://arxiv.org/abs/2605.09855v1
Jimin Huang et al. — arxiv:2605.09855 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Cross-Domain Lossy Compression via Constrained Minimum Entropy Coupling
http://arxiv.org/abs/2605.09833v1
Nam Nguyen et al. — arxiv:2605.09833 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
http://arxiv.org/abs/2605.09827v1
Anushree Berlia et al.
— arxiv:2605.09827 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Compute Where it Counts: Self Optimizing Language Models
http://arxiv.org/abs/2605.10875v1
Yash Akhauri et al. — arxiv:2605.10875 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs
http://arxiv.org/abs/2605.10793v1
Chayne Thrash et al. — arxiv:2605.10793 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition
http://arxiv.org/abs/2605.10772v1
David F. Ramirez et al. — arxiv:2605.10772 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning
http://arxiv.org/abs/2605.10765v1
Tao Hu et al. — arxiv:2605.10765 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

AdaPaD: Adaptive Parallel Deflation for PEFT with Self-Correcting Rank Discovery
http://arxiv.org/abs/2605.10741v1
Barbara Su et al. — arxiv:2605.10741 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

A Simplicial Approach to Higher Geometric Quantization
http://arxiv.org/abs/2605.10695v1
Qian Zhang et al. — arxiv:2605.10695 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Energy-Efficient Implementation of Spiking Recurrent Cells on FPGA
http://arxiv.org/abs/2605.10679v1
Pascal Harmeling et al. — arxiv:2605.10679 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

Compander-Aligned Query Geometry for Quantized Zeroth-Order Optimization
http://arxiv.org/abs/2605.10673v1
Yao Shu et al.
— arxiv:2605.10673 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition
http://arxiv.org/abs/2605.10661v1
Michal Byra et al. — arxiv:2605.10661 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

BCJR-QAT: A Differentiable Relaxation of Trellis-Coded Weight Quantization
http://arxiv.org/abs/2605.10655v1
Venugopalan Iyengar et al. — arxiv:2605.10655 — LLM Efficiency
Mon, 11 May 2026 00:00:00 GMT

G-Zero: Self-Play for Open-Ended Generation from Zero Data
http://arxiv.org/abs/2605.09959v1
Chengsong Huang et al. — arxiv:2605.09959 — Alignment
Mon, 11 May 2026 00:00:00 GMT

Structure from Strategic Interaction & Uncertainty Risk Sensitive Games for Robust Preference Learning
http://arxiv.org/abs/2605.09946v1
Max Horwitz et al. — arxiv:2605.09946 — Alignment
Mon, 11 May 2026 00:00:00 GMT

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization
http://arxiv.org/abs/2605.10784v1
Rohan Surana et al. — arxiv:2605.10784 — Alignment
Mon, 11 May 2026 00:00:00 GMT

AgentGR: Semantic-aware Agentic Group Decision-Making Simulator for Group Recommendation
http://arxiv.org/abs/2605.10367v1
Yangtao Zhou et al. — arxiv:2605.10367 — Alignment
Mon, 11 May 2026 00:00:00 GMT

Leveraging RAG for Training-Free Alignment of LLMs
http://arxiv.org/abs/2605.11217v1
John T. Halloran et al. — arxiv:2605.11217 — Alignment
Mon, 11 May 2026 00:00:00 GMT

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
http://arxiv.org/abs/2605.11134v1
Christian Moya et al.
— arxiv:2605.11134 — Alignment
Mon, 11 May 2026 00:00:00 GMT

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation
http://arxiv.org/abs/2605.10253v1
Peiru Yang et al. — arxiv:2605.10253 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

The Vote-Left Equilibrium: A Deterministic Coordination Strategy for the Faithful in The Traitors
http://arxiv.org/abs/2605.10233v1
Vince Knight et al. — arxiv:2605.10233 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
http://arxiv.org/abs/2605.10230v1
Qingchuan Zhang et al. — arxiv:2605.10230 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
http://arxiv.org/abs/2605.10187v1
Longteng Guo et al. — arxiv:2605.10187 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
http://arxiv.org/abs/2605.10168v1
Shu Wang et al. — arxiv:2605.10168 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Explanation-Aware Learning for Enhanced Interpretability in Biomedical Imaging
http://arxiv.org/abs/2605.10054v1
Zubair Faruqui et al. — arxiv:2605.10054 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs
http://arxiv.org/abs/2605.10050v1
Jiameng Li et al. — arxiv:2605.10050 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation
http://arxiv.org/abs/2605.10020v1
Wilson Wongso et al.
— arxiv:2605.10020 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
http://arxiv.org/abs/2605.10002v1
Minh Khoi Nguyen et al. — arxiv:2605.10002 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization
http://arxiv.org/abs/2605.09996v1
Yeongtak Oh et al. — arxiv:2605.09996 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Evaluating the False Trust engendered by LLM Explanations
http://arxiv.org/abs/2605.10930v1
Vardhan Palod et al. — arxiv:2605.10930 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Pixal3D: Pixel-Aligned 3D Generation from Images
http://arxiv.org/abs/2605.10922v1
Dong-Yang Li et al. — arxiv:2605.10922 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking
http://arxiv.org/abs/2605.10893v1
Reza Khanmohammadi et al. — arxiv:2605.10893 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Neural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRs
http://arxiv.org/abs/2605.10877v1
Abrar Majeedi et al. — arxiv:2605.10877 — Hallucination
Mon, 11 May 2026 00:00:00 GMT

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography
http://arxiv.org/abs/2605.10871v1
Timothy Oladunni et al.
— arxiv:2605.10871 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationBenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CADhttp://arxiv.org/abs/2605.10865v1http://arxiv.org/abs/2605.10865v1Haozhe Zhang et al. — arxiv:2605.10865 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationBabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representationhttp://arxiv.org/abs/2605.10845v1http://arxiv.org/abs/2605.10845v1Qi Yang et al. — arxiv:2605.10845 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationProbing Cross-modal Information Hubs in Audio-Visual LLMshttp://arxiv.org/abs/2605.10815v1http://arxiv.org/abs/2605.10815v1Jihoo Jung et al. — arxiv:2605.10815 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationNew AI-Driven Tools for Enhancing Campus Well-being: A Prevention and Intervention Approachhttp://arxiv.org/abs/2605.10804v1http://arxiv.org/abs/2605.10804v1Jinwen Tang et al. — arxiv:2605.10804 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationThe Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studieshttp://arxiv.org/abs/2605.10799v1http://arxiv.org/abs/2605.10799v1Gabriel Garcia et al. — arxiv:2605.10799 — HallucinationMon, 11 May 2026 00:00:00 GMTHallucinationMetis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimizationhttp://arxiv.org/abs/2605.10067v1http://arxiv.org/abs/2605.10067v1Huilin Zhou et al. — arxiv:2605.10067 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyAdversarial Attacks Against MLLMs via Progressive Resolution Processing and Adaptive Feature Alignmenthttp://arxiv.org/abs/2605.09902v1http://arxiv.org/abs/2605.09902v1Haobo Wang et al. — arxiv:2605.09902 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyBeyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiershttp://arxiv.org/abs/2605.10901v1http://arxiv.org/abs/2605.10901v1Nikita Kezins et al. 
— arxiv:2605.10901 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyRUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systemshttp://arxiv.org/abs/2605.10862v1http://arxiv.org/abs/2605.10862v1Joel Rorseth et al. — arxiv:2605.10862 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyLLMs for Secure Hardware Design and Related Problems: Opportunities and Challengeshttp://arxiv.org/abs/2605.10807v1http://arxiv.org/abs/2605.10807v1Johann Knechtel et al. — arxiv:2605.10807 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyLITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environmentshttp://arxiv.org/abs/2605.10779v1http://arxiv.org/abs/2605.10779v1Chiyu Zhang et al. — arxiv:2605.10779 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyBreak the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximizationhttp://arxiv.org/abs/2605.10764v1http://arxiv.org/abs/2605.10764v1Mengqi He et al. — arxiv:2605.10764 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyRe-Triggering Safeguards within LLMs for Jailbreak Detectionhttp://arxiv.org/abs/2605.10611v1http://arxiv.org/abs/2605.10611v1Zheng Lin et al. — arxiv:2605.10611 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyGuaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothinghttp://arxiv.org/abs/2605.10582v1http://arxiv.org/abs/2605.10582v1Zheng Lin et al. — arxiv:2605.10582 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyTourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agentshttp://arxiv.org/abs/2605.10440v1http://arxiv.org/abs/2605.10440v1Yao Liu et al. — arxiv:2605.10440 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyAdversarial SQL Injection Generation with LLM-Based Architectureshttp://arxiv.org/abs/2605.11188v1http://arxiv.org/abs/2605.11188v1Ali Karakoc et al. 
— arxiv:2605.11188 — LLM SafetyMon, 11 May 2026 00:00:00 GMTLLM SafetyIndustryBench: Probing the Industrial Knowledge Boundaries of LLMshttp://arxiv.org/abs/2605.10267v1http://arxiv.org/abs/2605.10267v1Songlin Bai et al. — arxiv:2605.10267 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationUnsupervised Process Reward Modelshttp://arxiv.org/abs/2605.10158v1http://arxiv.org/abs/2605.10158v1Artyom Gadetsky et al. — arxiv:2605.10158 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationFormalRewardBench: A Benchmark for Formal Theorem Proving Reward Modelshttp://arxiv.org/abs/2605.10141v1http://arxiv.org/abs/2605.10141v1Zeynel A. Uluşan et al. — arxiv:2605.10141 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationG-Zero: Self-Play for Open-Ended Generation from Zero Datahttp://arxiv.org/abs/2605.09959v1http://arxiv.org/abs/2605.09959v1Chengsong Huang et al. — arxiv:2605.09959 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationTeam-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMshttp://arxiv.org/abs/2605.09922v1http://arxiv.org/abs/2605.09922v1Wu Li et al. — arxiv:2605.09922 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationNautilus Compass: Black-box Persona Drift Detection for Production LLM Agentshttp://arxiv.org/abs/2605.09863v1http://arxiv.org/abs/2605.09863v1Chunxiao Wang et al. — arxiv:2605.09863 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationEvaluating the False Trust engendered by LLM Explanationshttp://arxiv.org/abs/2605.10930v1http://arxiv.org/abs/2605.10930v1Vardhan Palod et al. — arxiv:2605.10930 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationWildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluationhttp://arxiv.org/abs/2605.10912v1http://arxiv.org/abs/2605.10912v1Shuangrui Ding et al. 
— arxiv:2605.10912 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationGrounded Satirical Generation with RAGhttp://arxiv.org/abs/2605.10853v1http://arxiv.org/abs/2605.10853v1Oona Itkonen et al. — arxiv:2605.10853 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationBabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representationhttp://arxiv.org/abs/2605.10845v1http://arxiv.org/abs/2605.10845v1Qi Yang et al. — arxiv:2605.10845 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationReasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judgehttp://arxiv.org/abs/2605.10805v1http://arxiv.org/abs/2605.10805v1Wenbo Zhang et al. — arxiv:2605.10805 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationLITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environmentshttp://arxiv.org/abs/2605.10779v1http://arxiv.org/abs/2605.10779v1Chiyu Zhang et al. — arxiv:2605.10779 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationNavigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarkshttp://arxiv.org/abs/2605.10639v1http://arxiv.org/abs/2605.10639v1Regina Gugg et al. — arxiv:2605.10639 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationPRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelineshttp://arxiv.org/abs/2605.10614v1http://arxiv.org/abs/2605.10614v1Riya Tapwal et al. — arxiv:2605.10614 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationLLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluationhttp://arxiv.org/abs/2605.10593v1http://arxiv.org/abs/2605.10593v1Philipp Steigerwald et al. — arxiv:2605.10593 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationValid Best-Model Identification for LLM Evaluation via Low-Rank Factorizationhttp://arxiv.org/abs/2605.10405v1http://arxiv.org/abs/2605.10405v1Elad Tolochinsky et al. 
— arxiv:2605.10405 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationVERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inferencehttp://arxiv.org/abs/2605.11334v1http://arxiv.org/abs/2605.11334v1Jasmine Qi et al. — arxiv:2605.11334 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationRethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insightshttp://arxiv.org/abs/2605.11330v1http://arxiv.org/abs/2605.11330v1Wenbo Chen et al. — arxiv:2605.11330 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationRethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stackhttp://arxiv.org/abs/2605.11232v1http://arxiv.org/abs/2605.11232v1Prathamesh Vasudeo Naik et al. — arxiv:2605.11232 — LLM EvaluationMon, 11 May 2026 00:00:00 GMTLLM EvaluationUsability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirementshttp://arxiv.org/abs/2605.10133v1http://arxiv.org/abs/2605.10133v1Yue Li et al. — arxiv:2605.10133 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMProspective Compression in Human Abstraction Learninghttp://arxiv.org/abs/2605.09985v1http://arxiv.org/abs/2605.09985v1Leonardo Hernandez Cano et al. — arxiv:2605.09985 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMRADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generationhttp://arxiv.org/abs/2605.09907v1http://arxiv.org/abs/2605.09907v1Zhen Zhang et al. — arxiv:2605.09907 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMCADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generationhttp://arxiv.org/abs/2605.10873v1http://arxiv.org/abs/2605.10873v1Anna C. Doris et al. — arxiv:2605.10873 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMBenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CADhttp://arxiv.org/abs/2605.10865v2http://arxiv.org/abs/2605.10865v2Haozhe Zhang et al. 
— arxiv:2605.10865 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMThe Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agentshttp://arxiv.org/abs/2605.10754v1http://arxiv.org/abs/2605.10754v1Xinrun Wang et al. — arxiv:2605.10754 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMAutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verificationhttp://arxiv.org/abs/2605.10712v1http://arxiv.org/abs/2605.10712v1Paschal C. Amusuo et al. — arxiv:2605.10712 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMCorrect-by-Construction G-Code Generation: A Neuro-Symbolic Approach via Separation Logichttp://arxiv.org/abs/2605.10568v1http://arxiv.org/abs/2605.10568v1Yeonseok Lee et al. — arxiv:2605.10568 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMVision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generationhttp://arxiv.org/abs/2605.11307v1http://arxiv.org/abs/2605.11307v1Ajay Vikram Periasami et al. — arxiv:2605.11307 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMPrimal Generation, Dual Judgment: Self-Training from Test-Time Scalinghttp://arxiv.org/abs/2605.11299v1http://arxiv.org/abs/2605.11299v1Yizhu Jiao et al. — arxiv:2605.11299 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMInternalizing Curriculum Judgment for LLM Reinforcement Fine-Tuninghttp://arxiv.org/abs/2605.11235v1http://arxiv.org/abs/2605.11235v1Han Zheng et al. — arxiv:2605.11235 — Code LLMMon, 11 May 2026 00:00:00 GMTCode LLMAffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modelinghttp://arxiv.org/abs/2605.11098v1http://arxiv.org/abs/2605.11098v1Jiacheng Shi et al. — arxiv:2605.11098 — Speech LLMMon, 11 May 2026 00:00:00 GMTSpeech LLMLegalCiteBench: Evaluating Citation Reliability in Legal Language Modelshttp://arxiv.org/abs/2605.10186v1http://arxiv.org/abs/2605.10186v1Sijia Chen et al. 
— arxiv:2605.10186 — Legal NLPMon, 11 May 2026 00:00:00 GMTLegal NLPNyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.10155v1http://arxiv.org/abs/2605.10155v1Deepanshu et al. — arxiv:2605.10155 — Legal NLPMon, 11 May 2026 00:00:00 GMTLegal NLPBabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representationhttp://arxiv.org/abs/2605.10845v1http://arxiv.org/abs/2605.10845v1Qi Yang et al. — arxiv:2605.10845 — Multilingual NLPMon, 11 May 2026 00:00:00 GMTMultilingual NLPWhy Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgishhttp://arxiv.org/abs/2605.10714v1http://arxiv.org/abs/2605.10714v1Fred Philippy et al. — arxiv:2605.10714 — Multilingual NLPMon, 11 May 2026 00:00:00 GMTMultilingual NLPICT-NLP at SemEval-2026 Task 3: Less Is More -- Multilingual Encoder with Joint Training and Adaptive Ensemble for Dimensional Aspect Sentiment Regressionhttp://arxiv.org/abs/2605.10560v1http://arxiv.org/abs/2605.10560v1Liyuan Huang et al. — arxiv:2605.10560 — Multilingual NLPMon, 11 May 2026 00:00:00 GMTMultilingual NLPGLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extractionhttp://arxiv.org/abs/2605.10108v1http://arxiv.org/abs/2605.10108v1Ihor Stepanov et al. — arxiv:2605.10108 — Named Entity RecognitionMon, 11 May 2026 00:00:00 GMTNamed Entity RecognitionInterpretable Coreference Resolution Evaluation Using Explicit Semanticshttp://arxiv.org/abs/2605.10627v1http://arxiv.org/abs/2605.10627v1Bruno Gatti et al. — arxiv:2605.10627 — Named Entity RecognitionMon, 11 May 2026 00:00:00 GMTNamed Entity RecognitionReconstructing rare particle source by femtoscopic correlationshttp://arxiv.org/abs/2605.10167v1http://arxiv.org/abs/2605.10167v1Liang Zhang et al. 
— arxiv:2605.10167 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionOUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patternshttp://arxiv.org/abs/2605.10161v1http://arxiv.org/abs/2605.10161v1Alberto Fernández-Hernández et al. — arxiv:2605.10161 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionUseful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Researchhttp://arxiv.org/abs/2605.10125v2http://arxiv.org/abs/2605.10125v2Anthea Dathe et al. — arxiv:2605.10125 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionPlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoninghttp://arxiv.org/abs/2605.10032v2http://arxiv.org/abs/2605.10032v2Sajib Acharjee Dip et al. — arxiv:2605.10032 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionGLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extractionhttp://arxiv.org/abs/2605.09973v1http://arxiv.org/abs/2605.09973v1Urchade Zaratiana et al. — arxiv:2605.09973 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionInformation Extraction of Nested Complex Structure of Quantum Cascade Lasers via Large Language Modelshttp://arxiv.org/abs/2605.09927v1http://arxiv.org/abs/2605.09927v1Xiao Fang et al. — arxiv:2605.09927 — Information ExtractionMon, 11 May 2026 00:00:00 GMTInformation ExtractionVERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inferencehttp://arxiv.org/abs/2605.11334v1http://arxiv.org/abs/2605.11334v1Jasmine Qi et al. — arxiv:2605.11334 — Text ClassificationMon, 11 May 2026 00:00:00 GMTText ClassificationHEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Serieshttp://arxiv.org/abs/2605.11130v2http://arxiv.org/abs/2605.11130v2Jonas Petersen et al. 
— arxiv:2605.11130 — Text ClassificationMon, 11 May 2026 00:00:00 GMTText ClassificationIndustryBench: Probing the Industrial Knowledge Boundaries of LLMshttp://arxiv.org/abs/2605.10267v1http://arxiv.org/abs/2605.10267v1Songlin Bai et al. — arxiv:2605.10267 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringHow Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialoguehttp://arxiv.org/abs/2605.10199v1http://arxiv.org/abs/2605.10199v1Hui Lu et al. — arxiv:2605.10199 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringLegalCiteBench: Evaluating Citation Reliability in Legal Language Modelshttp://arxiv.org/abs/2605.10186v1http://arxiv.org/abs/2605.10186v1Sijia Chen et al. — arxiv:2605.10186 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringASTRA-QA: A Benchmark for Abstract Question Answering over Documentshttp://arxiv.org/abs/2605.10168v1http://arxiv.org/abs/2605.10168v1Shu Wang et al. — arxiv:2605.10168 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringUseful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Researchhttp://arxiv.org/abs/2605.10125v1http://arxiv.org/abs/2605.10125v1Anthea Dathe et al. — arxiv:2605.10125 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringMAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphshttp://arxiv.org/abs/2605.10064v1http://arxiv.org/abs/2605.10064v1Ruiyi Yang et al. — arxiv:2605.10064 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringSeparate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thoughthttp://arxiv.org/abs/2605.09906v1http://arxiv.org/abs/2605.09906v1Xuanchen Li et al. 
— arxiv:2605.09906 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringTOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Modelshttp://arxiv.org/abs/2605.09904v1http://arxiv.org/abs/2605.09904v1Junzhe Chen et al. — arxiv:2605.09904 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringGrounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Rankinghttp://arxiv.org/abs/2605.10893v1http://arxiv.org/abs/2605.10893v1Reza Khanmohammadi et al. — arxiv:2605.10893 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringNeural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRshttp://arxiv.org/abs/2605.10877v1http://arxiv.org/abs/2605.10877v1Abrar Majeedi et al. — arxiv:2605.10877 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringBenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CADhttp://arxiv.org/abs/2605.10865v1http://arxiv.org/abs/2605.10865v1Haozhe Zhang et al. — arxiv:2605.10865 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringDGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimizationhttp://arxiv.org/abs/2605.10863v1http://arxiv.org/abs/2605.10863v1Mengyi Deng et al. — arxiv:2605.10863 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringVerification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQAhttp://arxiv.org/abs/2605.10850v1http://arxiv.org/abs/2605.10850v1Ruinan Jin et al. — arxiv:2605.10850 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringPathISE: Learning Informative Path Supervision for Knowledge Graph Question Answeringhttp://arxiv.org/abs/2605.10791v1http://arxiv.org/abs/2605.10791v1Shengxiang Gao et al. 
— arxiv:2605.10791 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringMASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimizationhttp://arxiv.org/abs/2605.10784v1http://arxiv.org/abs/2605.10784v1Rohan Surana et al. — arxiv:2605.10784 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringTowards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognitionhttp://arxiv.org/abs/2605.10772v1http://arxiv.org/abs/2605.10772v1David F. Ramirez et al. — arxiv:2605.10772 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringGridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMshttp://arxiv.org/abs/2605.10762v1http://arxiv.org/abs/2605.10762v1Mohamed Eltahir et al. — arxiv:2605.10762 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringRadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiologyhttp://arxiv.org/abs/2605.10761v1http://arxiv.org/abs/2605.10761v1Wenxuan Li et al. — arxiv:2605.10761 — Question AnsweringMon, 11 May 2026 00:00:00 GMTQuestion AnsweringDECO-MWE: building a linguistic resource of Korean multiword expressions for feature-based sentiment analysishttp://arxiv.org/abs/2605.10295v1http://arxiv.org/abs/2605.10295v1Jaeho Han et al. — arxiv:2605.10295 — Sentiment AnalysisMon, 11 May 2026 00:00:00 GMTSentiment AnalysisBeyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Taskshttp://arxiv.org/abs/2605.09955v1http://arxiv.org/abs/2605.09955v1Tadesse Destaw Belay et al. — arxiv:2605.09955 — Sentiment AnalysisMon, 11 May 2026 00:00:00 GMTSentiment AnalysisThe Association of Transformer-based Sentiment Analysis with Symptom Distress and Deterioration in Routine Psychotherapy Carehttp://arxiv.org/abs/2605.09838v1http://arxiv.org/abs/2605.09838v1Douglas K. Faust et al. 
— arxiv:2605.09838 — Sentiment AnalysisMon, 11 May 2026 00:00:00 GMTSentiment AnalysisRelations Are Channels: Knowledge Graph Embedding via Kraus Decompositionshttp://arxiv.org/abs/2605.10317v1http://arxiv.org/abs/2605.10317v1Sayan Kumar Chaki et al. — arxiv:2605.10317 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphMicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graphhttp://arxiv.org/abs/2605.10120v1http://arxiv.org/abs/2605.10120v1Manyu Li et al. — arxiv:2605.10120 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphGLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extractionhttp://arxiv.org/abs/2605.10108v1http://arxiv.org/abs/2605.10108v1Ihor Stepanov et al. — arxiv:2605.10108 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphMAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphshttp://arxiv.org/abs/2605.10064v1http://arxiv.org/abs/2605.10064v1Ruiyi Yang et al. — arxiv:2605.10064 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphGraphInstruct: A Progressive Benchmark for Diagnosing Capability Gaps in LLM Graph Generationhttp://arxiv.org/abs/2605.09997v1http://arxiv.org/abs/2605.09997v1Zihe Wei et al. — arxiv:2605.09997 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphPathISE: Learning Informative Path Supervision for Knowledge Graph Question Answeringhttp://arxiv.org/abs/2605.10791v1http://arxiv.org/abs/2605.10791v1Shengxiang Gao et al. — arxiv:2605.10791 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphHierarchical Causal Abduction: A Foundation Framework for Explainable Model Predictive Controlhttp://arxiv.org/abs/2605.10624v1http://arxiv.org/abs/2605.10624v1Ramesh Arvind Naagarajan et al. 
— arxiv:2605.10624 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphKeeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graphhttp://arxiv.org/abs/2605.10540v1http://arxiv.org/abs/2605.10540v1Davan Chiem Dao et al. — arxiv:2605.10540 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphA Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narrativeshttp://arxiv.org/abs/2605.10531v1http://arxiv.org/abs/2605.10531v1Jayalakshmi Baskar et al. — arxiv:2605.10531 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphPrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphshttp://arxiv.org/abs/2605.10529v1http://arxiv.org/abs/2605.10529v1Yousef A. Radwan et al. — arxiv:2605.10529 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphCMKL: Modality-Aware Continual Learning for Evolving Biomedical Knowledge Graphshttp://arxiv.org/abs/2605.10510v1http://arxiv.org/abs/2605.10510v1Yousef A. Radwan et al. — arxiv:2605.10510 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphMuch of Geospatial Web Search Is Beyond Traditional GIShttp://arxiv.org/abs/2605.11336v1http://arxiv.org/abs/2605.11336v1Ilya Ilyankou et al. — arxiv:2605.11336 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphCORE: Cyclic Orthotope Relation Embedding for Knowledge Graph Completionhttp://arxiv.org/abs/2605.11159v1http://arxiv.org/abs/2605.11159v1Yingqi Zeng et al. — arxiv:2605.11159 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IVhttp://arxiv.org/abs/2605.11143v1http://arxiv.org/abs/2605.11143v1Alex Stinard et al. 
— arxiv:2605.11143 — Knowledge GraphMon, 11 May 2026 00:00:00 GMTKnowledge GraphWISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Recordshttp://arxiv.org/abs/2605.09765v1http://arxiv.org/abs/2605.09765v1Ruan Dong et al. — arxiv:2605.09765 — NLPSun, 10 May 2026 00:00:00 GMTNLPMitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognitionhttp://arxiv.org/abs/2605.09468v1http://arxiv.org/abs/2605.09468v1Yifan Wang et al. — arxiv:2605.09468 — NLPSun, 10 May 2026 00:00:00 GMTNLPHOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activitieshttp://arxiv.org/abs/2605.09348v1http://arxiv.org/abs/2605.09348v1Shusaku Egami et al. — arxiv:2605.09348 — NLPSun, 10 May 2026 00:00:00 GMTNLPMedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studieshttp://arxiv.org/abs/2605.09661v1http://arxiv.org/abs/2605.09661v1Huy Hoang Ha et al. — arxiv:2605.09661 — RAGSun, 10 May 2026 00:00:00 GMTRAGByte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarkshttp://arxiv.org/abs/2605.09611v1http://arxiv.org/abs/2605.09611v1Sietse Schelpe et al. — arxiv:2605.09611 — RAGSun, 10 May 2026 00:00:00 GMTRAGLEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Modelshttp://arxiv.org/abs/2605.09806v1http://arxiv.org/abs/2605.09806v1Songtao Wei et al. — arxiv:2605.09806 — ReasoningSun, 10 May 2026 00:00:00 GMTReasoningDistilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoThttp://arxiv.org/abs/2605.09719v1http://arxiv.org/abs/2605.09719v1Alaa Asfour et al. — arxiv:2605.09719 — ReasoningSun, 10 May 2026 00:00:00 GMTReasoningDo multimodal models imagine electric sheep?http://arxiv.org/abs/2605.09693v1http://arxiv.org/abs/2605.09693v1Santhosh Kumar Ramakrishnan et al. 
— arxiv:2605.09693 — ReasoningSun, 10 May 2026 00:00:00 GMTReasoningOracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoninghttp://arxiv.org/abs/2605.09822v1http://arxiv.org/abs/2605.09822v1Ben Kereopa-Yorke et al. — arxiv:2605.09822 — Tool UseSun, 10 May 2026 00:00:00 GMTTool UseEvaluating Tool Cloning in Agentic-AI Ecosystemshttp://arxiv.org/abs/2605.09817v1http://arxiv.org/abs/2605.09817v1Taein Kim et al. — arxiv:2605.09817 — Tool UseSun, 10 May 2026 00:00:00 GMTTool UseTrajectory Supervision for Continual Tool-Use Learning in LLMshttp://arxiv.org/abs/2605.09734v1http://arxiv.org/abs/2605.09734v1Vishnu Vardhan Reddy et al. — arxiv:2605.09734 — Tool UseSun, 10 May 2026 00:00:00 GMTTool UseRubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinementhttp://arxiv.org/abs/2605.09730v1http://arxiv.org/abs/2605.09730v1Will LeVine et al. — arxiv:2605.09730 — Tool UseSun, 10 May 2026 00:00:00 GMTTool UseMonitoringBench: Semi-Automated Red-Teaming for Agent Monitoringhttp://arxiv.org/abs/2605.09684v1http://arxiv.org/abs/2605.09684v1Monika Jotautaitė et al. — arxiv:2605.09684 — Tool UseSun, 10 May 2026 00:00:00 GMTTool UseLearning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Managementhttp://arxiv.org/abs/2605.09818v1http://arxiv.org/abs/2605.09818v1Prabhjot Singh et al. — arxiv:2605.09818 — AlignmentSun, 10 May 2026 00:00:00 GMTAlignmentEvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descenthttp://arxiv.org/abs/2605.09777v1http://arxiv.org/abs/2605.09777v1Dongxin Guo et al. — arxiv:2605.09777 — AlignmentSun, 10 May 2026 00:00:00 GMTAlignmentOffline Preference Optimization for Rectified Flow with Noise-Tracked Pairshttp://arxiv.org/abs/2605.09433v1http://arxiv.org/abs/2605.09433v1Yunhong Lu et al. 
— arxiv:2605.09433 — AlignmentSun, 10 May 2026 00:00:00 GMTAlignmentNear-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actionshttp://arxiv.org/abs/2605.09363v1http://arxiv.org/abs/2605.09363v1Soumita Hait et al. — arxiv:2605.09363 — AlignmentSun, 10 May 2026 00:00:00 GMTAlignmentCALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoringhttp://arxiv.org/abs/2605.09737v1http://arxiv.org/abs/2605.09737v1Li Lixing et al. — arxiv:2605.09737 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM SafetyMonitoringBench: Semi-Automated Red-Teaming for Agent Monitoringhttp://arxiv.org/abs/2605.09684v1http://arxiv.org/abs/2605.09684v1Monika Jotautaitė et al. — arxiv:2605.09684 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM SafetyModeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMshttp://arxiv.org/abs/2605.09647v1http://arxiv.org/abs/2605.09647v1Jingshen Zhang et al. — arxiv:2605.09647 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM Safety"Training robust watermarking model may hurt authentication!'' Exploring and Mitigating the Identity Leakage in Robust Watermarkinghttp://arxiv.org/abs/2605.09646v1http://arxiv.org/abs/2605.09646v1Xinyu Zhang et al. — arxiv:2605.09646 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM SafetyPosition: AI Security Policy Should Target Systems, Not Modelshttp://arxiv.org/abs/2605.09504v1http://arxiv.org/abs/2605.09504v1Michael A. Riegler et al. — arxiv:2605.09504 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM SafetyNEXUS: Continual Learning of Symbolic Constraints for Safe and Robust Embodied Planninghttp://arxiv.org/abs/2605.09387v1http://arxiv.org/abs/2605.09387v1Tiehan Cui et al. — arxiv:2605.09387 — LLM SafetySun, 10 May 2026 00:00:00 GMTLLM SafetyCalibrate, Don't Curate: Label-Efficient Estimation from Noisy LLM Judgeshttp://arxiv.org/abs/2605.09702v1http://arxiv.org/abs/2605.09702v1Yanran Li et al. 
— arxiv:2605.09702 — LLM EvaluationSun, 10 May 2026 00:00:00 GMTLLM EvaluationMedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studieshttp://arxiv.org/abs/2605.09661v1http://arxiv.org/abs/2605.09661v1Huy Hoang Ha et al. — arxiv:2605.09661 — LLM EvaluationSun, 10 May 2026 00:00:00 GMTLLM EvaluationCLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubricshttp://arxiv.org/abs/2605.09584v1http://arxiv.org/abs/2605.09584v1Aishik Nagar et al. — arxiv:2605.09584 — LLM EvaluationSun, 10 May 2026 00:00:00 GMTLLM Evaluationfmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discoveryhttp://arxiv.org/abs/2605.09438v1http://arxiv.org/abs/2605.09438v1Andreas D. Demou et al. — arxiv:2605.09438 — LLM EvaluationSun, 10 May 2026 00:00:00 GMTLLM EvaluationEntropy-informed Decoding: Adaptive Information-Driven Branchinghttp://arxiv.org/abs/2605.09745v1http://arxiv.org/abs/2605.09745v1Benjamin Patrick Evans et al. — arxiv:2605.09745 — Code LLMSun, 10 May 2026 00:00:00 GMTCode LLMCodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agentshttp://arxiv.org/abs/2605.09675v1http://arxiv.org/abs/2605.09675v1Timothy Ossowski et al. — arxiv:2605.09675 — Code LLMSun, 10 May 2026 00:00:00 GMTCode LLMPDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generationhttp://arxiv.org/abs/2605.09636v1http://arxiv.org/abs/2605.09636v1Zhen Hang et al. — arxiv:2605.09636 — Code LLMSun, 10 May 2026 00:00:00 GMTCode LLMCrosslingual On-Policy Self-Distillation for Multilingual Reasoninghttp://arxiv.org/abs/2605.09548v1http://arxiv.org/abs/2605.09548v1Yihong Liu et al. — arxiv:2605.09548 — Multilingual NLPSun, 10 May 2026 00:00:00 GMTMultilingual NLPAgentShield: Deception-based Compromise Detection for Tool-using LLM Agentshttp://arxiv.org/abs/2605.11026v1http://arxiv.org/abs/2605.11026v1Yassin H. Rassul et al. 
— arxiv:2605.11026 — Multilingual NLPSun, 10 May 2026 00:00:00 GMTMultilingual NLPMemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agentshttp://arxiv.org/abs/2605.09530v2http://arxiv.org/abs/2605.09530v2Yining Chen et al. — arxiv:2605.09530 — Information ExtractionSun, 10 May 2026 00:00:00 GMTInformation ExtractionHow to count clustered galaxieshttp://arxiv.org/abs/2605.09248v1http://arxiv.org/abs/2605.09248v1Yunting Wang et al. — arxiv:2605.09248 — Information ExtractionSun, 10 May 2026 00:00:00 GMTInformation Extractioncantnlp@DravidianLangTech 2026: organic domain adaptation improves multi-class hope speech detection in Tuluhttp://arxiv.org/abs/2605.09795v1http://arxiv.org/abs/2605.09795v1Andrew Li et al. — arxiv:2605.09795 — Text ClassificationSun, 10 May 2026 00:00:00 GMTText ClassificationDistilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoThttp://arxiv.org/abs/2605.09719v1http://arxiv.org/abs/2605.09719v1Alaa Asfour et al. — arxiv:2605.09719 — Question AnsweringSun, 10 May 2026 00:00:00 GMTQuestion AnsweringDeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agentshttp://arxiv.org/abs/2605.09679v1http://arxiv.org/abs/2605.09679v1Yixiong Chen et al. — arxiv:2605.09679 — Question AnsweringSun, 10 May 2026 00:00:00 GMTQuestion AnsweringFinMoji: A Framework for Emoji-driven Sentiment Analysis in Financial Social Mediahttp://arxiv.org/abs/2605.09469v1http://arxiv.org/abs/2605.09469v1Ahmed Mahrous et al. — arxiv:2605.09469 — Sentiment AnalysisSun, 10 May 2026 00:00:00 GMTSentiment AnalysisOracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoninghttp://arxiv.org/abs/2605.09822v1http://arxiv.org/abs/2605.09822v1Ben Kereopa-Yorke et al. 
— arxiv:2605.09822 — Knowledge GraphSun, 10 May 2026 00:00:00 GMTKnowledge GraphK12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMshttp://arxiv.org/abs/2605.09635v1http://arxiv.org/abs/2605.09635v1Hao Liang et al. — arxiv:2605.09635 — Knowledge GraphSun, 10 May 2026 00:00:00 GMTKnowledge GraphLLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairshttp://arxiv.org/abs/2605.09542v1http://arxiv.org/abs/2605.09542v1Rishabh Jakhar et al. — arxiv:2605.09542 — Knowledge GraphSun, 10 May 2026 00:00:00 GMTKnowledge GraphEpiGraph: A Knowledge Graph and Benchmark for Evidence-Intensive Reasoning in Epilepsyhttp://arxiv.org/abs/2605.09505v1http://arxiv.org/abs/2605.09505v1Yuyang Dai et al. — arxiv:2605.09505 — Knowledge GraphSun, 10 May 2026 00:00:00 GMTKnowledge GraphHOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activitieshttp://arxiv.org/abs/2605.09348v1http://arxiv.org/abs/2605.09348v1Shusaku Egami et al. — arxiv:2605.09348 — Knowledge GraphSun, 10 May 2026 00:00:00 GMTKnowledge GraphFrom Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languageshttp://arxiv.org/abs/2605.09147v1http://arxiv.org/abs/2605.09147v1Matthias Schöffel et al. — arxiv:2605.09147 — NLPSat, 09 May 2026 00:00:00 GMTNLPThe Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoringhttp://arxiv.org/abs/2605.09225v1http://arxiv.org/abs/2605.09225v1Ismail Hossain et al. — arxiv:2605.09225 — AlignmentSat, 09 May 2026 00:00:00 GMTAlignmentLearning the Preferences of a Learning Agenthttp://arxiv.org/abs/2605.09217v1http://arxiv.org/abs/2605.09217v1Karim Abdel Sadek et al. 
— arxiv:2605.09217 — AlignmentSat, 09 May 2026 00:00:00 GMTAlignmentThe Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humanshttp://arxiv.org/abs/2605.08837v1http://arxiv.org/abs/2605.08837v1Odysseas S. Chlapanis et al. — arxiv:2605.08837 — AlignmentSat, 09 May 2026 00:00:00 GMTAlignmentCompressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendationhttp://arxiv.org/abs/2605.08810v1http://arxiv.org/abs/2605.08810v1Yang Xiao et al. — arxiv:2605.08810 — AlignmentSat, 09 May 2026 00:00:00 GMTAlignmentThe Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoringhttp://arxiv.org/abs/2605.09225v1http://arxiv.org/abs/2605.09225v1Ismail Hossain et al. — arxiv:2605.09225 — LLM SafetySat, 09 May 2026 00:00:00 GMTLLM SafetyAutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injectionhttp://arxiv.org/abs/2605.09185v1http://arxiv.org/abs/2605.09185v1Zhiwei Liu et al. — arxiv:2605.09185 — LLM SafetySat, 09 May 2026 00:00:00 GMTLLM SafetyEvaluating LLM-Generated Code: A Benchmark and Developer Studyhttp://arxiv.org/abs/2605.09059v1http://arxiv.org/abs/2605.09059v1Joanna Szych et al. — arxiv:2605.09059 — Code LLMSat, 09 May 2026 00:00:00 GMTCode LLMUsing Semantic Distance to Estimate Uncertainty in LLM-Based Code Generationhttp://arxiv.org/abs/2605.09023v1http://arxiv.org/abs/2605.09023v1Weilin He et al. — arxiv:2605.09023 — Code LLMSat, 09 May 2026 00:00:00 GMTCode LLMMDGYM: Benchmarking AI Agents on Molecular Simulationshttp://arxiv.org/abs/2605.08941v1http://arxiv.org/abs/2605.08941v1Vinay Kumar et al. — arxiv:2605.08941 — Code LLMSat, 09 May 2026 00:00:00 GMTCode LLMPrepBench: How Far Are We from Natural-Language-Driven Data Preparation?http://arxiv.org/abs/2605.08687v1http://arxiv.org/abs/2605.08687v1Jingzhe Xu et al. 
— arxiv:2605.08687 — Code LLMSat, 09 May 2026 00:00:00 GMTCode LLMFrom Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languageshttp://arxiv.org/abs/2605.09147v1http://arxiv.org/abs/2605.09147v1Matthias Schöffel et al. — arxiv:2605.09147 — Multilingual NLPSat, 09 May 2026 00:00:00 GMTMultilingual NLPLanguage-Conditioned Visual Grounding with CLIP Multilingualhttp://arxiv.org/abs/2605.09060v1http://arxiv.org/abs/2605.09060v1J. de Curtò et al. — arxiv:2605.09060 — Multilingual NLPSat, 09 May 2026 00:00:00 GMTMultilingual NLPImproving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensemblinghttp://arxiv.org/abs/2605.08950v1http://arxiv.org/abs/2605.08950v1Wicaksono Leksono Muhamad et al. — arxiv:2605.08950 — Multilingual NLPSat, 09 May 2026 00:00:00 GMTMultilingual NLPWhen More Parameters Hurt: Foundation Model Priors Amplify Worst-Client Disparity Under Extreme Federated Heterogeneityhttp://arxiv.org/abs/2605.08992v1http://arxiv.org/abs/2605.08992v1Kiran Naseer et al. — arxiv:2605.08992 — Text ClassificationSat, 09 May 2026 00:00:00 GMTText ClassificationTraining with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoninghttp://arxiv.org/abs/2605.08741v1http://arxiv.org/abs/2605.08741v1Zhengyang Zhao et al. — arxiv:2605.08741 — Text ClassificationSat, 09 May 2026 00:00:00 GMTText ClassificationGuidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Modelshttp://arxiv.org/abs/2605.07701v1http://arxiv.org/abs/2605.07701v1Fan Zhou et al. — arxiv:2605.07701 — NLPFri, 08 May 2026 00:00:00 GMTNLPNürnberg NLP at PsyDefDetect: Multi-Axis Voter Ensembles for Psychological Defence Mechanism Classificationhttp://arxiv.org/abs/2605.07606v1http://arxiv.org/abs/2605.07606v1Philipp Steigerwald et al. 
— arxiv:2605.07606 — NLPFri, 08 May 2026 00:00:00 GMTNLPData Contamination in Neural Hieroglyphic Translation: A Reproducibility Studyhttp://arxiv.org/abs/2605.07453v1http://arxiv.org/abs/2605.07453v1Ammar Toutou et al. — arxiv:2605.07453 — NLPFri, 08 May 2026 00:00:00 GMTNLPSSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysishttp://arxiv.org/abs/2605.07446v1http://arxiv.org/abs/2605.07446v1Suwon Choi et al. — arxiv:2605.07446 — NLPFri, 08 May 2026 00:00:00 GMTNLPThe Proxy Presumption: From Semantic Embeddings to Valid Social Measureshttp://arxiv.org/abs/2605.07409v1http://arxiv.org/abs/2605.07409v1Baishi Li et al. — arxiv:2605.07409 — NLPFri, 08 May 2026 00:00:00 GMTNLPZero-Shot Neural Network Evaluation with Sample-Wise Activation Patternshttp://arxiv.org/abs/2605.07378v1http://arxiv.org/abs/2605.07378v1Yameng Peng et al. — arxiv:2605.07378 — NLPFri, 08 May 2026 00:00:00 GMTNLPLLMs Improving LLMs: Agentic Discovery for Test-Time Scalinghttp://arxiv.org/abs/2605.08083v1http://arxiv.org/abs/2605.08083v1Tong Zheng et al. — arxiv:2605.08083 — LLMFri, 08 May 2026 00:00:00 GMTLLMVecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selectionhttp://arxiv.org/abs/2605.08070v1http://arxiv.org/abs/2605.08070v1James Petullo et al. — arxiv:2605.08070 — LLMFri, 08 May 2026 00:00:00 GMTLLMEmpirical Bayes Rebiasinghttp://arxiv.org/abs/2605.08069v1http://arxiv.org/abs/2605.08069v1Wanyi Ling et al. — arxiv:2605.08069 — LLMFri, 08 May 2026 00:00:00 GMTLLMFlow-OPD: On-Policy Distillation for Flow Matching Modelshttp://arxiv.org/abs/2605.08063v1http://arxiv.org/abs/2605.08063v1Zhen Fang et al. — arxiv:2605.08063 — LLMFri, 08 May 2026 00:00:00 GMTLLMRubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoninghttp://arxiv.org/abs/2605.08061v1http://arxiv.org/abs/2605.08061v1Manish Bhattarai et al. 
— arxiv:2605.08061 — LLMFri, 08 May 2026 00:00:00 GMTLLMThe Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agentshttp://arxiv.org/abs/2605.08060v1http://arxiv.org/abs/2605.08060v1Jiayuan Liu et al. — arxiv:2605.08060 — LLMFri, 08 May 2026 00:00:00 GMTLLMCA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocationhttp://arxiv.org/abs/2605.08057v1http://arxiv.org/abs/2605.08057v1James Petullo et al. — arxiv:2605.08057 — LLMFri, 08 May 2026 00:00:00 GMTLLMTowards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimizationhttp://arxiv.org/abs/2605.08054v1http://arxiv.org/abs/2605.08054v1Hanchao Liu et al. — arxiv:2605.08054 — LLMFri, 08 May 2026 00:00:00 GMTLLMUncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMshttp://arxiv.org/abs/2605.08045v1http://arxiv.org/abs/2605.08045v1Yi Yu et al. — arxiv:2605.08045 — LLMFri, 08 May 2026 00:00:00 GMTLLMECNUClaw: A Learner-Profiled Intelligent Study Companion Framework for K-12 Personalized Educationhttp://arxiv.org/abs/2605.08040v1http://arxiv.org/abs/2605.08040v1Yizhou Zhou et al. — arxiv:2605.08040 — LLMFri, 08 May 2026 00:00:00 GMTLLMLLMs Improving LLMs: Agentic Discovery for Test-Time Scalinghttp://arxiv.org/abs/2605.08083v1http://arxiv.org/abs/2605.08083v1Tong Zheng et al. — arxiv:2605.08083 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentThe Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agentshttp://arxiv.org/abs/2605.08060v1http://arxiv.org/abs/2605.08060v1Jiayuan Liu et al. — arxiv:2605.08060 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentTowards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimizationhttp://arxiv.org/abs/2605.08054v1http://arxiv.org/abs/2605.08054v1Hanchao Liu et al. 
— arxiv:2605.08054 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentReason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learnershttp://arxiv.org/abs/2605.08019v1http://arxiv.org/abs/2605.08019v1Botos Csaba et al. — arxiv:2605.08019 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentCollaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycleshttp://arxiv.org/abs/2605.08017v1http://arxiv.org/abs/2605.08017v1Young Jo et al. — arxiv:2605.08017 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentLearning CLI Agents with Structured Action Credit under Selective Observationhttp://arxiv.org/abs/2605.08013v1http://arxiv.org/abs/2605.08013v1Haoyang Su et al. — arxiv:2605.08013 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentInterpreting Reinforcement Learning Agents with Susceptibilitieshttp://arxiv.org/abs/2605.08007v1http://arxiv.org/abs/2605.08007v1Chris Elliott et al. — arxiv:2605.08007 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentTool Calling is Linearly Readable and Steerable in Language Modelshttp://arxiv.org/abs/2605.07990v1http://arxiv.org/abs/2605.07990v1Zekun Wu et al. — arxiv:2605.07990 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentGraph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMshttp://arxiv.org/abs/2605.07961v1http://arxiv.org/abs/2605.07961v1Hanlin Cai et al. — arxiv:2605.07961 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentExploring a Virtual Pet to Provide Context Notifications in a Tourism Recommender System: a Pilot Studyhttp://arxiv.org/abs/2605.07960v1http://arxiv.org/abs/2605.07960v1Patrícia Alves et al. — arxiv:2605.07960 — LLM AgentFri, 08 May 2026 00:00:00 GMTLLM AgentThe Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agentshttp://arxiv.org/abs/2605.08060v1http://arxiv.org/abs/2605.08060v1Jiayuan Liu et al. 
— arxiv:2605.08060 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentReason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learnershttp://arxiv.org/abs/2605.08019v1http://arxiv.org/abs/2605.08019v1Botos Csaba et al. — arxiv:2605.08019 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentLearning CLI Agents with Structured Action Credit under Selective Observationhttp://arxiv.org/abs/2605.08013v1http://arxiv.org/abs/2605.08013v1Haoyang Su et al. — arxiv:2605.08013 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentTool Calling is Linearly Readable and Steerable in Language Modelshttp://arxiv.org/abs/2605.07990v1http://arxiv.org/abs/2605.07990v1Zekun Wu et al. — arxiv:2605.07990 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentExploring a Virtual Pet to Provide Context Notifications in a Tourism Recommender System: a Pilot Studyhttp://arxiv.org/abs/2605.07960v1http://arxiv.org/abs/2605.07960v1Patrícia Alves et al. — arxiv:2605.07960 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentTraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexampleshttp://arxiv.org/abs/2605.07935v1http://arxiv.org/abs/2605.07935v1Shuren Xia et al. — arxiv:2605.07935 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentMany-to-Many Multi-Agent Pickup and Deliveryhttp://arxiv.org/abs/2605.07835v1http://arxiv.org/abs/2605.07835v1Ethan Schneider et al. — arxiv:2605.07835 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentSCENE: Recognizing Social Norms and Sanctioning in Group Chatshttp://arxiv.org/abs/2605.07823v1http://arxiv.org/abs/2605.07823v1Mateusz Jacniacki et al. — arxiv:2605.07823 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentIs a team only as strong as its weakest link? Quantifying the short-board effect with AI Agentshttp://arxiv.org/abs/2605.07773v1http://arxiv.org/abs/2605.07773v1Xin Xu et al. 
— arxiv:2605.07773 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentAlternating Target-Path Planning for Scalable Multi-Agent Coordinationhttp://arxiv.org/abs/2605.07744v1http://arxiv.org/abs/2605.07744v1Yu Kumagai et al. — arxiv:2605.07744 — Multi-AgentFri, 08 May 2026 00:00:00 GMTMulti-AgentFAVOR: Efficient Filter-Agnostic Vector ANNS Based on Selectivity-Aware Exclusion Distanceshttp://arxiv.org/abs/2605.07770v1http://arxiv.org/abs/2605.07770v1Junjie Song et al. — arxiv:2605.07770 — RAGFri, 08 May 2026 00:00:00 GMTRAGCharacterizing and Mitigating False-Positive Bug Reports in the Linux Kernelhttp://arxiv.org/abs/2605.07678v1http://arxiv.org/abs/2605.07678v1Jiashuo Tian et al. — arxiv:2605.07678 — RAGFri, 08 May 2026 00:00:00 GMTRAGIntent-Driven Semantic ID Generation for Grounded Conversational News Recommendationhttp://arxiv.org/abs/2605.07613v1http://arxiv.org/abs/2605.07613v1Hongyang Su et al. — arxiv:2605.07613 — RAGFri, 08 May 2026 00:00:00 GMTRAGLARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentationhttp://arxiv.org/abs/2605.07517v1http://arxiv.org/abs/2605.07517v1Giorgia Bolognesi et al. — arxiv:2605.07517 — RAGFri, 08 May 2026 00:00:00 GMTRAGCSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representationshttp://arxiv.org/abs/2605.07325v1http://arxiv.org/abs/2605.07325v1Robin Karlsson et al. — arxiv:2605.07325 — RAGFri, 08 May 2026 00:00:00 GMTRAGBioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulationhttp://arxiv.org/abs/2605.07306v1http://arxiv.org/abs/2605.07306v1Zhaohui Du et al. — arxiv:2605.07306 — RAGFri, 08 May 2026 00:00:00 GMTRAGFrom Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAGhttp://arxiv.org/abs/2605.07273v1http://arxiv.org/abs/2605.07273v1Jiaju Han et al. 
— arxiv:2605.07273 — RAGFri, 08 May 2026 00:00:00 GMTRAGMLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocolhttp://arxiv.org/abs/2605.07249v1http://arxiv.org/abs/2605.07249v1Youngjoon Jang et al. — arxiv:2605.07249 — RAGFri, 08 May 2026 00:00:00 GMTRAGTopic Is Not Agenda: A Citation-Community Audit of Text Embeddingshttp://arxiv.org/abs/2605.07158v1http://arxiv.org/abs/2605.07158v1Junseon Yoo et al. — arxiv:2605.07158 — RAGFri, 08 May 2026 00:00:00 GMTRAGFrom Standard English to Singlish: A Retrieval-Augmented Approach for Code-Switched Creole Generation in Large Language Modelshttp://arxiv.org/abs/2605.07132v1http://arxiv.org/abs/2605.07132v1Foong Ming Lai et al. — arxiv:2605.07132 — RAGFri, 08 May 2026 00:00:00 GMTRAGThe Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agentshttp://arxiv.org/abs/2605.08060v1http://arxiv.org/abs/2605.08060v1Jiayuan Liu et al. — arxiv:2605.08060 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningCA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocationhttp://arxiv.org/abs/2605.08057v1http://arxiv.org/abs/2605.08057v1James Petullo et al. — arxiv:2605.08057 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningReason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learnershttp://arxiv.org/abs/2605.08019v1http://arxiv.org/abs/2605.08019v1Botos Csaba et al. — arxiv:2605.08019 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningAbductive Reasoning with Probabilistic Commonsensehttp://arxiv.org/abs/2605.08011v1http://arxiv.org/abs/2605.08011v1Joseph Cotnareanu et al. — arxiv:2605.08011 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningSimilar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localizationhttp://arxiv.org/abs/2605.07957v1http://arxiv.org/abs/2605.07957v1Golnaz Gharachorlu et al. 
— arxiv:2605.07957 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningCoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewershttp://arxiv.org/abs/2605.07905v1http://arxiv.org/abs/2605.07905v1Hexuan Deng et al. — arxiv:2605.07905 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningMelding LLM and temporal logic for reliable human-swarm collaboration in complex scenarioshttp://arxiv.org/abs/2605.07877v1http://arxiv.org/abs/2605.07877v1Junfeng Chen et al. — arxiv:2605.07877 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningVideo Understanding Reward Modeling: A Robust Benchmark and Performant Reward Modelshttp://arxiv.org/abs/2605.07872v1http://arxiv.org/abs/2605.07872v1Yuancheng Wei et al. — arxiv:2605.07872 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningPrune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoninghttp://arxiv.org/abs/2605.07804v1http://arxiv.org/abs/2605.07804v1Zhicheng Yang et al. — arxiv:2605.07804 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningTracing Uncertainty in Language Model "Reasoning"http://arxiv.org/abs/2605.07776v1http://arxiv.org/abs/2605.07776v1Nils Grünefeld et al. — arxiv:2605.07776 — ReasoningFri, 08 May 2026 00:00:00 GMTReasoningAgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agentshttp://arxiv.org/abs/2605.07926v1http://arxiv.org/abs/2605.07926v1Zhengkang Guo et al. — arxiv:2605.07926 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseSARC: A Governance-by-Architecture Framework for Agentic AI Systemshttp://arxiv.org/abs/2605.07728v1http://arxiv.org/abs/2605.07728v1Gaston Besanson et al. — arxiv:2605.07728 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseInterLV-Search: Benchmarking Interleaved Multimodal Agentic Searchhttp://arxiv.org/abs/2605.07510v1http://arxiv.org/abs/2605.07510v1Bohan Hou et al. 
— arxiv:2605.07510 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseFlightSense: An End-to-End MLOps Platform for Real-Time Flight Delay Prediction via Rotation-Chain Propagation Features and Agentic Conversational AIhttp://arxiv.org/abs/2605.07364v1http://arxiv.org/abs/2605.07364v1Aditi J. Shelke et al. — arxiv:2605.07364 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseSignal Reshaping for GRPO in Weak-Feedback Agentic Code Repairhttp://arxiv.org/abs/2605.07276v1http://arxiv.org/abs/2605.07276v1Jia Li et al. — arxiv:2605.07276 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseMIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learninghttp://arxiv.org/abs/2605.07269v1http://arxiv.org/abs/2605.07269v1Al Muhit Muhtadi et al. — arxiv:2605.07269 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseCan Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoninghttp://arxiv.org/abs/2605.07251v1http://arxiv.org/abs/2605.07251v1Yuyang Wu et al. — arxiv:2605.07251 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseHyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agentshttp://arxiv.org/abs/2605.07177v1http://arxiv.org/abs/2605.07177v1Guankai Li et al. — arxiv:2605.07177 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseSwitchcraft: AI Model Router for Agentic Tool Callinghttp://arxiv.org/abs/2605.07112v1http://arxiv.org/abs/2605.07112v1Sharad Agarwal et al. — arxiv:2605.07112 — Tool UseFri, 08 May 2026 00:00:00 GMTTool UseProxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignmenthttp://arxiv.org/abs/2605.08064v1http://arxiv.org/abs/2605.08064v1Jerry Jiang et al. — arxiv:2605.08064 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMObject Hallucination-Free Reinforcement Unlearning for Vision-Language Modelshttp://arxiv.org/abs/2605.08031v1http://arxiv.org/abs/2605.08031v1Kaidi Jia et al. 
— arxiv:2605.08031 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMSTARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generationhttp://arxiv.org/abs/2605.08029v1http://arxiv.org/abs/2605.08029v1Ying Shen et al. — arxiv:2605.08029 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMSphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hyperspherehttp://arxiv.org/abs/2605.08003v1http://arxiv.org/abs/2605.08003v1Chao Huang et al. — arxiv:2605.08003 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMMedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidencehttp://arxiv.org/abs/2605.07919v1http://arxiv.org/abs/2605.07919v1Hanqi Jiang et al. — arxiv:2605.07919 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMAnisotropic Modality Alignhttp://arxiv.org/abs/2605.07825v1http://arxiv.org/abs/2605.07825v1Xiaomin Yu et al. — arxiv:2605.07825 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMGazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoninghttp://arxiv.org/abs/2605.07817v1http://arxiv.org/abs/2605.07817v1Brown Ebouky et al. — arxiv:2605.07817 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMSARA: Semantically Adaptive Relational Alignment for Video Diffusion Modelshttp://arxiv.org/abs/2605.07800v1http://arxiv.org/abs/2605.07800v1Jiesong Lian et al. — arxiv:2605.07800 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMRuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderationhttp://arxiv.org/abs/2605.07760v1http://arxiv.org/abs/2605.07760v1Zhifeng Lu et al. — arxiv:2605.07760 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMOperating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Modelshttp://arxiv.org/abs/2605.07649v1http://arxiv.org/abs/2605.07649v1Berkehan Ünal et al. 
— arxiv:2605.07649 — Multimodal LLMFri, 08 May 2026 00:00:00 GMTMultimodal LLMLearning CLI Agents with Structured Action Credit under Selective Observationhttp://arxiv.org/abs/2605.08013v1http://arxiv.org/abs/2605.08013v1Haoyang Su et al. — arxiv:2605.08013 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextAsk Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?http://arxiv.org/abs/2605.07937v1http://arxiv.org/abs/2605.07937v1Anmol Gulati et al. — arxiv:2605.07937 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextWhat if AI systems weren't chatbots?http://arxiv.org/abs/2605.07896v1http://arxiv.org/abs/2605.07896v1Sourojit Ghosh et al. — arxiv:2605.07896 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextMelding LLM and temporal logic for reliable human-swarm collaboration in complex scenarioshttp://arxiv.org/abs/2605.07877v1http://arxiv.org/abs/2605.07877v1Junfeng Chen et al. — arxiv:2605.07877 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextPrune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoninghttp://arxiv.org/abs/2605.07804v1http://arxiv.org/abs/2605.07804v1Zhicheng Yang et al. — arxiv:2605.07804 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextAn Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inferencehttp://arxiv.org/abs/2605.07719v1http://arxiv.org/abs/2605.07719v1Feiyu Yao et al. — arxiv:2605.07719 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextSpatiotemporal Trust Evaluation for Collaborator Selection via Customized GNN-Mambahttp://arxiv.org/abs/2605.07658v1http://arxiv.org/abs/2605.07658v1Botao Zhu et al. — arxiv:2605.07658 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextHexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardwarehttp://arxiv.org/abs/2605.07569v1http://arxiv.org/abs/2605.07569v1Yan Liang et al. 
— arxiv:2605.07569 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextEditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editinghttp://arxiv.org/abs/2605.07455v1http://arxiv.org/abs/2605.07455v1Lan Chen et al. — arxiv:2605.07455 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextRcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Cachinghttp://arxiv.org/abs/2605.07443v1http://arxiv.org/abs/2605.07443v1Zhan Zhao et al. — arxiv:2605.07443 — Long ContextFri, 08 May 2026 00:00:00 GMTLong ContextThe Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agentshttp://arxiv.org/abs/2605.08060v1http://arxiv.org/abs/2605.08060v1Jiayuan Liu et al. — arxiv:2605.08060 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyConvergent Stochastic Training of Attention and Understanding LoRAhttp://arxiv.org/abs/2605.07959v1http://arxiv.org/abs/2605.07959v1Zhengkai Sun et al. — arxiv:2605.07959 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyEnergy-Resolved Quantum Geometry from Středa Response: Driven-Dissipative Bosonic Lattices and Disordered Systemshttp://arxiv.org/abs/2605.07948v1http://arxiv.org/abs/2605.07948v1Anaïs Defossez et al. — arxiv:2605.07948 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyA Fully Tunable Ultra-Low Power Current-Mode Memory Cell in Standard CMOS Technologyhttp://arxiv.org/abs/2605.07936v1http://arxiv.org/abs/2605.07936v1Arthur Fyon et al. — arxiv:2605.07936 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyOne Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policyhttp://arxiv.org/abs/2605.07931v1http://arxiv.org/abs/2605.07931v1Zuojin Tang et al. — arxiv:2605.07931 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyBeeVe: Unsupervised Acoustic State Discovery in Honey Bee Buzzinghttp://arxiv.org/abs/2605.07903v1http://arxiv.org/abs/2605.07903v1Hamze Hammami et al. 
— arxiv:2605.07903 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyBulk-mediated reflection of chirality-protected surface spin waveshttp://arxiv.org/abs/2605.07875v1http://arxiv.org/abs/2605.07875v1Vitaliy I. Vasyuchka et al. — arxiv:2605.07875 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyMatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuninghttp://arxiv.org/abs/2605.07850v1http://arxiv.org/abs/2605.07850v1Ionut-Vlad Modoranu et al. — arxiv:2605.07850 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyMeasuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviorshttp://arxiv.org/abs/2605.07847v1http://arxiv.org/abs/2605.07847v1Shuhaib Mehri et al. — arxiv:2605.07847 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyApproximation-Free Differentiable Oblique Decision Treeshttp://arxiv.org/abs/2605.07837v1http://arxiv.org/abs/2605.07837v1Subrat Prasad Panda et al. — arxiv:2605.07837 — LLM EfficiencyFri, 08 May 2026 00:00:00 GMTLLM EfficiencyBeyond Pairs: Your Language Model is Secretly Optimizing a Preference Graphhttp://arxiv.org/abs/2605.08037v1http://arxiv.org/abs/2605.08037v1Ning Liu et al. — arxiv:2605.08037 — AlignmentFri, 08 May 2026 00:00:00 GMTAlignmentInterpreting Reinforcement Learning Agents with Susceptibilitieshttp://arxiv.org/abs/2605.08007v1http://arxiv.org/abs/2605.08007v1Chris Elliott et al. — arxiv:2605.08007 — AlignmentFri, 08 May 2026 00:00:00 GMTAlignmentDiffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformershttp://arxiv.org/abs/2605.07503v1http://arxiv.org/abs/2605.07503v1Jingyuan Zhu et al. — arxiv:2605.07503 — AlignmentFri, 08 May 2026 00:00:00 GMTAlignmentTopology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimizationhttp://arxiv.org/abs/2605.07172v1http://arxiv.org/abs/2605.07172v1Yurui Pan et al. 
— arxiv:2605.07172 — AlignmentFri, 08 May 2026 00:00:00 GMTAlignmentDr. Post-Training: A Data Regularization Perspective on LLM Post-Traininghttp://arxiv.org/abs/2605.07063v1http://arxiv.org/abs/2605.07063v1Pingbang Hu et al. — arxiv:2605.07063 — AlignmentFri, 08 May 2026 00:00:00 GMTAlignmentVecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selectionhttp://arxiv.org/abs/2605.08070v1http://arxiv.org/abs/2605.08070v1James Petullo et al. — arxiv:2605.08070 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationObject Hallucination-Free Reinforcement Unlearning for Vision-Language Modelshttp://arxiv.org/abs/2605.08031v1http://arxiv.org/abs/2605.08031v1Kaidi Jia et al. — arxiv:2605.08031 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationPosition: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claimshttp://arxiv.org/abs/2605.08012v1http://arxiv.org/abs/2605.08012v1Zezheng Lin et al. — arxiv:2605.08012 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationDelta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervisionhttp://arxiv.org/abs/2605.07940v1http://arxiv.org/abs/2605.07940v1Jiacheng Chen et al. — arxiv:2605.07940 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationCoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewershttp://arxiv.org/abs/2605.07905v1http://arxiv.org/abs/2605.07905v1Hexuan Deng et al. — arxiv:2605.07905 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationMeasuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviorshttp://arxiv.org/abs/2605.07847v1http://arxiv.org/abs/2605.07847v1Shuhaib Mehri et al. — arxiv:2605.07847 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationGazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoninghttp://arxiv.org/abs/2605.07817v1http://arxiv.org/abs/2605.07817v1Brown Ebouky et al. 
— arxiv:2605.07817 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationBeam-Aware Radio Map Estimation With Physics-Consistent Parametric Modeling for Unknown Multiple Satelliteshttp://arxiv.org/abs/2605.07763v1http://arxiv.org/abs/2605.07763v1Xiucheng Wang et al. — arxiv:2605.07763 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationSecuring the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Softwarehttp://arxiv.org/abs/2605.07737v1http://arxiv.org/abs/2605.07737v1Bowei Ning et al. — arxiv:2605.07737 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationLLM hallucinations in the wild: Large-scale evidence from non-existent citationshttp://arxiv.org/abs/2605.07723v1http://arxiv.org/abs/2605.07723v1Zhenyue Zhao et al. — arxiv:2605.07723 — HallucinationFri, 08 May 2026 00:00:00 GMTHallucinationGLiGuard: Schema-Conditioned Classification for LLM Safeguardhttp://arxiv.org/abs/2605.07982v1http://arxiv.org/abs/2605.07982v1Urchade Zaratiana et al. — arxiv:2605.07982 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyFortifying Time Series: DTW-Certified Robust Anomaly Detectionhttp://arxiv.org/abs/2605.07690v1http://arxiv.org/abs/2605.07690v1Shijie Liu et al. — arxiv:2605.07690 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyBeyond Defenses: Manifold-Aligned Regularization for Intrinsic 3D Point Cloud Robustnesshttp://arxiv.org/abs/2605.07590v1http://arxiv.org/abs/2605.07590v1Pedro Alonso et al. — arxiv:2605.07590 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyUncovering Hidden Systematics in Neural Network Models for High Energy Physicshttp://arxiv.org/abs/2605.07470v1http://arxiv.org/abs/2605.07470v1Lucie Flek et al. — arxiv:2605.07470 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetySparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMshttp://arxiv.org/abs/2605.07447v1http://arxiv.org/abs/2605.07447v1Hao Wang et al. 
— arxiv:2605.07447 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyOrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzinghttp://arxiv.org/abs/2605.07414v1http://arxiv.org/abs/2605.07414v1Jianming Chen et al. — arxiv:2605.07414 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyGPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimizationhttp://arxiv.org/abs/2605.07399v1http://arxiv.org/abs/2605.07399v1Yu Pan et al. — arxiv:2605.07399 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyHard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignmenthttp://arxiv.org/abs/2605.07250v1http://arxiv.org/abs/2605.07250v1Zhixue Song et al. — arxiv:2605.07250 — LLM SafetyFri, 08 May 2026 00:00:00 GMTLLM SafetyRubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoninghttp://arxiv.org/abs/2605.08061v1http://arxiv.org/abs/2605.08061v1Manish Bhattarai et al. — arxiv:2605.08061 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationAbductive Reasoning with Probabilistic Commonsensehttp://arxiv.org/abs/2605.08011v1http://arxiv.org/abs/2605.08011v1Joseph Cotnareanu et al. — arxiv:2605.08011 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationAsymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Meanshttp://arxiv.org/abs/2605.07964v1http://arxiv.org/abs/2605.07964v1Valentin Kilian et al. — arxiv:2605.07964 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationDRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domainhttp://arxiv.org/abs/2605.07699v1http://arxiv.org/abs/2605.07699v1Hsuvas Borkakoty et al. — arxiv:2605.07699 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationFactoryBench: Evaluating Industrial Machine Understandinghttp://arxiv.org/abs/2605.07675v1http://arxiv.org/abs/2605.07675v1Yanis Merzouki et al. 
— arxiv:2605.07675 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationMAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditinghttp://arxiv.org/abs/2605.07646v1http://arxiv.org/abs/2605.07646v1Yinsheng Yao et al. — arxiv:2605.07646 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationEfficient Data Selection for Multimodal Models via Incremental Optimization Utilityhttp://arxiv.org/abs/2605.07488v1http://arxiv.org/abs/2605.07488v1Jinhao Jing et al. — arxiv:2605.07488 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationGameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injectionhttp://arxiv.org/abs/2605.07442v1http://arxiv.org/abs/2605.07442v1Chaobo Jia et al. — arxiv:2605.07442 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationUnsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifactshttp://arxiv.org/abs/2605.07395v1http://arxiv.org/abs/2605.07395v1Saloni Garg et al. — arxiv:2605.07395 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationCan Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoninghttp://arxiv.org/abs/2605.07251v1http://arxiv.org/abs/2605.07251v1Yuyang Wu et al. — arxiv:2605.07251 — LLM EvaluationFri, 08 May 2026 00:00:00 GMTLLM EvaluationBeyond Pairs: Your Language Model is Secretly Optimizing a Preference Graphhttp://arxiv.org/abs/2605.08037v1http://arxiv.org/abs/2605.08037v1Ning Liu et al. — arxiv:2605.08037 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMSimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillationhttp://arxiv.org/abs/2605.07711v1http://arxiv.org/abs/2605.07711v1Jie Sun et al. — arxiv:2605.07711 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMCan LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generationhttp://arxiv.org/abs/2605.07525v1http://arxiv.org/abs/2605.07525v1Luciano Baresi et al. 
— arxiv:2605.07525 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMGameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injectionhttp://arxiv.org/abs/2605.07442v1http://arxiv.org/abs/2605.07442v1Chaobo Jia et al. — arxiv:2605.07442 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMMean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternativehttp://arxiv.org/abs/2605.07345v1http://arxiv.org/abs/2605.07345v1Sibayan Mitra et al. — arxiv:2605.07345 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMMage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Ratehttp://arxiv.org/abs/2605.07342v1http://arxiv.org/abs/2605.07342v1Hugh Xuechen Liu et al. — arxiv:2605.07342 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMPaT: Planning-after-Trial for Efficient Test-Time Code Generationhttp://arxiv.org/abs/2605.07248v1http://arxiv.org/abs/2605.07248v1Youngsik Yoon et al. — arxiv:2605.07248 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMCoupling Models for One-Step Discrete Generationhttp://arxiv.org/abs/2605.07193v1http://arxiv.org/abs/2605.07193v1Fred Zhangzhi Peng et al. — arxiv:2605.07193 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMRepoZero: Can LLMs Generate a Code Repository from Scratch?http://arxiv.org/abs/2605.07122v1http://arxiv.org/abs/2605.07122v1Zhaoxi Zhang et al. — arxiv:2605.07122 — Code LLMFri, 08 May 2026 00:00:00 GMTCode LLMMedAction: Towards Active Multi-turn Clinical Diagnostic LLMshttp://arxiv.org/abs/2605.07305v1http://arxiv.org/abs/2605.07305v1Hsin-Ling Hsu et al. — arxiv:2605.07305 — Medical NLPFri, 08 May 2026 00:00:00 GMTMedical NLPMedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environmentshttp://arxiv.org/abs/2605.07058v1http://arxiv.org/abs/2605.07058v1Yicheng Gao et al. 
— arxiv:2605.07058 — Medical NLPFri, 08 May 2026 00:00:00 GMTMedical NLPBoosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repairhttp://arxiv.org/abs/2605.07403v1http://arxiv.org/abs/2605.07403v1Xinyue Liang et al. — arxiv:2605.07403 — Multilingual NLPFri, 08 May 2026 00:00:00 GMTMultilingual NLPMean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternativehttp://arxiv.org/abs/2605.07345v1http://arxiv.org/abs/2605.07345v1Sibayan Mitra et al. — arxiv:2605.07345 — Multilingual NLPFri, 08 May 2026 00:00:00 GMTMultilingual NLPMIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learninghttp://arxiv.org/abs/2605.07269v1http://arxiv.org/abs/2605.07269v1Al Muhit Muhtadi et al. — arxiv:2605.07269 — Multilingual NLPFri, 08 May 2026 00:00:00 GMTMultilingual NLPMLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocalhttp://arxiv.org/abs/2605.07249v1http://arxiv.org/abs/2605.07249v1Youngjoon Jang et al. — arxiv:2605.07249 — Multilingual NLPFri, 08 May 2026 00:00:00 GMTMultilingual NLPGRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Taskshttp://arxiv.org/abs/2605.07454v1http://arxiv.org/abs/2605.07454v1Simen Bihaug-Frøyland et al. — arxiv:2605.07454 — Named Entity RecognitionFri, 08 May 2026 00:00:00 GMTNamed Entity RecognitionLearning CLI Agents with Structured Action Credit under Selective Observationhttp://arxiv.org/abs/2605.08013v1http://arxiv.org/abs/2605.08013v1Haoyang Su et al. — arxiv:2605.08013 — Information ExtractionFri, 08 May 2026 00:00:00 GMTInformation ExtractionTCMIIES: A Browser-Based LLM-Powered Intelligent Information Extraction System for Academic Literaturehttp://arxiv.org/abs/2605.07507v1http://arxiv.org/abs/2605.07507v1Hanqing Zhao et al. 
— arxiv:2605.07507 — Information ExtractionFri, 08 May 2026 00:00:00 GMTInformation ExtractionCapCLIP: A Vision-Language Representation Alignment Approach for Wireless Capsule Endoscopy Analysishttp://arxiv.org/abs/2605.08493v1http://arxiv.org/abs/2605.08493v1Haroon Wahab et al. — arxiv:2605.08493 — Text ClassificationFri, 08 May 2026 00:00:00 GMTText ClassificationConformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibrationhttp://arxiv.org/abs/2605.08077v1http://arxiv.org/abs/2605.08077v1Shuhang Lin et al. — arxiv:2605.08077 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringProxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignmenthttp://arxiv.org/abs/2605.08064v1http://arxiv.org/abs/2605.08064v1Jerry Jiang et al. — arxiv:2605.08064 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringHow Value Induction Reshapes LLM Behaviourhttp://arxiv.org/abs/2605.07925v1http://arxiv.org/abs/2605.07925v1Arnav Arora et al. — arxiv:2605.07925 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringBeyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMshttp://arxiv.org/abs/2605.07562v1http://arxiv.org/abs/2605.07562v1Song Zhang et al. — arxiv:2605.07562 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringBalCapRL: A Balanced Framework for RL-Based MLLM Image Captioninghttp://arxiv.org/abs/2605.07394v1http://arxiv.org/abs/2605.07394v1Shaokai Ye et al. — arxiv:2605.07394 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringMIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learninghttp://arxiv.org/abs/2605.07269v1http://arxiv.org/abs/2605.07269v1Al Muhit Muhtadi et al. 
— arxiv:2605.07269 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringBeyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMshttp://arxiv.org/abs/2605.07153v1http://arxiv.org/abs/2605.07153v1Wanli Yang et al. — arxiv:2605.07153 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringBeyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptationhttp://arxiv.org/abs/2605.07111v1http://arxiv.org/abs/2605.07111v1Haozhan Tang et al. — arxiv:2605.07111 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringSelf-Consolidating Language Models: Continual Knowledge Incorporation from Contexthttp://arxiv.org/abs/2605.07076v1http://arxiv.org/abs/2605.07076v1Zekun Wang et al. — arxiv:2605.07076 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringModelLens: Finding the Best for Your Task from Myriads of Modelshttp://arxiv.org/abs/2605.07075v1http://arxiv.org/abs/2605.07075v1Rui Cai et al. — arxiv:2605.07075 — Question AnsweringFri, 08 May 2026 00:00:00 GMTQuestion AnsweringHybrid TF--IDF Logistic Regression and MLP Neural Baseline for Indonesian Three-Class Sentiment Analysis on Social Media Texthttp://arxiv.org/abs/2605.07793v1http://arxiv.org/abs/2605.07793v1Allya Nurul Islami Pasha et al. — arxiv:2605.07793 — Sentiment AnalysisFri, 08 May 2026 00:00:00 GMTSentiment AnalysisSSP-based construction of evaluation-annotated data for fine-grained aspect-based sentiment analysishttp://arxiv.org/abs/2605.07446v1http://arxiv.org/abs/2605.07446v1Suwon Choi et al. — arxiv:2605.07446 — Sentiment AnalysisFri, 08 May 2026 00:00:00 GMTSentiment AnalysisConformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibrationhttp://arxiv.org/abs/2605.08077v1http://arxiv.org/abs/2605.08077v1Shuhang Lin et al. 
— arxiv:2605.08077 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphSecuring the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Softwarehttp://arxiv.org/abs/2605.07737v1http://arxiv.org/abs/2605.07737v1Bowei Ning et al. — arxiv:2605.07737 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphTRACE: Tourism Recommendation with Accountable Citation Evidencehttp://arxiv.org/abs/2605.07677v1http://arxiv.org/abs/2605.07677v1Zixu Zhao et al. — arxiv:2605.07677 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphTacit Knowledge Extraction via Logic Augmented Generation and Active Inferencehttp://arxiv.org/abs/2605.07639v1http://arxiv.org/abs/2605.07639v1Lorenzo Lamazzi et al. — arxiv:2605.07639 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphDCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendationhttp://arxiv.org/abs/2605.07314v1http://arxiv.org/abs/2605.07314v1Xinchi Zou et al. — arxiv:2605.07314 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphMedAction: Towards Active Multi-turn Clinical Diagnostic LLMshttp://arxiv.org/abs/2605.07305v1http://arxiv.org/abs/2605.07305v1Hsin-Ling Hsu et al. — arxiv:2605.07305 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphAdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoninghttp://arxiv.org/abs/2605.07121v1http://arxiv.org/abs/2605.07121v1Seunghan Lee et al. — arxiv:2605.07121 — Knowledge GraphFri, 08 May 2026 00:00:00 GMTKnowledge GraphWho and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variationhttp://arxiv.org/abs/2605.06318v1http://arxiv.org/abs/2605.06318v1Maximilian Maurer et al. — arxiv:2605.06318 — NLPThu, 07 May 2026 00:00:00 GMTNLPSystematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extractionhttp://arxiv.org/abs/2605.06191v1http://arxiv.org/abs/2605.06191v1Shivali Dalmia et al. 
— arxiv:2605.06191 — NLPThu, 07 May 2026 00:00:00 GMTNLPVisual Fingerprints for LLM Generation Comparisonhttp://arxiv.org/abs/2605.06054v1http://arxiv.org/abs/2605.06054v1Amal Alnouri et al. — arxiv:2605.06054 — NLPThu, 07 May 2026 00:00:00 GMTNLPFastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddingshttp://arxiv.org/abs/2605.06982v1http://arxiv.org/abs/2605.06982v1Ahmed K. Kadhim et al. — arxiv:2605.06982 — NLPThu, 07 May 2026 00:00:00 GMTNLPMultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Mediahttp://arxiv.org/abs/2605.06940v1http://arxiv.org/abs/2605.06940v1Souvik Pramanik et al. — arxiv:2605.06940 — NLPThu, 07 May 2026 00:00:00 GMTNLPReflections and New Directions for Human-Centered Large Language Modelshttp://arxiv.org/abs/2605.06901v1http://arxiv.org/abs/2605.06901v1Caleb Ziems et al. — arxiv:2605.06901 — NLPThu, 07 May 2026 00:00:00 GMTNLPTajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLPhttp://arxiv.org/abs/2605.06886v1http://arxiv.org/abs/2605.06886v1Mullosharaf K. Arabov et al. — arxiv:2605.06886 — NLPThu, 07 May 2026 00:00:00 GMTNLPEMO: Pretraining Mixture of Experts for Emergent Modularityhttp://arxiv.org/abs/2605.06663v1http://arxiv.org/abs/2605.06663v1Ryan Wang et al. — arxiv:2605.06663 — LLMThu, 07 May 2026 00:00:00 GMTLLMVerifier-Backed Hard Problem Generation for Mathematical Reasoninghttp://arxiv.org/abs/2605.06660v1http://arxiv.org/abs/2605.06660v1Yuhang Lai et al. — arxiv:2605.06660 — LLMThu, 07 May 2026 00:00:00 GMTLLMWhy Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised MLhttp://arxiv.org/abs/2605.06656v1http://arxiv.org/abs/2605.06656v1Jai Moondra et al. 
— arxiv:2605.06656 — LLMThu, 07 May 2026 00:00:00 GMTLLMOptimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Lesshttp://arxiv.org/abs/2605.06654v1http://arxiv.org/abs/2605.06654v1Yuxing Liu et al. — arxiv:2605.06654 — LLMThu, 07 May 2026 00:00:00 GMTLLMWhen No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labelshttp://arxiv.org/abs/2605.06652v1http://arxiv.org/abs/2605.06652v1Sushant Gautam et al. — arxiv:2605.06652 — LLMThu, 07 May 2026 00:00:00 GMTLLMBeyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradientshttp://arxiv.org/abs/2605.06650v1http://arxiv.org/abs/2605.06650v1Mingwei Xu et al. — arxiv:2605.06650 — LLMThu, 07 May 2026 00:00:00 GMTLLMSuperintelligent Retrieval Agent: The Next Frontier of Information Retrievalhttp://arxiv.org/abs/2605.06647v1http://arxiv.org/abs/2605.06647v1Zeyu Yang et al. — arxiv:2605.06647 — LLMThu, 07 May 2026 00:00:00 GMTLLMStraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstractionhttp://arxiv.org/abs/2605.06642v1http://arxiv.org/abs/2605.06642v1Xiangyuan Xue et al. — arxiv:2605.06642 — LLMThu, 07 May 2026 00:00:00 GMTLLMGlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generationhttp://arxiv.org/abs/2605.06641v1http://arxiv.org/abs/2605.06641v1Ziyu Zhai et al. — arxiv:2605.06641 — LLMThu, 07 May 2026 00:00:00 GMTLLMCan RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Keyhttp://arxiv.org/abs/2605.06638v1http://arxiv.org/abs/2605.06638v1Tianle Wang et al. — arxiv:2605.06638 — LLMThu, 07 May 2026 00:00:00 GMTLLMBAMI: Training-Free Bias Mitigation in GUI Groundinghttp://arxiv.org/abs/2605.06664v1http://arxiv.org/abs/2605.06664v1Borui Zhang et al. — arxiv:2605.06664 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentAI Co-Mathematician: Accelerating Mathematicians with Agentic AIhttp://arxiv.org/abs/2605.06651v1http://arxiv.org/abs/2605.06651v1Daniel Zheng et al. 
— arxiv:2605.06651 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentSuperintelligent Retrieval Agent: The Next Frontier of Information Retrievalhttp://arxiv.org/abs/2605.06647v1http://arxiv.org/abs/2605.06647v1Zeyu Yang et al. — arxiv:2605.06647 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentStraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstractionhttp://arxiv.org/abs/2605.06642v1http://arxiv.org/abs/2605.06642v1Xiangyuan Xue et al. — arxiv:2605.06642 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentRecursive Agent Optimizationhttp://arxiv.org/abs/2605.06639v1http://arxiv.org/abs/2605.06639v1Apurva Gandhi et al. — arxiv:2605.06639 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentCited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agentshttp://arxiv.org/abs/2605.06635v1http://arxiv.org/abs/2605.06635v1Hailey Onweller et al. — arxiv:2605.06635 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentQuantifying Trade-Offs Between Stability and Goal-Obfuscationhttp://arxiv.org/abs/2605.06630v1http://arxiv.org/abs/2605.06630v1Yixuan Wang et al. — arxiv:2605.06630 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentMASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systemshttp://arxiv.org/abs/2605.06623v1http://arxiv.org/abs/2605.06623v1Zhexuan Wang et al. — arxiv:2605.06623 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentSkillOS: Learning Skill Curation for Self-Evolving Agentshttp://arxiv.org/abs/2605.06614v1http://arxiv.org/abs/2605.06614v1Siru Ouyang et al. — arxiv:2605.06614 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentAI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agentshttp://arxiv.org/abs/2605.06607v1http://arxiv.org/abs/2605.06607v1Nithin Somasekharan et al. 
— arxiv:2605.06607 — LLM AgentThu, 07 May 2026 00:00:00 GMTLLM AgentSuperintelligent Retrieval Agent: The Next Frontier of Information Retrievalhttp://arxiv.org/abs/2605.06647v1http://arxiv.org/abs/2605.06647v1Zeyu Yang et al. — arxiv:2605.06647 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentMASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systemshttp://arxiv.org/abs/2605.06623v1http://arxiv.org/abs/2605.06623v1Zhexuan Wang et al. — arxiv:2605.06623 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentSkillOS: Learning Skill Curation for Self-Evolving Agentshttp://arxiv.org/abs/2605.06614v1http://arxiv.org/abs/2605.06614v1Siru Ouyang et al. — arxiv:2605.06614 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentHow Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluationhttp://arxiv.org/abs/2605.06605v1http://arxiv.org/abs/2605.06605v1Shai Feldman et al. — arxiv:2605.06605 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentCross-Modal Navigation with Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.06595v1http://arxiv.org/abs/2605.06595v1Shuo Liu et al. — arxiv:2605.06595 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentNeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Researchhttp://arxiv.org/abs/2605.06584v1http://arxiv.org/abs/2605.06584v1Lujia Zhong et al. — arxiv:2605.06584 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentCoordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.06557v1http://arxiv.org/abs/2605.06557v1Maria Ana Cardei et al. — arxiv:2605.06557 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RLhttp://arxiv.org/abs/2605.06534v1http://arxiv.org/abs/2605.06534v1Wei Gao et al. 
— arxiv:2605.06534 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentAgentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Modelshttp://arxiv.org/abs/2605.06522v1http://arxiv.org/abs/2605.06522v1Xin Wang et al. — arxiv:2605.06522 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentAutonomous Adversary: Red-Teaming in the age of LLMhttp://arxiv.org/abs/2605.06486v1http://arxiv.org/abs/2605.06486v1Mohammad Mamun et al. — arxiv:2605.06486 — Multi-AgentThu, 07 May 2026 00:00:00 GMTMulti-AgentCited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agentshttp://arxiv.org/abs/2605.06635v1http://arxiv.org/abs/2605.06635v1Hailey Onweller et al. — arxiv:2605.06635 — RAGThu, 07 May 2026 00:00:00 GMTRAGHow Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluationhttp://arxiv.org/abs/2605.06605v1http://arxiv.org/abs/2605.06605v1Shai Feldman et al. — arxiv:2605.06605 — RAGThu, 07 May 2026 00:00:00 GMTRAGMiA-Signature: Approximating Global Activation for Long-Context Understandinghttp://arxiv.org/abs/2605.06416v1http://arxiv.org/abs/2605.06416v1Yuqing Li et al. — arxiv:2605.06416 — RAGThu, 07 May 2026 00:00:00 GMTRAGGATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotationhttp://arxiv.org/abs/2605.06403v1http://arxiv.org/abs/2605.06403v1Zhonghui Zhang et al. — arxiv:2605.06403 — RAGThu, 07 May 2026 00:00:00 GMTRAGLatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAGhttp://arxiv.org/abs/2605.06285v1http://arxiv.org/abs/2605.06285v1Yijia Zheng et al. — arxiv:2605.06285 — RAGThu, 07 May 2026 00:00:00 GMTRAGRobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AIhttp://arxiv.org/abs/2605.06234v1http://arxiv.org/abs/2605.06234v1Kuofei Fang et al. 
— arxiv:2605.06234 — RAGThu, 07 May 2026 00:00:00 GMTRAGEvent-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarioshttp://arxiv.org/abs/2605.06185v1http://arxiv.org/abs/2605.06185v1Peizheng Yan et al. — arxiv:2605.06185 — RAGThu, 07 May 2026 00:00:00 GMTRAGRetina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generationhttp://arxiv.org/abs/2605.06173v1http://arxiv.org/abs/2605.06173v1Abdelrahman Zaian et al. — arxiv:2605.06173 — RAGThu, 07 May 2026 00:00:00 GMTRAGIRC-Bench: Recognizing Entities from Contextual Cues in First-Person Reminiscenceshttp://arxiv.org/abs/2605.06142v1http://arxiv.org/abs/2605.06142v1Yehudit Aperstein et al. — arxiv:2605.06142 — RAGThu, 07 May 2026 00:00:00 GMTRAGTatarstan Toponyms: A Bilingual Dataset and Hybrid RAG System for Geospatial Question Answeringhttp://arxiv.org/abs/2605.05962v1http://arxiv.org/abs/2605.05962v1Mullosharaf K. Arabov et al. — arxiv:2605.05962 — RAGThu, 07 May 2026 00:00:00 GMTRAGCan RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Keyhttp://arxiv.org/abs/2605.06638v1http://arxiv.org/abs/2605.06638v1Tianle Wang et al. — arxiv:2605.06638 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningSCRuB: Social Concept Reasoning under Rubric-Based Evaluationhttp://arxiv.org/abs/2605.06444v1http://arxiv.org/abs/2605.06444v1Jamelle Watson-Daniels et al. — arxiv:2605.06444 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningGATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotationhttp://arxiv.org/abs/2605.06403v1http://arxiv.org/abs/2605.06403v1Zhonghui Zhang et al. — arxiv:2605.06403 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningMeasuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalizationhttp://arxiv.org/abs/2605.06308v1http://arxiv.org/abs/2605.06308v1Marc Boubnovski Martell et al. 
— arxiv:2605.06308 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningRethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learninghttp://arxiv.org/abs/2605.06241v1http://arxiv.org/abs/2605.06241v1Ömer Faruk Akgül et al. — arxiv:2605.06241 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningOPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Modelshttp://arxiv.org/abs/2605.06188v1http://arxiv.org/abs/2605.06188v1Jaehoon Kim et al. — arxiv:2605.06188 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningTeaching LLMs Program Semantics via Symbolic Execution Traceshttp://arxiv.org/abs/2605.06184v1http://arxiv.org/abs/2605.06184v1Jonas Bayer et al. — arxiv:2605.06184 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningPest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learninghttp://arxiv.org/abs/2605.06121v1http://arxiv.org/abs/2605.06121v1Xueheng Li et al. — arxiv:2605.06121 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningPolicy-Guided Stepwise Model Routing for Cost-Effective Reasoninghttp://arxiv.org/abs/2605.06116v1http://arxiv.org/abs/2605.06116v1Wenwen Si et al. — arxiv:2605.06116 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningNovelty-based Tree-of-Thought Search for LLM Reasoning and Planninghttp://arxiv.org/abs/2605.06040v1http://arxiv.org/abs/2605.06040v1Leon Hamm et al. — arxiv:2605.06040 — ReasoningThu, 07 May 2026 00:00:00 GMTReasoningROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RLhttp://arxiv.org/abs/2605.06534v1http://arxiv.org/abs/2605.06534v1Wei Gao et al. — arxiv:2605.06534 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learninghttp://arxiv.org/abs/2605.06483v1http://arxiv.org/abs/2605.06483v1Bowen Ye et al. 
— arxiv:2605.06483 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UsePrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitorshttp://arxiv.org/abs/2605.06455v1http://arxiv.org/abs/2605.06455v1Xinmiao Huang et al. — arxiv:2605.06455 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseAsymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Levelhttp://arxiv.org/abs/2605.06387v1http://arxiv.org/abs/2605.06387v1Nan Jia et al. — arxiv:2605.06387 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseFrom Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Workhttp://arxiv.org/abs/2605.06365v1http://arxiv.org/abs/2605.06365v1Josh Rosen et al. — arxiv:2605.06365 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseMANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agentshttp://arxiv.org/abs/2605.06334v1http://arxiv.org/abs/2605.06334v1Ashwani Anand et al. — arxiv:2605.06334 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseTeaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoninghttp://arxiv.org/abs/2605.06326v1http://arxiv.org/abs/2605.06326v1Qianjia Cheng et al. — arxiv:2605.06326 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseSafactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligencehttp://arxiv.org/abs/2605.06230v1http://arxiv.org/abs/2605.06230v1Xinquan Chen et al. — arxiv:2605.06230 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseDexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actionshttp://arxiv.org/abs/2605.05925v1http://arxiv.org/abs/2605.05925v1Hyesung Lee et al. — arxiv:2605.05925 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseMore Is Not Always Better: Cross-Component Interference in LLM Agent Scaffoldinghttp://arxiv.org/abs/2605.05716v1http://arxiv.org/abs/2605.05716v1Ming Liu et al. 
— arxiv:2605.05716 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseAccelerating the Simulation of Ordinary Differential Equations Through Physics-Preserving Neural Networkshttp://arxiv.org/abs/2605.06980v1http://arxiv.org/abs/2605.06980v1Andrew Tagg et al. — arxiv:2605.06980 — Tool UseThu, 07 May 2026 00:00:00 GMTTool UseMedHorizon: Towards Long-context Medical Video Understanding in the Wildhttp://arxiv.org/abs/2605.06537v1http://arxiv.org/abs/2605.06537v1Bodong Du et al. — arxiv:2605.06537 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMGeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMshttp://arxiv.org/abs/2605.06477v1http://arxiv.org/abs/2605.06477v1Pranav Mantini et al. — arxiv:2605.06477 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMA Regime Theory of Controller Class Selection for LLM Action Decisionshttp://arxiv.org/abs/2605.06339v1http://arxiv.org/abs/2605.06339v1Zhaoyang Jiang et al. — arxiv:2605.06339 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMToward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulationhttp://arxiv.org/abs/2605.06311v1http://arxiv.org/abs/2605.06311v1Yixin Zhu et al. — arxiv:2605.06311 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMTowards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metrichttp://arxiv.org/abs/2605.06201v1http://arxiv.org/abs/2605.06201v1Ying Gu et al. — arxiv:2605.06201 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMEvent-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarioshttp://arxiv.org/abs/2605.06185v1http://arxiv.org/abs/2605.06185v1Peizheng Yan et al. — arxiv:2605.06185 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMVLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Expertshttp://arxiv.org/abs/2605.06175v1http://arxiv.org/abs/2605.06175v1Yuhua Jiang et al. 
— arxiv:2605.06175 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMRetina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generationhttp://arxiv.org/abs/2605.06173v1http://arxiv.org/abs/2605.06173v1Abdelrahman Zaian et al. — arxiv:2605.06173 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMPest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learninghttp://arxiv.org/abs/2605.06121v1http://arxiv.org/abs/2605.06121v1Xueheng Li et al. — arxiv:2605.06121 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMCrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMshttp://arxiv.org/abs/2605.06115v1http://arxiv.org/abs/2605.06115v1Zhen Zeng et al. — arxiv:2605.06115 — Multimodal LLMThu, 07 May 2026 00:00:00 GMTMultimodal LLMLong Context Pre-Training with Lighthouse Attentionhttp://arxiv.org/abs/2605.06554v1http://arxiv.org/abs/2605.06554v1Bowen Peng et al. — arxiv:2605.06554 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextMedHorizon: Towards Long-context Medical Video Understanding in the Wildhttp://arxiv.org/abs/2605.06537v1http://arxiv.org/abs/2605.06537v1Bodong Du et al. — arxiv:2605.06537 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextSTALE: Can LLM Agents Know When Their Memories Are No Longer Valid?http://arxiv.org/abs/2605.06527v1http://arxiv.org/abs/2605.06527v1Hanxiang Chao et al. — arxiv:2605.06527 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextMiA-Signature: Approximating Global Activation for Long-Context Understandinghttp://arxiv.org/abs/2605.06416v1http://arxiv.org/abs/2605.06416v1Yuqing Li et al. — arxiv:2605.06416 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextDon't Lose Focus: Activation Steering via Key-Orthogonal Projectionshttp://arxiv.org/abs/2605.06342v1http://arxiv.org/abs/2605.06342v1Haoyan Luo et al. 
— arxiv:2605.06342 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextCKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Modelshttp://arxiv.org/abs/2605.06247v1http://arxiv.org/abs/2605.06247v1Yuhua Jiang et al. — arxiv:2605.06247 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextUniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsificationhttp://arxiv.org/abs/2605.06221v1http://arxiv.org/abs/2605.06221v1Qihang Fan et al. — arxiv:2605.06221 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextOPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Modelshttp://arxiv.org/abs/2605.06188v1http://arxiv.org/abs/2605.06188v1Jaehoon Kim et al. — arxiv:2605.06188 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextEvent-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarioshttp://arxiv.org/abs/2605.06185v1http://arxiv.org/abs/2605.06185v1Peizheng Yan et al. — arxiv:2605.06185 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextMemReranker: Reasoning-Aware Reranking for Agent Memory Retrievalhttp://arxiv.org/abs/2605.06132v1http://arxiv.org/abs/2605.06132v1Chunyu Li et al. — arxiv:2605.06132 — Long ContextThu, 07 May 2026 00:00:00 GMTLong ContextOptimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Lesshttp://arxiv.org/abs/2605.06654v1http://arxiv.org/abs/2605.06654v1Yuxing Liu et al. — arxiv:2605.06654 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyLiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operationhttp://arxiv.org/abs/2605.06628v1http://arxiv.org/abs/2605.06628v1Dan Jacobellis et al. 
— arxiv:2605.06628 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyPairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenizationhttp://arxiv.org/abs/2605.06582v1http://arxiv.org/abs/2605.06582v1Adhiraj Banerjee et al. — arxiv:2605.06582 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyPACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantizationhttp://arxiv.org/abs/2605.06505v1http://arxiv.org/abs/2605.06505v1Murat Bilgehan Ertan et al. — arxiv:2605.06505 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyAgenticPrecoding: LLM-Empowered Multi-Agent System for Precoding Optimizationhttp://arxiv.org/abs/2605.06443v1http://arxiv.org/abs/2605.06443v1Zijiu Yang et al. — arxiv:2605.06443 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyLayer Collapse in Diffusion Language Modelshttp://arxiv.org/abs/2605.06366v1http://arxiv.org/abs/2605.06366v1Alexander Conzelmann et al. — arxiv:2605.06366 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyFine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysishttp://arxiv.org/abs/2605.06330v1http://arxiv.org/abs/2605.06330v1Siraaj Akhtar et al. — arxiv:2605.06330 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyTaming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generationhttp://arxiv.org/abs/2605.06207v1http://arxiv.org/abs/2605.06207v1Bowen Zheng et al. — arxiv:2605.06207 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyRethinking Adapter Placement: A Dominant Adaptation Module Perspectivehttp://arxiv.org/abs/2605.06183v1http://arxiv.org/abs/2605.06183v1Suoxin Zhang et al. — arxiv:2605.06183 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyVLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Expertshttp://arxiv.org/abs/2605.06175v1http://arxiv.org/abs/2605.06175v1Yuhua Jiang et al. 
— arxiv:2605.06175 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyOn the non-radial oscillations of realistic anisotropic neutron stars: Axial modeshttp://arxiv.org/abs/2605.06418v1http://arxiv.org/abs/2605.06418v1Jose F. Rodriguez-Ruiz et al. — arxiv:2605.06418 — LLM EfficiencyThu, 07 May 2026 00:00:00 GMTLLM EfficiencyA Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignmenthttp://arxiv.org/abs/2605.06375v1http://arxiv.org/abs/2605.06375v1Hao Yu et al. — arxiv:2605.06375 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentArena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Modelshttp://arxiv.org/abs/2605.06070v1http://arxiv.org/abs/2605.06070v1Zhikai Li et al. — arxiv:2605.06070 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentFusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimizationhttp://arxiv.org/abs/2605.06049v1http://arxiv.org/abs/2605.06049v1Weijian Su et al. — arxiv:2605.06049 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentOptimal Transport for LLM Reward Modeling from Noisy Preferencehttp://arxiv.org/abs/2605.06036v1http://arxiv.org/abs/2605.06036v1Licheng Pan et al. — arxiv:2605.06036 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentPREFER: Personalized Review Summarization with Online Preference Learninghttp://arxiv.org/abs/2605.05911v1http://arxiv.org/abs/2605.05911v1Millend Roy et al. — arxiv:2605.05911 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentRVPO: Risk-Sensitive Alignment via Variance Regularizationhttp://arxiv.org/abs/2605.05750v1http://arxiv.org/abs/2605.05750v1Ivan Montero et al. — arxiv:2605.05750 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentDual-Agent Co-Training for Health Coaching via Implicit Adversarial Preference Optimizationhttp://arxiv.org/abs/2605.07011v1http://arxiv.org/abs/2605.07011v1Da Long et al. 
— arxiv:2605.07011 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignment$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyseshttp://arxiv.org/abs/2605.06977v1http://arxiv.org/abs/2605.06977v1Di Wu et al. — arxiv:2605.06977 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentMulti-Objective Constraint Inference using Inverse reinforcement learninghttp://arxiv.org/abs/2605.06951v1http://arxiv.org/abs/2605.06951v1Syed Ihtesham Hussain Shah et al. — arxiv:2605.06951 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentMitigating Cognitive Bias in RLHF by Altering Rationalityhttp://arxiv.org/abs/2605.06895v1http://arxiv.org/abs/2605.06895v1Tiffany Horter et al. — arxiv:2605.06895 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentHow to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignmenthttp://arxiv.org/abs/2605.06850v1http://arxiv.org/abs/2605.06850v1Rui Zhu et al. — arxiv:2605.06850 — AlignmentThu, 07 May 2026 00:00:00 GMTAlignmentCited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agentshttp://arxiv.org/abs/2605.06635v1http://arxiv.org/abs/2605.06635v1Hailey Onweller et al. — arxiv:2605.06635 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationHow Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluationhttp://arxiv.org/abs/2605.06605v1http://arxiv.org/abs/2605.06605v1Shai Feldman et al. — arxiv:2605.06605 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationAutomated Clinical Report Generation for Remote Cognitive Remediation: Comparing Knowledge-Engineered Templates and LLMs in Low-Resource Settingshttp://arxiv.org/abs/2605.06594v1http://arxiv.org/abs/2605.06594v1Yongxin Zhou et al. — arxiv:2605.06594 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationTowards Metric-Faithful Neural Graph Matchinghttp://arxiv.org/abs/2605.06588v1http://arxiv.org/abs/2605.06588v1Jyotirmaya Shivottam et al. 
— arxiv:2605.06588 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationAn algebraic model for rational ultracommutative ringshttp://arxiv.org/abs/2605.06515v1http://arxiv.org/abs/2605.06515v1William Balderrama et al. — arxiv:2605.06515 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationHyperbolic Concept Bottleneck Modelshttp://arxiv.org/abs/2605.06440v1http://arxiv.org/abs/2605.06440v1Daniel Uyterlinde et al. — arxiv:2605.06440 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationFRInGe: Distribution-Space Integrated Gradients with Fisher--Rao Geometryhttp://arxiv.org/abs/2605.06404v1http://arxiv.org/abs/2605.06404v1Gabriele Martino et al. — arxiv:2605.06404 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationSwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generationhttp://arxiv.org/abs/2605.06356v1http://arxiv.org/abs/2605.06356v1YaoYang Liu et al. — arxiv:2605.06356 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationPACE: Prune-And-Compress Ensemble Modelshttp://arxiv.org/abs/2605.06278v1http://arxiv.org/abs/2605.06278v1Fabian Akkerman et al. — arxiv:2605.06278 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationA Comparative Study of Mass Extraction Schemes and $π^\pm-ρ^\pm$ Mixinghttp://arxiv.org/abs/2605.06271v1http://arxiv.org/abs/2605.06271v1Ziyue Wang et al. — arxiv:2605.06271 — HallucinationThu, 07 May 2026 00:00:00 GMTHallucinationWhen No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labelshttp://arxiv.org/abs/2605.06652v1http://arxiv.org/abs/2605.06652v1Sushant Gautam et al. — arxiv:2605.06652 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyHow Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluationhttp://arxiv.org/abs/2605.06605v1http://arxiv.org/abs/2605.06605v1Shai Feldman et al. 
— arxiv:2605.06605 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyAutonomous Adversary: Red-Teaming in the age of LLMhttp://arxiv.org/abs/2605.06486v1http://arxiv.org/abs/2605.06486v1Mohammad Mamun et al. — arxiv:2605.06486 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyMemory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluationshttp://arxiv.org/abs/2605.06357v1http://arxiv.org/abs/2605.06357v1Yuan Du et al. — arxiv:2605.06357 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyBeyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judgeshttp://arxiv.org/abs/2605.06161v1http://arxiv.org/abs/2605.06161v1Shihao Weng et al. — arxiv:2605.06161 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyLightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderationhttp://arxiv.org/abs/2605.05950v1http://arxiv.org/abs/2605.05950v1Siyuan Li et al. — arxiv:2605.05950 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyLoopTrap: Termination Poisoning Attacks on LLM Agentshttp://arxiv.org/abs/2605.05846v1http://arxiv.org/abs/2605.05846v1Huiyu Xu et al. — arxiv:2605.05846 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyConceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMshttp://arxiv.org/abs/2605.05709v1http://arxiv.org/abs/2605.05709v1Md Farhamdur Reza et al. — arxiv:2605.05709 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyDataDignity: Training Data Attribution for Large Language Modelshttp://arxiv.org/abs/2605.05687v1http://arxiv.org/abs/2605.05687v1Xiaomin Li et al. — arxiv:2605.05687 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyPersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AIhttp://arxiv.org/abs/2605.05682v1http://arxiv.org/abs/2605.05682v1Wesley Hanwen Deng et al. 
— arxiv:2605.05682 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyA Systematic Investigation of The RL-Jailbreaker in LLMshttp://arxiv.org/abs/2605.07032v1http://arxiv.org/abs/2605.07032v1Montaser Mohammedalamen et al. — arxiv:2605.07032 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyMIND: Monge Inception Distance for Generative Models Evaluationhttp://arxiv.org/abs/2605.06797v1http://arxiv.org/abs/2605.06797v1Quentin Berthet et al. — arxiv:2605.06797 — LLM SafetyThu, 07 May 2026 00:00:00 GMTLLM SafetyCited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agentshttp://arxiv.org/abs/2605.06635v1http://arxiv.org/abs/2605.06635v1Hailey Onweller et al. — arxiv:2605.06635 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationHow Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluationhttp://arxiv.org/abs/2605.06605v1http://arxiv.org/abs/2605.06605v1Shai Feldman et al. — arxiv:2605.06605 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationAutonomous Adversary: Red-Teaming in the age of LLMhttp://arxiv.org/abs/2605.06486v1http://arxiv.org/abs/2605.06486v1Mohammad Mamun et al. — arxiv:2605.06486 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationPrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitorshttp://arxiv.org/abs/2605.06455v1http://arxiv.org/abs/2605.06455v1Xinmiao Huang et al. — arxiv:2605.06455 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationSCRuB: Social Concept Reasoning under Rubric-Based Evaluationhttp://arxiv.org/abs/2605.06444v1http://arxiv.org/abs/2605.06444v1Jamelle Watson-Daniels et al. — arxiv:2605.06444 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationMANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agentshttp://arxiv.org/abs/2605.06334v1http://arxiv.org/abs/2605.06334v1Ashwani Anand et al. 
— arxiv:2605.06334 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationMeasuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneityhttp://arxiv.org/abs/2605.06327v1http://arxiv.org/abs/2605.06327v1Florian A. D. Burnat et al. — arxiv:2605.06327 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationQuantifying the Statistical Effect of Rubric Modifications on Human-Autorater Agreementhttp://arxiv.org/abs/2605.06283v1http://arxiv.org/abs/2605.06283v1Jessica Huynh et al. — arxiv:2605.06283 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationJoint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimizationhttp://arxiv.org/abs/2605.06219v1http://arxiv.org/abs/2605.06219v1Yunzhen Yao et al. — arxiv:2605.06219 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationBeyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judgeshttp://arxiv.org/abs/2605.06161v1http://arxiv.org/abs/2605.06161v1Shihao Weng et al. — arxiv:2605.06161 — LLM EvaluationThu, 07 May 2026 00:00:00 GMTLLM EvaluationTo What Extent Does Agent-generated Code Require Maintenance? An Empirical Studyhttp://arxiv.org/abs/2605.06464v1http://arxiv.org/abs/2605.06464v1Shota Sawada et al. — arxiv:2605.06464 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMConstraint Decay: The Fragility of LLM Agents in Backend Code Generationhttp://arxiv.org/abs/2605.06445v1http://arxiv.org/abs/2605.06445v1Francesco Dente et al. — arxiv:2605.06445 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMAgenticPrecoding: LLM-Empowered Multi-Agent System for Precoding Optimizationhttp://arxiv.org/abs/2605.06443v1http://arxiv.org/abs/2605.06443v1Zijiu Yang et al. — arxiv:2605.06443 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMRethinking Adapter Placement: A Dominant Adaptation Module Perspectivehttp://arxiv.org/abs/2605.06183v1http://arxiv.org/abs/2605.06183v1Suoxin Zhang et al. 
— arxiv:2605.06183 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMSchedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMshttp://arxiv.org/abs/2605.06111v1http://arxiv.org/abs/2605.06111v1Yujia Chen et al. — arxiv:2605.06111 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMFalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplicationhttp://arxiv.org/abs/2605.06057v1http://arxiv.org/abs/2605.06057v1Honglin Zhu et al. — arxiv:2605.06057 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMEvaluating Non-English Developer Support in Machine Learning for Software Engineeringhttp://arxiv.org/abs/2605.05902v1http://arxiv.org/abs/2605.05902v1Jonathan Katzy et al. — arxiv:2605.05902 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMOn Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategieshttp://arxiv.org/abs/2605.05867v1http://arxiv.org/abs/2605.05867v1Ali Soltanian Fard Jahromi et al. — arxiv:2605.05867 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMCircuitFormer: A Circuit Language Model for Analog Topology Design from Natural Language Prompthttp://arxiv.org/abs/2605.05773v1http://arxiv.org/abs/2605.05773v1Md Touhidul Islam et al. — arxiv:2605.05773 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMRetrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generationhttp://arxiv.org/abs/2605.05657v1http://arxiv.org/abs/2605.05657v1Abhijit Talluri et al. — arxiv:2605.05657 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMDelulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Taskshttp://arxiv.org/abs/2605.07024v1http://arxiv.org/abs/2605.07024v1Mahdi Erfanian et al. — arxiv:2605.07024 — Code LLMThu, 07 May 2026 00:00:00 GMTCode LLMMinimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLMhttp://arxiv.org/abs/2605.05927v2http://arxiv.org/abs/2605.05927v2Wenqian Cui et al. 
— arxiv:2605.05927 — Speech LLMThu, 07 May 2026 00:00:00 GMTSpeech LLMVITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singinghttp://arxiv.org/abs/2605.06765v1http://arxiv.org/abs/2605.06765v1Jiacheng Xu et al. — arxiv:2605.06765 — Speech LLMThu, 07 May 2026 00:00:00 GMTSpeech LLMSystematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extractionhttp://arxiv.org/abs/2605.06191v1http://arxiv.org/abs/2605.06191v1Shivali Dalmia et al. — arxiv:2605.06191 — Medical NLPThu, 07 May 2026 00:00:00 GMTMedical NLPDecodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimeshttp://arxiv.org/abs/2605.05715v1http://arxiv.org/abs/2605.05715v1Ming Liu et al. — arxiv:2605.05715 — Medical NLPThu, 07 May 2026 00:00:00 GMTMedical NLPIn Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphshttp://arxiv.org/abs/2605.05931v1http://arxiv.org/abs/2605.05931v1Ndeye-Emilie Mbengue et al. — arxiv:2605.05931 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPWhich Are the Low-Resource Languages of the Semantic Web?http://arxiv.org/abs/2605.05929v1http://arxiv.org/abs/2605.05929v1Ndeye-Emilie Mbengue et al. — arxiv:2605.05929 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPUnderstanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modelinghttp://arxiv.org/abs/2605.05900v1http://arxiv.org/abs/2605.05900v1Sana Al-azzawi et al. — arxiv:2605.05900 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPX-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloninghttp://arxiv.org/abs/2605.05611v1http://arxiv.org/abs/2605.05611v1Rixi Xu et al. 
— arxiv:2605.05611 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPDelulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Taskshttp://arxiv.org/abs/2605.07024v1http://arxiv.org/abs/2605.07024v1Mahdi Erfanian et al. — arxiv:2605.07024 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPMultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Mediahttp://arxiv.org/abs/2605.06940v1http://arxiv.org/abs/2605.06940v1Souvik Pramanik et al. — arxiv:2605.06940 — Multilingual NLPThu, 07 May 2026 00:00:00 GMTMultilingual NLPIRC-Bench: Recognizing Entities from Contextual Cues in First-Person Reminiscenceshttp://arxiv.org/abs/2605.06142v1http://arxiv.org/abs/2605.06142v1Yehudit Aperstein et al. — arxiv:2605.06142 — Named Entity RecognitionThu, 07 May 2026 00:00:00 GMTNamed Entity RecognitionInductive Power Grid Cascading Failure Analysis with GRU-Gated Graph Attentionhttp://arxiv.org/abs/2605.07010v1http://arxiv.org/abs/2605.07010v1Tianxin Zhou et al. — arxiv:2605.07010 — Information ExtractionThu, 07 May 2026 00:00:00 GMTInformation ExtractionSuperintelligent Retrieval Agent: The Next Frontier of Information Retrievalhttp://arxiv.org/abs/2605.06647v1http://arxiv.org/abs/2605.06647v1Zeyu Yang et al. — arxiv:2605.06647 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringTask-Aware Answer Preservation under Audio Compression for Large Audio Language Modelshttp://arxiv.org/abs/2605.06631v1http://arxiv.org/abs/2605.06631v1Amir Ivry et al. — arxiv:2605.06631 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringRethinking Vacuity for OOD Detection in Evidential Deep Learninghttp://arxiv.org/abs/2605.06382v1http://arxiv.org/abs/2605.06382v1Claire McNamara et al. 
— arxiv:2605.06382 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringLatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAGhttp://arxiv.org/abs/2605.06285v1http://arxiv.org/abs/2605.06285v1Yijia Zheng et al. — arxiv:2605.06285 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringTowards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictionshttp://arxiv.org/abs/2605.06058v1http://arxiv.org/abs/2605.06058v1Kjetil Indrehus et al. — arxiv:2605.06058 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringMulti-agent decision making: A Blackwell's informativeness approachhttp://arxiv.org/abs/2605.06028v1http://arxiv.org/abs/2605.06028v1Zheng Zhang et al. — arxiv:2605.06028 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringTraining Transformers for KV Cache Compressibilityhttp://arxiv.org/abs/2605.05971v1http://arxiv.org/abs/2605.05971v1Yoav Gelberg et al. — arxiv:2605.05971 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringTatarstan Toponyms: A Bilingual Dataset and Hybrid RAG System for Geospatial Question Answeringhttp://arxiv.org/abs/2605.05962v1http://arxiv.org/abs/2605.05962v1Mullosharaf K. Arabov et al. — arxiv:2605.05962 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringHallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuitshttp://arxiv.org/abs/2605.05953v1http://arxiv.org/abs/2605.05953v1Erik Nielsen et al. — arxiv:2605.05953 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringICU-Bench:Benchmarking Continual Unlearning in Multimodal Large Language Modelshttp://arxiv.org/abs/2605.05938v1http://arxiv.org/abs/2605.05938v1Yuhang Wang et al. — arxiv:2605.05938 — Question AnsweringThu, 07 May 2026 00:00:00 GMTQuestion AnsweringAre We Making Progress in Multimodal Domain Generalization? 
A Comprehensive Benchmark Studyhttp://arxiv.org/abs/2605.06643v1http://arxiv.org/abs/2605.06643v1Hao Dong et al. — arxiv:2605.06643 — Sentiment AnalysisThu, 07 May 2026 00:00:00 GMTSentiment AnalysisAffectGPT-RL: Revealing Roles of Reinforcement Learning in Open-Vocabulary Emotion Recognitionhttp://arxiv.org/abs/2605.06126v1http://arxiv.org/abs/2605.06126v1Zheng Lian et al. — arxiv:2605.06126 — Sentiment AnalysisThu, 07 May 2026 00:00:00 GMTSentiment AnalysisKnowledge Graphs, the Missing Link in Agentic AI-based Formal Verificationhttp://arxiv.org/abs/2605.06434v1http://arxiv.org/abs/2605.06434v1Vaisakh Naduvodi Viswambharan et al. — arxiv:2605.06434 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphGATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotationhttp://arxiv.org/abs/2605.06403v1http://arxiv.org/abs/2605.06403v1Zhonghui Zhang et al. — arxiv:2605.06403 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphEvent-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarioshttp://arxiv.org/abs/2605.06185v1http://arxiv.org/abs/2605.06185v1Peizheng Yan et al. — arxiv:2605.06185 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphGraphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Modelshttp://arxiv.org/abs/2605.06154v1http://arxiv.org/abs/2605.06154v1Kossi Amouzouvi et al. — arxiv:2605.06154 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphIn Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphshttp://arxiv.org/abs/2605.05931v1http://arxiv.org/abs/2605.05931v1Ndeye-Emilie Mbengue et al. — arxiv:2605.05931 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphWhich Are the Low-Resource Languages of the Semantic Web?http://arxiv.org/abs/2605.05929v1http://arxiv.org/abs/2605.05929v1Ndeye-Emilie Mbengue et al. 
— arxiv:2605.05929 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphKnowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agentshttp://arxiv.org/abs/2605.05702v1http://arxiv.org/abs/2605.05702v1Huyu Wu et al. — arxiv:2605.05702 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphSPARK: Self-Play with Asymmetric Reward from Knowledge Graphshttp://arxiv.org/abs/2605.05546v1http://arxiv.org/abs/2605.05546v1Hyobin Park et al. — arxiv:2605.05546 — Knowledge GraphThu, 07 May 2026 00:00:00 GMTKnowledge GraphSuperposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecastinghttp://arxiv.org/abs/2605.05151v1http://arxiv.org/abs/2605.05151v1Alper Yıldırım et al. — arxiv:2605.05151 — NLPWed, 06 May 2026 00:00:00 GMTNLPTabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understandinghttp://arxiv.org/abs/2605.04962v1http://arxiv.org/abs/2605.04962v1Minjie Qiang et al. — arxiv:2605.04962 — NLPWed, 06 May 2026 00:00:00 GMTNLPMeasuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessmenthttp://arxiv.org/abs/2605.04873v1http://arxiv.org/abs/2605.04873v1Maria Luongo et al. — arxiv:2605.04873 — NLPWed, 06 May 2026 00:00:00 GMTNLPDistributed Energy System Design including Unbalanced AC Power Flow for Large LV Networks with ADMMhttp://arxiv.org/abs/2605.04746v1http://arxiv.org/abs/2605.04746v1Robert Steven et al. — arxiv:2605.04746 — NLPWed, 06 May 2026 00:00:00 GMTNLPTajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)http://arxiv.org/abs/2605.04583v1http://arxiv.org/abs/2605.04583v1Mullosharaf K. Arabov et al. — arxiv:2605.04583 — NLPWed, 06 May 2026 00:00:00 GMTNLPA Hybrid Method for Low-Resource Named Entity Recognitionhttp://arxiv.org/abs/2605.04489v1http://arxiv.org/abs/2605.04489v1Do Minh Duc et al. 
— arxiv:2605.04489 — NLPWed, 06 May 2026 00:00:00 GMTNLPRobustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphshttp://arxiv.org/abs/2605.05463v1http://arxiv.org/abs/2605.05463v1Othmane Kabal et al. — arxiv:2605.05463 — NLPWed, 06 May 2026 00:00:00 GMTNLPD-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Modelshttp://arxiv.org/abs/2605.05204v1http://arxiv.org/abs/2605.05204v1Dengyang Jiang et al. — arxiv:2605.05204 — LLMWed, 06 May 2026 00:00:00 GMTLLMAlmost-Orthogonality in Lp Spaces: A Case Study with Grokhttp://arxiv.org/abs/2605.05192v1http://arxiv.org/abs/2605.05192v1Ziang Chen et al. — arxiv:2605.05192 — LLMWed, 06 May 2026 00:00:00 GMTLLMMRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledgehttp://arxiv.org/abs/2605.05175v1http://arxiv.org/abs/2605.05175v1Perry E. Radau et al. — arxiv:2605.05175 — LLMWed, 06 May 2026 00:00:00 GMTLLMDesign Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hourshttp://arxiv.org/abs/2605.05170v1http://arxiv.org/abs/2605.05170v1The Verkor Team et al. — arxiv:2605.05170 — LLMWed, 06 May 2026 00:00:00 GMTLLMPSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentationhttp://arxiv.org/abs/2605.05159v1http://arxiv.org/abs/2605.05159v1Srikar Kashyap Pulipaka et al. — arxiv:2605.05159 — LLMWed, 06 May 2026 00:00:00 GMTLLMLow-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Predictionhttp://arxiv.org/abs/2605.05134v1http://arxiv.org/abs/2605.05134v1Dan Wilson et al. — arxiv:2605.05134 — LLMWed, 06 May 2026 00:00:00 GMTLLMJoint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputationhttp://arxiv.org/abs/2605.05125v1http://arxiv.org/abs/2605.05125v1Olivia Jullian Parra et al. 
— arxiv:2605.05125 — LLMWed, 06 May 2026 00:00:00 GMTLLMBeyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Predictionhttp://arxiv.org/abs/2605.05121v1http://arxiv.org/abs/2605.05121v1Yucheng Ruan et al. — arxiv:2605.05121 — LLMWed, 06 May 2026 00:00:00 GMTLLMOn the Hardness of Junking LLMshttp://arxiv.org/abs/2605.05116v1http://arxiv.org/abs/2605.05116v1Marco Rando et al. — arxiv:2605.05116 — LLMWed, 06 May 2026 00:00:00 GMTLLMText Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurementhttp://arxiv.org/abs/2605.05103v1http://arxiv.org/abs/2605.05103v1Nicholas S. Kersting et al. — arxiv:2605.05103 — LLMWed, 06 May 2026 00:00:00 GMTLLMLongSeeker: Elastic Context Orchestration for Long-Horizon Search Agentshttp://arxiv.org/abs/2605.05191v1http://arxiv.org/abs/2605.05191v1Yijun Lu et al. — arxiv:2605.05191 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentOpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agentshttp://arxiv.org/abs/2605.05185v1http://arxiv.org/abs/2605.05185v1Shuang Chen et al. — arxiv:2605.05185 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentDesign Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hourshttp://arxiv.org/abs/2605.05170v1http://arxiv.org/abs/2605.05170v1The Verkor Team et al. — arxiv:2605.05170 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentPhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual Worldhttp://arxiv.org/abs/2605.05163v1http://arxiv.org/abs/2605.05163v1Yunhan Yang et al. — arxiv:2605.05163 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentLocal and global optimization in Parallel Minority Gameshttp://arxiv.org/abs/2605.05141v1http://arxiv.org/abs/2605.05141v1Soumyajyoti Biswas et al. 
— arxiv:2605.05141 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentExecutable World Models for ARC-AGI-3 in the Era of Coding Agentshttp://arxiv.org/abs/2605.05138v1http://arxiv.org/abs/2605.05138v1Sergey Rodionov et al. — arxiv:2605.05138 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentThe Demand Externality of Automationhttp://arxiv.org/abs/2605.05127v1http://arxiv.org/abs/2605.05127v1Erhan Bayraktar et al. — arxiv:2605.05127 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentRollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regimehttp://arxiv.org/abs/2605.05112v1http://arxiv.org/abs/2605.05112v1Tianshu Zhu et al. — arxiv:2605.05112 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentGraph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.05020v1http://arxiv.org/abs/2605.05020v1Shawn Ray et al. — arxiv:2605.05020 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentUno-Orchestra: Parsimonious Agent Routing via Selective Delegationhttp://arxiv.org/abs/2605.05007v1http://arxiv.org/abs/2605.05007v1Zhiqing Cui et al. — arxiv:2605.05007 — LLM AgentWed, 06 May 2026 00:00:00 GMTLLM AgentOpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agentshttp://arxiv.org/abs/2605.05185v1http://arxiv.org/abs/2605.05185v1Shuang Chen et al. — arxiv:2605.05185 — Multi-AgentWed, 06 May 2026 00:00:00 GMTMulti-AgentDesign Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hourshttp://arxiv.org/abs/2605.05170v1http://arxiv.org/abs/2605.05170v1The Verkor Team et al. — arxiv:2605.05170 — Multi-AgentWed, 06 May 2026 00:00:00 GMTMulti-AgentGraph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learninghttp://arxiv.org/abs/2605.05020v1http://arxiv.org/abs/2605.05020v1Shawn Ray et al. 
— arxiv:2605.05020 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation http://arxiv.org/abs/2605.05007v1 Zhiqing Cui et al. — arxiv:2605.05007 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Modular Reinforcement Learning For Cooperative Swarms http://arxiv.org/abs/2605.04939v1 Erel Shtossel et al. — arxiv:2605.04939 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation http://arxiv.org/abs/2605.04922v1 Jiangwen Dong et al. — arxiv:2605.04922 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games http://arxiv.org/abs/2605.04906v1 Yidong He et al. — arxiv:2605.04906 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall http://arxiv.org/abs/2605.04897v1 Joshua Adler et al. — arxiv:2605.04897 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Agentic Repository Mining: A Multi-Task Evaluation http://arxiv.org/abs/2605.04845v1 Johannes Härtel et al. — arxiv:2605.04845 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
Tree-based Credit Assignment for Multi-Agent Memory System http://arxiv.org/abs/2605.04811v1 Marina Mao et al. — arxiv:2605.04811 — Multi-Agent Wed, 06 May 2026 00:00:00 GMT
How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study http://arxiv.org/abs/2605.04763v1 Xinjian Wu et al.
— arxiv:2605.04763 — RAG Wed, 06 May 2026 00:00:00 GMT
Graph-Augmented LLMs for Swiss MP Ideology Prediction http://arxiv.org/abs/2605.04643v1 Yifei Yuan et al. — arxiv:2605.04643 — RAG Wed, 06 May 2026 00:00:00 GMT
CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation http://arxiv.org/abs/2605.04495v1 Zhipeng Song et al. — arxiv:2605.04495 — RAG Wed, 06 May 2026 00:00:00 GMT
DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation http://arxiv.org/abs/2605.04458v1 Bryan Li et al. — arxiv:2605.04458 — RAG Wed, 06 May 2026 00:00:00 GMT
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance http://arxiv.org/abs/2605.04960v1 Song Yu et al. — arxiv:2605.04960 — Reasoning Wed, 06 May 2026 00:00:00 GMT
Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games http://arxiv.org/abs/2605.04906v1 Yidong He et al. — arxiv:2605.04906 — Reasoning Wed, 06 May 2026 00:00:00 GMT
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models http://arxiv.org/abs/2605.04613v1 Yukun Chen et al. — arxiv:2605.04613 — Reasoning Wed, 06 May 2026 00:00:00 GMT
Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis http://arxiv.org/abs/2605.04499v1 Yasod Ginige et al. — arxiv:2605.04499 — Reasoning Wed, 06 May 2026 00:00:00 GMT
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents http://arxiv.org/abs/2605.05191v1 Yijun Lu et al.
— arxiv:2605.05191 — Tool Use Wed, 06 May 2026 00:00:00 GMT
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization http://arxiv.org/abs/2605.05040v1 Xin Yu et al. — arxiv:2605.05040 — Tool Use Wed, 06 May 2026 00:00:00 GMT
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation http://arxiv.org/abs/2605.05007v1 Zhiqing Cui et al. — arxiv:2605.05007 — Tool Use Wed, 06 May 2026 00:00:00 GMT
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use http://arxiv.org/abs/2605.04785v1 Chenglin Yang et al. — arxiv:2605.04785 — Tool Use Wed, 06 May 2026 00:00:00 GMT
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models http://arxiv.org/abs/2605.05204v1 Dengyang Jiang et al. — arxiv:2605.05204 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World http://arxiv.org/abs/2605.05163v1 Yunhan Yang et al. — arxiv:2605.05163 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging http://arxiv.org/abs/2605.05161v1 Bernhard Kainz et al. — arxiv:2605.05161 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation http://arxiv.org/abs/2605.05054v1 Hongxu Chen et al. — arxiv:2605.05054 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise http://arxiv.org/abs/2605.05045v1 Philip Wootaek Shin et al.
— arxiv:2605.05045 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification http://arxiv.org/abs/2605.05027v1 Wen Wen et al. — arxiv:2605.05027 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection http://arxiv.org/abs/2605.04882v1 Mohamed Elhabebe et al. — arxiv:2605.04882 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models http://arxiv.org/abs/2605.04874v1 Huatian Zhang et al. — arxiv:2605.04874 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing http://arxiv.org/abs/2605.04733v1 Miao Wang et al. — arxiv:2605.04733 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Anny-Fit: All-Age Human Mesh Recovery http://arxiv.org/abs/2605.04728v1 Laura Bravo-Sánchez et al. — arxiv:2605.04728 — Multimodal LLM Wed, 06 May 2026 00:00:00 GMT
Stable homotopy theory of higher categories http://arxiv.org/abs/2605.05195v1 Hadrian Heine et al. — arxiv:2605.05195 — Long Context Wed, 06 May 2026 00:00:00 GMT
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents http://arxiv.org/abs/2605.05191v1 Yijun Lu et al. — arxiv:2605.05191 — Long Context Wed, 06 May 2026 00:00:00 GMT
Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout http://arxiv.org/abs/2605.05092v1 Haozhuang Chi et al.
— arxiv:2605.05092 — Long Context Wed, 06 May 2026 00:00:00 GMT
The Impossibility Triangle of Long-Context Modeling http://arxiv.org/abs/2605.05066v1 Yan Zhou et al. — arxiv:2605.05066 — Long Context Wed, 06 May 2026 00:00:00 GMT
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation http://arxiv.org/abs/2605.05007v1 Zhiqing Cui et al. — arxiv:2605.05007 — Long Context Wed, 06 May 2026 00:00:00 GMT
A meta-analysis of the effect of generative AI on productivity and learning in programming http://arxiv.org/abs/2605.04779v1 Sebastian Maier et al. — arxiv:2605.04779 — Long Context Wed, 06 May 2026 00:00:00 GMT
Spectro-Polarimetric Observations of TeV Sources (SPOTS): First results http://arxiv.org/abs/2605.04619v1 J. Barnard et al. — arxiv:2605.04619 — Long Context Wed, 06 May 2026 00:00:00 GMT
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States http://arxiv.org/abs/2605.04496v1 Zhenliang Zhang et al. — arxiv:2605.04496 — Long Context Wed, 06 May 2026 00:00:00 GMT
Stream-T1: Test-Time Scaling for Streaming Video Generation http://arxiv.org/abs/2605.04461v1 Yijing Tu et al. — arxiv:2605.04461 — Long Context Wed, 06 May 2026 00:00:00 GMT
How Do Ice Shelves Calve? Peridynamic Modeling of Ice Shelf Fracture Driven by Wave Erosion, Basal Melting, and Buoyancy Flexure http://arxiv.org/abs/2605.04365v1 Ying Song et al. — arxiv:2605.04365 — Long Context Wed, 06 May 2026 00:00:00 GMT
Dynamical correlations in a dissipative XXZ spin chain http://arxiv.org/abs/2605.05162v1 Cătălin Paşcu Moca et al.
— arxiv:2605.05162 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation http://arxiv.org/abs/2605.05159v1 Srikar Kashyap Pulipaka et al. — arxiv:2605.05159 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Quantum Entanglement in the Dirac Field Quantization around Charged Black Holes http://arxiv.org/abs/2605.05143v1 Abdessamie Chhieb et al. — arxiv:2605.05143 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation http://arxiv.org/abs/2605.05096v1 Wenzhuo Cheng et al. — arxiv:2605.05096 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Quantized Probabilistic AI for Gear Fault Diagnosis in Motor Drives http://arxiv.org/abs/2605.05032v1 Subham Sahoo et al. — arxiv:2605.05032 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation http://arxiv.org/abs/2605.04992v1 Marco Arazzi et al. — arxiv:2605.04992 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data http://arxiv.org/abs/2605.04989v1 Ali Shibli et al. — arxiv:2605.04989 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels http://arxiv.org/abs/2605.04956v1 Han Wang et al.
— arxiv:2605.04956 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts http://arxiv.org/abs/2605.04952v1 Klaus-Rudolf Kladny et al. — arxiv:2605.04952 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir http://arxiv.org/abs/2605.04948v1 Mullosharaf K. Arabov et al. — arxiv:2605.04948 — LLM Efficiency Wed, 06 May 2026 00:00:00 GMT
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization http://arxiv.org/abs/2605.05040v1 Xin Yu et al. — arxiv:2605.05040 — Alignment Wed, 06 May 2026 00:00:00 GMT
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models http://arxiv.org/abs/2605.04874v1 Huatian Zhang et al. — arxiv:2605.04874 — Alignment Wed, 06 May 2026 00:00:00 GMT
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization http://arxiv.org/abs/2605.04539v1 Qiming Bao et al. — arxiv:2605.04539 — Alignment Wed, 06 May 2026 00:00:00 GMT
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium http://arxiv.org/abs/2605.04494v1 Jiaming Hu et al. — arxiv:2605.04494 — Alignment Wed, 06 May 2026 00:00:00 GMT
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback http://arxiv.org/abs/2605.04477v1 Zhen-Yu Zhang et al. — arxiv:2605.04477 — Alignment Wed, 06 May 2026 00:00:00 GMT
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents http://arxiv.org/abs/2605.05191v1 Yijun Lu et al.
— arxiv:2605.05191 — Hallucination Wed, 06 May 2026 00:00:00 GMT
The First Token Knows: Single-Decode Confidence for Hallucination Detection http://arxiv.org/abs/2605.05166v1 Mina Gabriel et al. — arxiv:2605.05166 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction http://arxiv.org/abs/2605.05134v1 Dan Wilson et al. — arxiv:2605.05134 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement http://arxiv.org/abs/2605.05103v1 Nicholas S. Kersting et al. — arxiv:2605.05103 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models http://arxiv.org/abs/2605.05090v1 Quintin Pope et al. — arxiv:2605.05090 — Hallucination Wed, 06 May 2026 00:00:00 GMT
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise http://arxiv.org/abs/2605.05045v1 Philip Wootaek Shin et al. — arxiv:2605.05045 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models http://arxiv.org/abs/2605.05026v1 Bartlomiej Sobieski et al. — arxiv:2605.05026 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals http://arxiv.org/abs/2605.05025v1 Gijs van Dijk et al. — arxiv:2605.05025 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Misaligned by Reward: Socially Undesirable Preferences in LLMs http://arxiv.org/abs/2605.05003v1 Gayane Ghazaryan et al.
— arxiv:2605.05003 — Hallucination Wed, 06 May 2026 00:00:00 GMT
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics http://arxiv.org/abs/2605.04893v1 Dominik Dahlem et al. — arxiv:2605.04893 — Hallucination Wed, 06 May 2026 00:00:00 GMT
On the Hardness of Junking LLMs http://arxiv.org/abs/2605.05116v1 Marco Rando et al. — arxiv:2605.05116 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
SoK: Robustness in Large Language Models against Jailbreak Attacks http://arxiv.org/abs/2605.05058v1 Feiyue Xu et al. — arxiv:2605.05058 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents http://arxiv.org/abs/2605.04808v1 Zhaorun Chen et al. — arxiv:2605.04808 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization http://arxiv.org/abs/2605.04700v1 Zheng Fang et al. — arxiv:2605.04700 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern http://arxiv.org/abs/2605.04675v1 Xiaopei Zhu et al. — arxiv:2605.04675 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks http://arxiv.org/abs/2605.04443v1 Zhenan Shao et al. — arxiv:2605.04443 — LLM Safety Wed, 06 May 2026 00:00:00 GMT
MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge http://arxiv.org/abs/2605.05175v1 Perry E. Radau et al.
— arxiv:2605.05175 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement http://arxiv.org/abs/2605.05103v1 Nicholas S. Kersting et al. — arxiv:2605.05103 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
SoK: Robustness in Large Language Models against Jailbreak Attacks http://arxiv.org/abs/2605.05058v1 Feiyue Xu et al. — arxiv:2605.05058 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
BenCSSmark: Making the Social Sciences Count in LLM Research http://arxiv.org/abs/2605.04886v1 Arnault Chatelain et al. — arxiv:2605.04886 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use http://arxiv.org/abs/2605.04785v1 Chenglin Yang et al. — arxiv:2605.04785 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization http://arxiv.org/abs/2605.04539v1 Qiming Bao et al. — arxiv:2605.04539 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation http://arxiv.org/abs/2605.04523v1 Ivan Bondarenko et al. — arxiv:2605.04523 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning http://arxiv.org/abs/2605.04503v1 Yuancheng Wei et al.
— arxiv:2605.04503 — LLM Evaluation Wed, 06 May 2026 00:00:00 GMT
Architectural Constraints Alignment in AI-assisted, Platform-based Service Development http://arxiv.org/abs/2605.04973v1 Julius Irion et al. — arxiv:2605.04973 — Code LLM Wed, 06 May 2026 00:00:00 GMT
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs http://arxiv.org/abs/2605.04903v1 Santosh Premi Adhikari et al. — arxiv:2605.04903 — Code LLM Wed, 06 May 2026 00:00:00 GMT
Implicit Representations of Grammaticality in Language Models http://arxiv.org/abs/2605.05197v1 Yingshan Susan Wang et al. — arxiv:2605.05197 — Multilingual NLP Wed, 06 May 2026 00:00:00 GMT
Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties http://arxiv.org/abs/2605.04500v1 Jinju Kim et al. — arxiv:2605.04500 — Multilingual NLP Wed, 06 May 2026 00:00:00 GMT
A Hybrid Method for Low-Resource Named Entity Recognition http://arxiv.org/abs/2605.04489v1 Do Minh Duc et al. — arxiv:2605.04489 — Multilingual NLP Wed, 06 May 2026 00:00:00 GMT
DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation http://arxiv.org/abs/2605.04458v1 Bryan Li et al. — arxiv:2605.04458 — Multilingual NLP Wed, 06 May 2026 00:00:00 GMT
A Hybrid Method for Low-Resource Named Entity Recognition http://arxiv.org/abs/2605.04489v1 Do Minh Duc et al. — arxiv:2605.04489 — Named Entity Recognition Wed, 06 May 2026 00:00:00 GMT
A Hybrid Method for Low-Resource Named Entity Recognition http://arxiv.org/abs/2605.04489v1 Do Minh Duc et al.
— arxiv:2605.04489 — Information Extraction Wed, 06 May 2026 00:00:00 GMT
The First Token Knows: Single-Decode Confidence for Hallucination Detection http://arxiv.org/abs/2605.05166v1 Mina Gabriel et al. — arxiv:2605.05166 — Text Classification Wed, 06 May 2026 00:00:00 GMT
The First Token Knows: Single-Decode Confidence for Hallucination Detection http://arxiv.org/abs/2605.05166v1 Mina Gabriel et al. — arxiv:2605.05166 — Question Answering Wed, 06 May 2026 00:00:00 GMT
VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA http://arxiv.org/abs/2605.04870v1 Haibin He et al. — arxiv:2605.04870 — Question Answering Wed, 06 May 2026 00:00:00 GMT
Tree-based Credit Assignment for Multi-Agent Memory System http://arxiv.org/abs/2605.04811v1 Marina Mao et al. — arxiv:2605.04811 — Question Answering Wed, 06 May 2026 00:00:00 GMT
Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding http://arxiv.org/abs/2605.04475v1 Shuo Liu et al. — arxiv:2605.04475 — Question Answering Wed, 06 May 2026 00:00:00 GMT
KEET: Explaining Performance of GPU Kernels Using LLM Agents http://arxiv.org/abs/2605.04467v1 Joshua H. Davis et al. — arxiv:2605.04467 — Question Answering Wed, 06 May 2026 00:00:00 GMT
DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation http://arxiv.org/abs/2605.04458v1 Bryan Li et al.
— arxiv:2605.04458 — Question Answering Wed, 06 May 2026 00:00:00 GMT
Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs http://arxiv.org/abs/2605.04446v1 Zekun Fei et al. — arxiv:2605.04446 — Question Answering Wed, 06 May 2026 00:00:00 GMT
Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting http://arxiv.org/abs/2605.04426v1 Mikhail L. Arbuzov et al. — arxiv:2605.04426 — Question Answering Wed, 06 May 2026 00:00:00 GMT
Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm http://arxiv.org/abs/2605.04887v1 Ridho Benedictus Togi Manik et al. — arxiv:2605.04887 — Sentiment Analysis Wed, 06 May 2026 00:00:00 GMT
Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment http://arxiv.org/abs/2605.04873v1 Maria Luongo et al. — arxiv:2605.04873 — Sentiment Analysis Wed, 06 May 2026 00:00:00 GMT
CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning http://arxiv.org/abs/2605.04652v1 Shuai-long Lei et al. — arxiv:2605.04652 — Knowledge Graph Wed, 06 May 2026 00:00:00 GMT
Graph-Augmented LLMs for Swiss MP Ideology Prediction http://arxiv.org/abs/2605.04643v1 Yifei Yuan et al. — arxiv:2605.04643 — Knowledge Graph Wed, 06 May 2026 00:00:00 GMT
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks http://arxiv.org/abs/2605.05476v1 Othmane Kabal et al.
— arxiv:2605.05476 — Knowledge Graph Wed, 06 May 2026 00:00:00 GMT
Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs http://arxiv.org/abs/2605.05463v1 Othmane Kabal et al. — arxiv:2605.05463 — Knowledge Graph Wed, 06 May 2026 00:00:00 GMT
Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF http://arxiv.org/abs/2605.03799v1 Mullosharaf K. Arabov et al. — arxiv:2605.03799 — NLP Tue, 05 May 2026 00:00:00 GMT
SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification http://arxiv.org/abs/2605.03701v1 Zhifeng Hao et al. — arxiv:2605.03701 — NLP Tue, 05 May 2026 00:00:00 GMT
Annotation Quality in Aspect-Based Sentiment Analysis: A Case Study Comparing Experts, Students, Crowdworkers, and Large Language Model http://arxiv.org/abs/2605.03624v1 Niklas Donhauser et al. — arxiv:2605.03624 — NLP Tue, 05 May 2026 00:00:00 GMT
Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News http://arxiv.org/abs/2605.03450v1 Brielen Madureira et al. — arxiv:2605.03450 — NLP Tue, 05 May 2026 00:00:00 GMT
Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs http://arxiv.org/abs/2605.04298v1 Stefano Bannò et al. — arxiv:2605.04298 — NLP Tue, 05 May 2026 00:00:00 GMT
Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages http://arxiv.org/abs/2605.04208v1 Stephen E. Moore et al. — arxiv:2605.04208 — NLP Tue, 05 May 2026 00:00:00 GMT
Large Language Models are Universal Reasoners for Visual Generation http://arxiv.org/abs/2605.04040v1 Sucheng Ren et al.
— arxiv:2605.04040 — LLM Tue, 05 May 2026 00:00:00 GMT
Safety and accuracy follow different scaling laws in clinical large language models http://arxiv.org/abs/2605.04039v1 Sebastian Wind et al. — arxiv:2605.04039 — LLM Tue, 05 May 2026 00:00:00 GMT
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories http://arxiv.org/abs/2605.04036v1 Yuwen Du et al. — arxiv:2605.04036 — LLM Tue, 05 May 2026 00:00:00 GMT
Stayin' Aligned Over Time: Towards Longitudinal Human-LLM Alignment via Contextual Reflection and Privacy-Preserving Behavioral Data http://arxiv.org/abs/2605.04029v1 Simret Araya Gebreegziabher et al. — arxiv:2605.04029 — LLM Tue, 05 May 2026 00:00:00 GMT
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment http://arxiv.org/abs/2605.04012v1 Joseph Breda et al. — arxiv:2605.04012 — LLM Tue, 05 May 2026 00:00:00 GMT
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing http://arxiv.org/abs/2605.04003v1 Danny Hoang et al. — arxiv:2605.04003 — LLM Tue, 05 May 2026 00:00:00 GMT
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning http://arxiv.org/abs/2605.04000v1 P Akilesh et al. — arxiv:2605.04000 — LLM Tue, 05 May 2026 00:00:00 GMT
EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage http://arxiv.org/abs/2605.03998v1 Richard J. Young et al. — arxiv:2605.03998 — LLM Tue, 05 May 2026 00:00:00 GMT
From Intent to Execution: Composing Agentic Workflows with Agent Recommendation http://arxiv.org/abs/2605.03986v1 Kishan Athrey et al.
— arxiv:2605.03986 — LLM Tue, 05 May 2026 00:00:00 GMT
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments http://arxiv.org/abs/2605.03971v1 Hao Mi et al. — arxiv:2605.03971 — LLM Tue, 05 May 2026 00:00:00 GMT
Audio-Visual Intelligence in Large Foundation Models http://arxiv.org/abs/2605.04045v1 You Qin et al. — arxiv:2605.04045 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Safety and accuracy follow different scaling laws in clinical large language models http://arxiv.org/abs/2605.04039v1 Sebastian Wind et al. — arxiv:2605.04039 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Model order reduction for parametrized variational inequalities: application to crowd motion http://arxiv.org/abs/2605.04037v1 Giulia Sambataro et al. — arxiv:2605.04037 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories http://arxiv.org/abs/2605.04036v1 Yuwen Du et al. — arxiv:2605.04036 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours http://arxiv.org/abs/2605.04019v1 Raja Sekhar Rao Dheekonda et al. — arxiv:2605.04019 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems http://arxiv.org/abs/2605.04018v1 Yilun Zhao et al. — arxiv:2605.04018 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment http://arxiv.org/abs/2605.04012v1 Joseph Breda et al.
— arxiv:2605.04012 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing http://arxiv.org/abs/2605.04003v1 Danny Hoang et al. — arxiv:2605.04003 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning http://arxiv.org/abs/2605.04000v1 P Akilesh et al. — arxiv:2605.04000 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration http://arxiv.org/abs/2605.03989v1 Dutao Zhang et al. — arxiv:2605.03989 — LLM Agent Tue, 05 May 2026 00:00:00 GMT
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours http://arxiv.org/abs/2605.04019v1 Raja Sekhar Rao Dheekonda et al. — arxiv:2605.04019 — Multi-Agent Tue, 05 May 2026 00:00:00 GMT
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems http://arxiv.org/abs/2605.04018v1 Yilun Zhao et al. — arxiv:2605.04018 — Multi-Agent Tue, 05 May 2026 00:00:00 GMT
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing http://arxiv.org/abs/2605.04003v1 Danny Hoang et al. — arxiv:2605.04003 — Multi-Agent Tue, 05 May 2026 00:00:00 GMT
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration http://arxiv.org/abs/2605.03989v1 Dutao Zhang et al.
— arxiv:2605.03989 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentFrom Intent to Execution: Composing Agentic Workflows with Agent Recommendationhttp://arxiv.org/abs/2605.03986v1http://arxiv.org/abs/2605.03986v1Kishan Athrey et al. — arxiv:2605.03986 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentContextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systemshttp://arxiv.org/abs/2605.03900v1http://arxiv.org/abs/2605.03900v1Jie Zhou et al. — arxiv:2605.03900 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentQKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMshttp://arxiv.org/abs/2605.03884v1http://arxiv.org/abs/2605.03884v1Pratik Honavar et al. — arxiv:2605.03884 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentMechanical Conscience: A Mathematical Framework for Dependability of Machine Intelligencehttp://arxiv.org/abs/2605.03847v1http://arxiv.org/abs/2605.03847v1Munkhdegerekh Batzorig et al. — arxiv:2605.03847 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentSOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systemshttp://arxiv.org/abs/2605.03842v1http://arxiv.org/abs/2605.03842v1Yibang Tang et al. — arxiv:2605.03842 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentTRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domainshttp://arxiv.org/abs/2605.03838v1http://arxiv.org/abs/2605.03838v1Serhii Zabolotnii et al. — arxiv:2605.03838 — Multi-AgentTue, 05 May 2026 00:00:00 GMTMulti-AgentSafety and accuracy follow different scaling laws in clinical large language modelshttp://arxiv.org/abs/2605.04039v1http://arxiv.org/abs/2605.04039v1Sebastian Wind et al. 
— arxiv:2605.04039 — RAGTue, 05 May 2026 00:00:00 GMTRAGAn Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestrationhttp://arxiv.org/abs/2605.03989v1http://arxiv.org/abs/2605.03989v1Dutao Zhang et al. — arxiv:2605.03989 — RAGTue, 05 May 2026 00:00:00 GMTRAGBeyond Rules: LLM-Powered Linting for Quantum Programshttp://arxiv.org/abs/2605.03943v1http://arxiv.org/abs/2605.03943v1Pietro Cassieri et al. — arxiv:2605.03943 — RAGTue, 05 May 2026 00:00:00 GMTRAGNatural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHFhttp://arxiv.org/abs/2605.03799v1http://arxiv.org/abs/2605.03799v1Mullosharaf K. Arabov et al. — arxiv:2605.03799 — RAGTue, 05 May 2026 00:00:00 GMTRAGEnhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.03790v1http://arxiv.org/abs/2605.03790v1Quanxing Xu et al. — arxiv:2605.03790 — RAGTue, 05 May 2026 00:00:00 GMTRAGDeep Graph-Language Fusion for Structure-Aware Code Generationhttp://arxiv.org/abs/2605.03689v1http://arxiv.org/abs/2605.03689v1Mert Tiftikci et al. — arxiv:2605.03689 — RAGTue, 05 May 2026 00:00:00 GMTRAGMEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agentshttp://arxiv.org/abs/2605.03675v1http://arxiv.org/abs/2605.03675v1Bronislav Sidik et al. — arxiv:2605.03675 — RAGTue, 05 May 2026 00:00:00 GMTRAGSURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.03534v1http://arxiv.org/abs/2605.03534v1Jingxi Qiu et al. — arxiv:2605.03534 — RAGTue, 05 May 2026 00:00:00 GMTRAGFrom prompting to evidence-based translation: A RAG+prompt system for Japanese-Chinese translation and its pedagogical potentialhttp://arxiv.org/abs/2605.03387v1http://arxiv.org/abs/2605.03387v1Wenshi Gu et al. 
— arxiv:2605.03387 — RAGTue, 05 May 2026 00:00:00 GMTRAGRAG over Thinking Traces Can Improve Reasoning Taskshttp://arxiv.org/abs/2605.03344v1http://arxiv.org/abs/2605.03344v1Negar Arabzadeh et al. — arxiv:2605.03344 — RAGTue, 05 May 2026 00:00:00 GMTRAGLarge Language Models are Universal Reasoners for Visual Generationhttp://arxiv.org/abs/2605.04040v1http://arxiv.org/abs/2605.04040v1Sucheng Ren et al. — arxiv:2605.04040 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningEQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triagehttp://arxiv.org/abs/2605.03998v1http://arxiv.org/abs/2605.03998v1Richard J. Young et al. — arxiv:2605.03998 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningBeyond Rules: LLM-Powered Linting for Quantum Programshttp://arxiv.org/abs/2605.03943v1http://arxiv.org/abs/2605.03943v1Pietro Cassieri et al. — arxiv:2605.03943 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningEnhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.03790v1http://arxiv.org/abs/2605.03790v1Quanxing Xu et al. — arxiv:2605.03790 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningSay the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Droneshttp://arxiv.org/abs/2605.03788v1http://arxiv.org/abs/2605.03788v1Andrea Iannoli et al. — arxiv:2605.03788 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningRose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQLhttp://arxiv.org/abs/2605.03720v1http://arxiv.org/abs/2605.03720v1Le Zhou et al. — arxiv:2605.03720 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningAgenticPosesRanker: An Agentic AI Framework for Physically Grounded Ranking of Protein-Ligand Docking Poseshttp://arxiv.org/abs/2605.03707v1http://arxiv.org/abs/2605.03707v1Sofiene Khiari et al. 
— arxiv:2605.03707 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningBIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QAhttp://arxiv.org/abs/2605.03618v1http://arxiv.org/abs/2605.03618v1Richard A. A. Jonker et al. — arxiv:2605.03618 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningFinSTaR: Towards Financial Reasoning with Time Series Reasoning Modelshttp://arxiv.org/abs/2605.03460v1http://arxiv.org/abs/2605.03460v1Seunghan Lee et al. — arxiv:2605.03460 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningDGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignmenthttp://arxiv.org/abs/2605.03327v1http://arxiv.org/abs/2605.03327v1Hongbo Jin et al. — arxiv:2605.03327 — ReasoningTue, 05 May 2026 00:00:00 GMTReasoningContextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systemshttp://arxiv.org/abs/2605.03900v1http://arxiv.org/abs/2605.03900v1Jie Zhou et al. — arxiv:2605.03900 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseGeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classificationhttp://arxiv.org/abs/2605.03383v1http://arxiv.org/abs/2605.03383v1Jiahao Wang et al. — arxiv:2605.03383 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseARGUS: Defending LLM Agents Against Context-Aware Prompt Injectionhttp://arxiv.org/abs/2605.03378v1http://arxiv.org/abs/2605.03378v1Shihao Weng et al. — arxiv:2605.03378 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseRevisiting the Travel Planning Capabilities of Large Language Modelshttp://arxiv.org/abs/2605.03308v1http://arxiv.org/abs/2605.03308v1Bo-Wen Zhang et al. — arxiv:2605.03308 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseEnhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarioshttp://arxiv.org/abs/2605.03242v1http://arxiv.org/abs/2605.03242v1Zuoyu Zhang et al. 
— arxiv:2605.03242 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseOverview of the New Hubble Spectroscopic Legacy Archivehttp://arxiv.org/abs/2605.04167v1http://arxiv.org/abs/2605.04167v1Ravi Sankrit et al. — arxiv:2605.04167 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseFrontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluationhttp://arxiv.org/abs/2605.04135v1http://arxiv.org/abs/2605.04135v1David Gringras et al. — arxiv:2605.04135 — Tool UseTue, 05 May 2026 00:00:00 GMTTool UseStateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoninghttp://arxiv.org/abs/2605.03927v1http://arxiv.org/abs/2605.03927v1Xiaowen Sun et al. — arxiv:2605.03927 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMQuantifying the human visual exposome with vision language modelshttp://arxiv.org/abs/2605.03863v1http://arxiv.org/abs/2605.03863v1Christian Rominger et al. — arxiv:2605.03863 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMRoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Modelshttp://arxiv.org/abs/2605.03821v1http://arxiv.org/abs/2605.03821v1Hao Wu et al. — arxiv:2605.03821 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgettinghttp://arxiv.org/abs/2605.03804v1http://arxiv.org/abs/2605.03804v1Jiale Chang et al. — arxiv:2605.03804 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMEnhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.03790v1http://arxiv.org/abs/2605.03790v1Quanxing Xu et al. — arxiv:2605.03790 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMWhat You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosityhttp://arxiv.org/abs/2605.03782v1http://arxiv.org/abs/2605.03782v1Haoxi Li et al. 
— arxiv:2605.03782 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMBefore Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarkshttp://arxiv.org/abs/2605.03759v1http://arxiv.org/abs/2605.03759v1JuneHyoung Kwon et al. — arxiv:2605.03759 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMUni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipehttp://arxiv.org/abs/2605.03677v1http://arxiv.org/abs/2605.03677v1Wenjin Hou et al. — arxiv:2605.03677 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMThe Detector Teaches Itself: Lightweight Self-Supervised Adaptation for Open-Vocabulary Object Detectionhttp://arxiv.org/abs/2605.03642v1http://arxiv.org/abs/2605.03642v1Yazhe Wan et al. — arxiv:2605.03642 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMErase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Modelshttp://arxiv.org/abs/2605.03547v1http://arxiv.org/abs/2605.03547v1JuneHyoung Kwon et al. — arxiv:2605.03547 — Multimodal LLMTue, 05 May 2026 00:00:00 GMTMultimodal LLMContextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systemshttp://arxiv.org/abs/2605.03900v1http://arxiv.org/abs/2605.03900v1Jie Zhou et al. — arxiv:2605.03900 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextRoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Modelshttp://arxiv.org/abs/2605.03821v1http://arxiv.org/abs/2605.03821v1Hao Wu et al. — arxiv:2605.03821 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextSay the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Droneshttp://arxiv.org/abs/2605.03788v1http://arxiv.org/abs/2605.03788v1Andrea Iannoli et al. 
— arxiv:2605.03788 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextRose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQLhttp://arxiv.org/abs/2605.03720v1http://arxiv.org/abs/2605.03720v1Le Zhou et al. — arxiv:2605.03720 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextMEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agentshttp://arxiv.org/abs/2605.03675v1http://arxiv.org/abs/2605.03675v1Bronislav Sidik et al. — arxiv:2605.03675 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextAdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reusehttp://arxiv.org/abs/2605.03644v1http://arxiv.org/abs/2605.03644v1Jie Ou et al. — arxiv:2605.03644 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextTutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Servinghttp://arxiv.org/abs/2605.03375v1http://arxiv.org/abs/2605.03375v1Shi Qiu et al. — arxiv:2605.03375 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextMemFlow: Intent-Driven Memory Orchestration for Small Language Model Agentshttp://arxiv.org/abs/2605.03312v1http://arxiv.org/abs/2605.03312v1Jiayi Chen et al. — arxiv:2605.03312 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextRevisiting the Travel Planning Capabilities of Large Language Modelshttp://arxiv.org/abs/2605.03308v1http://arxiv.org/abs/2605.03308v1Bo-Wen Zhang et al. — arxiv:2605.03308 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextExploring Sustainability in Scientific Software through Code Quality & Test Coverage Metricshttp://arxiv.org/abs/2605.03243v1http://arxiv.org/abs/2605.03243v1Sheikh Md. Mushfiqur Rahman et al. — arxiv:2605.03243 — Long ContextTue, 05 May 2026 00:00:00 GMTLong ContextRethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systemshttp://arxiv.org/abs/2605.04018v1http://arxiv.org/abs/2605.04018v1Yilun Zhao et al. 
— arxiv:2605.04018 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyNonlinear Compton scattering in a frequency-modulated fieldhttp://arxiv.org/abs/2605.04011v1http://arxiv.org/abs/2605.04011v1Antonino Di Piazza et al. — arxiv:2605.04011 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyRD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence Extending the Recurrent-Depth Transformer Architecture to Dense Predictionhttp://arxiv.org/abs/2605.03999v1http://arxiv.org/abs/2605.03999v1Renjie He et al. — arxiv:2605.03999 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyQKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMshttp://arxiv.org/abs/2605.03884v1http://arxiv.org/abs/2605.03884v1Pratik Honavar et al. — arxiv:2605.03884 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyPath integral quantization of the electromagnetic field in nonlinear dielectric materialshttp://arxiv.org/abs/2605.03836v1http://arxiv.org/abs/2605.03836v1Arman Kashef et al. — arxiv:2605.03836 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyA density-matrix derivation of the Hartree--Fock equations in a nonorthogonal atomic-orbital basishttp://arxiv.org/abs/2605.03761v1http://arxiv.org/abs/2605.03761v1Thomas Kjærgaard et al. — arxiv:2605.03761 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyBenchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpushttp://arxiv.org/abs/2605.03742v1http://arxiv.org/abs/2605.03742v1Mullosharaf K. Arabov et al. — arxiv:2605.03742 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyRethinking the Rank Threshold for LoRA Fine-Tuninghttp://arxiv.org/abs/2605.03724v1http://arxiv.org/abs/2605.03724v1Juneyoung Park et al. 
— arxiv:2605.03724 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyFrom Code to Prediction: Fine-Tuning LLMs for Neural Network Performance Classification in NNGPThttp://arxiv.org/abs/2605.03686v1http://arxiv.org/abs/2605.03686v1Mahmoud Hanouneh et al. — arxiv:2605.03686 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyPriorNet: Prior-Guided Engagement Estimation from Face Videohttp://arxiv.org/abs/2605.03615v1http://arxiv.org/abs/2605.03615v1Alexander Vedernikov et al. — arxiv:2605.03615 — LLM EfficiencyTue, 05 May 2026 00:00:00 GMTLLM EfficiencyStayin' Aligned Over Time: Towards Longitudinal Human-LLM Alignment via Contextual Reflection and Privacy-Preserving Behavioral Datahttp://arxiv.org/abs/2605.04029v1http://arxiv.org/abs/2605.04029v1Simret Araya Gebreegziabher et al. — arxiv:2605.04029 — AlignmentTue, 05 May 2026 00:00:00 GMTAlignmentNatural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHFhttp://arxiv.org/abs/2605.03799v1http://arxiv.org/abs/2605.03799v1Mullosharaf K. Arabov et al. — arxiv:2605.03799 — AlignmentTue, 05 May 2026 00:00:00 GMTAlignmentQUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimizationhttp://arxiv.org/abs/2605.04267v1http://arxiv.org/abs/2605.04267v1Florian A. D. Burnat et al. — arxiv:2605.04267 — AlignmentTue, 05 May 2026 00:00:00 GMTAlignmentExplaining and Preventing Alignment Collapse in Iterative RLHFhttp://arxiv.org/abs/2605.04266v1http://arxiv.org/abs/2605.04266v1Etienne Gauthier et al. — arxiv:2605.04266 — AlignmentTue, 05 May 2026 00:00:00 GMTAlignmentSelf-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extractionhttp://arxiv.org/abs/2605.04221v1http://arxiv.org/abs/2605.04221v1Yao-Shun Chuang et al. 
— arxiv:2605.04221 — AlignmentTue, 05 May 2026 00:00:00 GMTAlignmentLarge Language Models are Universal Reasoners for Visual Generationhttp://arxiv.org/abs/2605.04040v1http://arxiv.org/abs/2605.04040v1Sucheng Ren et al. — arxiv:2605.04040 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationLogical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgmentshttp://arxiv.org/abs/2605.03971v1http://arxiv.org/abs/2605.03971v1Hao Mi et al. — arxiv:2605.03971 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationAn extensive theory of nonlinearly intercoupled pseudomodes for noise model reduction in circuit QEDhttp://arxiv.org/abs/2605.03946v1http://arxiv.org/abs/2605.03946v1M. Gabriela Boada G. et al. — arxiv:2605.03946 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationSteer Like the LLM: Activation Steering that Mimics Promptinghttp://arxiv.org/abs/2605.03907v1http://arxiv.org/abs/2605.03907v1Geert Heyman et al. — arxiv:2605.03907 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationDeco: Extending Personal Physical Objects into Pervasive AI Companion through a Dual-Embodiment Frameworkhttp://arxiv.org/abs/2605.03882v1http://arxiv.org/abs/2605.03882v1Zhihan Jiang et al. — arxiv:2605.03882 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationCorrect Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewardshttp://arxiv.org/abs/2605.03862v1http://arxiv.org/abs/2605.03862v1Tianyang Han et al. — arxiv:2605.03862 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationTriBench-Ko: Evaluating LLM Risks in Judicial Workflowshttp://arxiv.org/abs/2605.03792v1http://arxiv.org/abs/2605.03792v1Haesung Lee et al. 
— arxiv:2605.03792 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationGeoTopoDiff: Learning Geometry--Topology Graph Priors through Boundary-Constrained Mixed Diffusion for Sparse-Slice 3D Porous Reconstructionhttp://arxiv.org/abs/2605.03764v1http://arxiv.org/abs/2605.03764v1Yue Shi et al. — arxiv:2605.03764 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationFluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolutionhttp://arxiv.org/abs/2605.03749v1http://arxiv.org/abs/2605.03749v1Shuhong Liu et al. — arxiv:2605.03749 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationSERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identificationhttp://arxiv.org/abs/2605.03701v1http://arxiv.org/abs/2605.03701v1Zhifeng Hao et al. — arxiv:2605.03701 — HallucinationTue, 05 May 2026 00:00:00 GMTHallucinationSafety and accuracy follow different scaling laws in clinical large language modelshttp://arxiv.org/abs/2605.04039v1http://arxiv.org/abs/2605.04039v1Sebastian Wind et al. — arxiv:2605.04039 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyRedefining AI Red Teaming in the Agentic Era: From Weeks to Hourshttp://arxiv.org/abs/2605.04019v1http://arxiv.org/abs/2605.04019v1Raja Sekhar Rao Dheekonda et al. — arxiv:2605.04019 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyReal-Time Evaluation of Autonomous Systems under Adversarial Attackshttp://arxiv.org/abs/2605.03491v1http://arxiv.org/abs/2605.03491v1Adithya Mohan et al. — arxiv:2605.03491 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyExposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysishttp://arxiv.org/abs/2605.03441v1http://arxiv.org/abs/2605.03441v1Haoyu Zhang et al. — arxiv:2605.03441 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyTsallisPGD: Adaptive Gradient Weighting for Adversarial Attacks on Semantic Segmentationhttp://arxiv.org/abs/2605.03405v1http://arxiv.org/abs/2605.03405v1Alexander Matyasko et al. 
— arxiv:2605.03405 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyEnhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarioshttp://arxiv.org/abs/2605.03242v1http://arxiv.org/abs/2605.03242v1Zuoyu Zhang et al. — arxiv:2605.03242 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyLaundering AI Authority with Adversarial Exampleshttp://arxiv.org/abs/2605.04261v1http://arxiv.org/abs/2605.04261v1Jie Zhang et al. — arxiv:2605.04261 — LLM SafetyTue, 05 May 2026 00:00:00 GMTLLM SafetyMCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Followinghttp://arxiv.org/abs/2605.03858v1http://arxiv.org/abs/2605.03858v1Jaeyun Lee et al. — arxiv:2605.03858 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationEvaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behaviorhttp://arxiv.org/abs/2605.03855v1http://arxiv.org/abs/2605.03855v1Shinas Shaji et al. — arxiv:2605.03855 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationBIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QAhttp://arxiv.org/abs/2605.03618v1http://arxiv.org/abs/2605.03618v1Richard A. A. Jonker et al. — arxiv:2605.03618 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationDetecting Stealth Sycophancy in Mental-Health Dialogue with Dynamic Emotional Signature Graphshttp://arxiv.org/abs/2605.03472v1http://arxiv.org/abs/2605.03472v1Tianze Han et al. — arxiv:2605.03472 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationLLM-ADAM: A Generalizable LLM Agent Framework for Pre-Print Anomaly Detection in Additive Manufacturinghttp://arxiv.org/abs/2605.03328v1http://arxiv.org/abs/2605.03328v1Ahmadreza Eslaminia et al. 
— arxiv:2605.03328 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationEnhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarioshttp://arxiv.org/abs/2605.03242v1http://arxiv.org/abs/2605.03242v1Zuoyu Zhang et al. — arxiv:2605.03242 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationFlowEval: Reference-based Evaluation of Generated User Interfaceshttp://arxiv.org/abs/2605.04165v1http://arxiv.org/abs/2605.04165v1Jason Wu et al. — arxiv:2605.04165 — LLM EvaluationTue, 05 May 2026 00:00:00 GMTLLM EvaluationContextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systemshttp://arxiv.org/abs/2605.03900v1http://arxiv.org/abs/2605.03900v1Jie Zhou et al. — arxiv:2605.03900 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMSay the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Droneshttp://arxiv.org/abs/2605.03788v1http://arxiv.org/abs/2605.03788v1Andrea Iannoli et al. — arxiv:2605.03788 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMRose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQLhttp://arxiv.org/abs/2605.03720v1http://arxiv.org/abs/2605.03720v1Le Zhou et al. — arxiv:2605.03720 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMDeep Graph-Language Fusion for Structure-Aware Code Generationhttp://arxiv.org/abs/2605.03689v1http://arxiv.org/abs/2605.03689v1Mert Tiftikci et al. — arxiv:2605.03689 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMFrom Code to Prediction: Fine-Tuning LLMs for Neural Network Performance Classification in NNGPThttp://arxiv.org/abs/2605.03686v2http://arxiv.org/abs/2605.03686v2Mahmoud Hanouneh et al. — arxiv:2605.03686 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMRAG over Thinking Traces Can Improve Reasoning Taskshttp://arxiv.org/abs/2605.03344v1http://arxiv.org/abs/2605.03344v1Negar Arabzadeh et al. 
— arxiv:2605.03344 — Code LLMTue, 05 May 2026 00:00:00 GMTCode LLMMedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMshttp://arxiv.org/abs/2605.04180v1http://arxiv.org/abs/2605.04180v1Tung Sum Thomas Kwok et al. — arxiv:2605.04180 — Medical NLPTue, 05 May 2026 00:00:00 GMTMedical NLPNatural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHFhttp://arxiv.org/abs/2605.03799v1http://arxiv.org/abs/2605.03799v1Mullosharaf K. Arabov et al. — arxiv:2605.03799 — Multilingual NLPTue, 05 May 2026 00:00:00 GMTMultilingual NLPBenchmarking Parameter-Efficient Fine-Tuning of Large Language Models for Low-Resource Tajik Text Generation with the Tajik Web Corpushttp://arxiv.org/abs/2605.03742v1http://arxiv.org/abs/2605.03742v1Mullosharaf K. Arabov et al. — arxiv:2605.03742 — Multilingual NLPTue, 05 May 2026 00:00:00 GMTMultilingual NLPLLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Modelshttp://arxiv.org/abs/2605.03299v1http://arxiv.org/abs/2605.03299v1Minh Chu Xuan et al. — arxiv:2605.03299 — Multilingual NLPTue, 05 May 2026 00:00:00 GMTMultilingual NLPSAM-NER: Semantic Archetype Mediation for Zero-Shot Named Entity Recognitionhttp://arxiv.org/abs/2605.03706v1http://arxiv.org/abs/2605.03706v1Ruichu Cai et al. — arxiv:2605.03706 — Named Entity RecognitionTue, 05 May 2026 00:00:00 GMTNamed Entity RecognitionGeolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in Germanhttp://arxiv.org/abs/2605.03414v1http://arxiv.org/abs/2605.03414v1Brielen Madureira et al. — arxiv:2605.03414 — Named Entity RecognitionTue, 05 May 2026 00:00:00 GMTNamed Entity RecognitionSelf-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extractionhttp://arxiv.org/abs/2605.04221v1http://arxiv.org/abs/2605.04221v1Yao-Shun Chuang et al. 
— arxiv:2605.04221 — Named Entity RecognitionTue, 05 May 2026 00:00:00 GMTNamed Entity RecognitionCC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processinghttp://arxiv.org/abs/2605.03903v1http://arxiv.org/abs/2605.03903v1Zhipeng Xu et al. — arxiv:2605.03903 — Information ExtractionTue, 05 May 2026 00:00:00 GMTInformation ExtractionGeolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in Germanhttp://arxiv.org/abs/2605.03414v1http://arxiv.org/abs/2605.03414v1Brielen Madureira et al. — arxiv:2605.03414 — Information ExtractionTue, 05 May 2026 00:00:00 GMTInformation ExtractionMaterial Database Agent: A Multimodal Agentic Framework for Scientific Literature Mininghttp://arxiv.org/abs/2605.04278v1http://arxiv.org/abs/2605.04278v1Achuth Chandrasekhar et al. — arxiv:2605.04278 — Information ExtractionTue, 05 May 2026 00:00:00 GMTInformation ExtractionSelf-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extractionhttp://arxiv.org/abs/2605.04221v1http://arxiv.org/abs/2605.04221v1Yao-Shun Chuang et al. — arxiv:2605.04221 — Information ExtractionTue, 05 May 2026 00:00:00 GMTInformation ExtractionAn Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestrationhttp://arxiv.org/abs/2605.03989v1http://arxiv.org/abs/2605.03989v1Dutao Zhang et al. — arxiv:2605.03989 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringMagic-Informed Quantum Architecture Searchhttp://arxiv.org/abs/2605.03932v1http://arxiv.org/abs/2605.03932v1Vincenzo Lipardi et al. — arxiv:2605.03932 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringCC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processinghttp://arxiv.org/abs/2605.03903v1http://arxiv.org/abs/2605.03903v1Zhipeng Xu et al. 
— arxiv:2605.03903 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringEnhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.03790v1http://arxiv.org/abs/2605.03790v1Quanxing Xu et al. — arxiv:2605.03790 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringBefore Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarkshttp://arxiv.org/abs/2605.03759v1http://arxiv.org/abs/2605.03759v1JuneHyoung Kwon et al. — arxiv:2605.03759 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringGeographic Variation in Stack Overflow Code Quality: Evidence from a Cross-Regional Study of Coding Practiceshttp://arxiv.org/abs/2605.03670v1http://arxiv.org/abs/2605.03670v1Elijah Zolduoarrati et al. — arxiv:2605.03670 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringBIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QAhttp://arxiv.org/abs/2605.03618v1http://arxiv.org/abs/2605.03618v1Richard A. A. Jonker et al. — arxiv:2605.03618 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringDALPHIN: Benchmarking Digital Pathology AI Copilots Against Pathologists on an Open Multicentric Datasethttp://arxiv.org/abs/2605.03544v1http://arxiv.org/abs/2605.03544v1Carlijn Lems et al. — arxiv:2605.03544 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringWorldJen: An End-to-End Multi-Dimensional Benchmark for Generative Video Modelshttp://arxiv.org/abs/2605.03475v1http://arxiv.org/abs/2605.03475v1Karthik Inbasekar et al. — arxiv:2605.03475 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringVEBench: Benchmarking Large Multimodal Models for Real-World Video Editinghttp://arxiv.org/abs/2605.03276v1http://arxiv.org/abs/2605.03276v1Andong Deng et al. 
— arxiv:2605.03276 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringHierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoninghttp://arxiv.org/abs/2605.04304v1http://arxiv.org/abs/2605.04304v1Qihua Dong et al. — arxiv:2605.04304 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringTemporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QAhttp://arxiv.org/abs/2605.04243v1http://arxiv.org/abs/2605.04243v1Tran Quang Liem et al. — arxiv:2605.04243 — Question AnsweringTue, 05 May 2026 00:00:00 GMTQuestion AnsweringAnnotation Quality in Aspect-Based Sentiment Analysis: A Case Study Comparing Experts, Students, Crowdworkers, and Large Language Modelhttp://arxiv.org/abs/2605.03624v1http://arxiv.org/abs/2605.03624v1Niklas Donhauser et al. — arxiv:2605.03624 — Sentiment AnalysisTue, 05 May 2026 00:00:00 GMTSentiment AnalysisSentiment Analysis of Indonesian Spotify Reviews Using Machine Learning and BiLSTMhttp://arxiv.org/abs/2605.03443v1http://arxiv.org/abs/2605.03443v1Uliano Wilyam Purba et al. — arxiv:2605.03443 — Sentiment AnalysisTue, 05 May 2026 00:00:00 GMTSentiment AnalysisA Comparison of Traditional Machine Learning Algorithms and LSTM-Based Deep Learning Models for Email Sentiment Analysishttp://arxiv.org/abs/2605.03440v1http://arxiv.org/abs/2605.03440v1Virdio Samuel Saragih et al. — arxiv:2605.03440 — Sentiment AnalysisTue, 05 May 2026 00:00:00 GMTSentiment AnalysisBenchmarking Logistic Regression, SVM, Naive Bayes, and IndoBERT Fine-Tuning for Sentiment Analysis on Indonesian Product Reviewshttp://arxiv.org/abs/2605.03439v1http://arxiv.org/abs/2605.03439v1Nabila Zakiyah Zahra et al. — arxiv:2605.03439 — Sentiment AnalysisTue, 05 May 2026 00:00:00 GMTSentiment AnalysisScience discussions of retracted articles on Bluesky: public scrutiny or misinformation spreading?http://arxiv.org/abs/2605.04334v1http://arxiv.org/abs/2605.04334v1Er-Te Zheng et al. 
— arxiv:2605.04334 — Sentiment AnalysisTue, 05 May 2026 00:00:00 GMTSentiment AnalysisOpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectorieshttp://arxiv.org/abs/2605.04036v1http://arxiv.org/abs/2605.04036v1Yuwen Du et al. — arxiv:2605.04036 — Knowledge GraphTue, 05 May 2026 00:00:00 GMTKnowledge GraphPhysics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturinghttp://arxiv.org/abs/2605.04003v1http://arxiv.org/abs/2605.04003v1Danny Hoang et al. — arxiv:2605.04003 — Knowledge GraphTue, 05 May 2026 00:00:00 GMTKnowledge GraphConRAD: Conformal Risk-Aware Neural Databaseshttp://arxiv.org/abs/2605.03806v1http://arxiv.org/abs/2605.03806v1Sonia Horchidan et al. — arxiv:2605.03806 — Knowledge GraphTue, 05 May 2026 00:00:00 GMTKnowledge GraphGraph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Predictionhttp://arxiv.org/abs/2605.03690v1http://arxiv.org/abs/2605.03690v1Filip Kronström et al. — arxiv:2605.03690 — Knowledge GraphTue, 05 May 2026 00:00:00 GMTKnowledge GraphCuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verificationhttp://arxiv.org/abs/2605.03476v1http://arxiv.org/abs/2605.03476v1Severin Ye et al. — arxiv:2605.03476 — Knowledge GraphTue, 05 May 2026 00:00:00 GMTKnowledge GraphAcademiClaw: When Students Set Challenges for AI Agentshttp://arxiv.org/abs/2605.02661v1http://arxiv.org/abs/2605.02661v1Junjie Yu et al. — arxiv:2605.02661 — NLPMon, 04 May 2026 00:00:00 GMTNLPBeating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Racehttp://arxiv.org/abs/2605.02620v1http://arxiv.org/abs/2605.02620v1Andreas Maier et al. — arxiv:2605.02620 — NLPMon, 04 May 2026 00:00:00 GMTNLPSemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultureshttp://arxiv.org/abs/2605.02601v1http://arxiv.org/abs/2605.02601v1Nedjma Ousidhoum et al. 
— arxiv:2605.02601 — NLPMon, 04 May 2026 00:00:00 GMTNLPRevisiting Semantic Role Labeling: Efficient Structured Inference with Dependency-Informed Analysishttp://arxiv.org/abs/2605.02505v1http://arxiv.org/abs/2605.02505v1Sangpil Youm et al. — arxiv:2605.02505 — NLPMon, 04 May 2026 00:00:00 GMTNLPShadow-Loom: Causal Reasoning over Graphical World Model of Narrativeshttp://arxiv.org/abs/2605.02475v1http://arxiv.org/abs/2605.02475v1David Wilmot et al. — arxiv:2605.02475 — NLPMon, 04 May 2026 00:00:00 GMTNLPHalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMshttp://arxiv.org/abs/2605.02443v1http://arxiv.org/abs/2605.02443v1Ahmed Cherif et al. — arxiv:2605.02443 — NLPMon, 04 May 2026 00:00:00 GMTNLPControllable and Verifiable Process Data Synthesis for Process Reward Modelshttp://arxiv.org/abs/2605.02395v1http://arxiv.org/abs/2605.02395v1Yinghui Chi et al. — arxiv:2605.02395 — NLPMon, 04 May 2026 00:00:00 GMTNLPSemantically Enriching Investor Micro-blogs for Opinion-Aware Emotion Analysis: A Practical Approachhttp://arxiv.org/abs/2605.03092v1http://arxiv.org/abs/2605.03092v1Gaurav Negi et al. — arxiv:2605.03092 — NLPMon, 04 May 2026 00:00:00 GMTNLPEvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphshttp://arxiv.org/abs/2605.02868v1http://arxiv.org/abs/2605.02868v1Ruichao Liang et al. — arxiv:2605.02868 — LLMMon, 04 May 2026 00:00:00 GMTLLMSemantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approachhttp://arxiv.org/abs/2605.02862v1http://arxiv.org/abs/2605.02862v1Hamza Ahmed Durrani et al. — arxiv:2605.02862 — LLMMon, 04 May 2026 00:00:00 GMTLLMStanding on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross--Language Code Clone Detectionhttp://arxiv.org/abs/2605.02860v1http://arxiv.org/abs/2605.02860v1Mohamad Khajezade et al. 
— arxiv:2605.02860 — LLMMon, 04 May 2026 00:00:00 GMTLLMWhen Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIshttp://arxiv.org/abs/2605.02821v1http://arxiv.org/abs/2605.02821v1Haorui Li et al. — arxiv:2605.02821 — LLMMon, 04 May 2026 00:00:00 GMTLLMSCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answeringhttp://arxiv.org/abs/2605.02819v1http://arxiv.org/abs/2605.02819v1Jiujiu Chen et al. — arxiv:2605.02819 — LLMMon, 04 May 2026 00:00:00 GMTLLMAutonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defensehttp://arxiv.org/abs/2605.02812v1http://arxiv.org/abs/2605.02812v1Mingming Zha et al. — arxiv:2605.02812 — LLMMon, 04 May 2026 00:00:00 GMTLLMAIs and Humans with Agencyhttp://arxiv.org/abs/2605.02810v1http://arxiv.org/abs/2605.02810v1David Mumford et al. — arxiv:2605.02810 — LLMMon, 04 May 2026 00:00:00 GMTLLMReinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traceshttp://arxiv.org/abs/2605.02801v1http://arxiv.org/abs/2605.02801v1Chenchen Zhang et al. — arxiv:2605.02801 — LLMMon, 04 May 2026 00:00:00 GMTLLMFunFuzz: An LLM-Powered Evolutionary Fuzzing Frameworkhttp://arxiv.org/abs/2605.02789v1http://arxiv.org/abs/2605.02789v1Mario Rodríguez Béjar et al. — arxiv:2605.02789 — LLMMon, 04 May 2026 00:00:00 GMTLLMU-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planninghttp://arxiv.org/abs/2605.02765v1http://arxiv.org/abs/2605.02765v1Christine P Lee et al. — arxiv:2605.02765 — LLMMon, 04 May 2026 00:00:00 GMTLLMSpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selectionhttp://arxiv.org/abs/2605.02888v1http://arxiv.org/abs/2605.02888v1Shikhar Shukla et al. 
— arxiv:2605.02888 — LLMMon, 04 May 2026 00:00:00 GMTLLMEvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphshttp://arxiv.org/abs/2605.02868v1http://arxiv.org/abs/2605.02868v1Ruichao Liang et al. — arxiv:2605.02868 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentUncountably many conditionally inaccessible decisions exist in every finite probability spacehttp://arxiv.org/abs/2605.02865v1http://arxiv.org/abs/2605.02865v1Zalán Gyenis et al. — arxiv:2605.02865 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentHAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systemshttp://arxiv.org/abs/2605.02832v1http://arxiv.org/abs/2605.02832v1Vicente Pelechanoa et al. — arxiv:2605.02832 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentEquilibrium Stability and Uniqueness with a Large Number of Commodities and Patient Consumershttp://arxiv.org/abs/2605.02817v1http://arxiv.org/abs/2605.02817v1Xinyang Wang et al. — arxiv:2605.02817 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentFlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agentshttp://arxiv.org/abs/2605.02815v1http://arxiv.org/abs/2605.02815v1Quang Hieu Pham et al. — arxiv:2605.02815 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentAutonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defensehttp://arxiv.org/abs/2605.02812v1http://arxiv.org/abs/2605.02812v1Mingming Zha et al. — arxiv:2605.02812 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentTool Use as Action: Towards Agentic Control in Mobile Core Networkshttp://arxiv.org/abs/2605.02811v1http://arxiv.org/abs/2605.02811v1Purna Sai Garigipati et al. — arxiv:2605.02811 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentReinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traceshttp://arxiv.org/abs/2605.02801v1http://arxiv.org/abs/2605.02801v1Chenchen Zhang et al. 
— arxiv:2605.02801 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentTruthful Communication and Exclusive Information Clubshttp://arxiv.org/abs/2605.02776v1http://arxiv.org/abs/2605.02776v1Paolo Pin et al. — arxiv:2605.02776 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentDynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigationhttp://arxiv.org/abs/2605.02759v1http://arxiv.org/abs/2605.02759v1Danil Tokhchukov et al. — arxiv:2605.02759 — LLM AgentMon, 04 May 2026 00:00:00 GMTLLM AgentEvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphshttp://arxiv.org/abs/2605.02868v1http://arxiv.org/abs/2605.02868v1Ruichao Liang et al. — arxiv:2605.02868 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentAutonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defensehttp://arxiv.org/abs/2605.02812v1http://arxiv.org/abs/2605.02812v1Mingming Zha et al. — arxiv:2605.02812 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentReinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traceshttp://arxiv.org/abs/2605.02801v1http://arxiv.org/abs/2605.02801v1Chenchen Zhang et al. — arxiv:2605.02801 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentMitigating Misalignment Contagion by Steering with Implicit Traitshttp://arxiv.org/abs/2605.02751v1http://arxiv.org/abs/2605.02751v1Maria Chang et al. — arxiv:2605.02751 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentAI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Developmenthttp://arxiv.org/abs/2605.02741v1http://arxiv.org/abs/2605.02741v1Yuecai Zhu et al. — arxiv:2605.02741 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentHybrid Inspection and Task-Based Access Control in Zero-Trust Agentic AIhttp://arxiv.org/abs/2605.02682v1http://arxiv.org/abs/2605.02682v1Majed El Helou et al. 
— arxiv:2605.02682 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentAcademiClaw: When Students Set Challenges for AI Agentshttp://arxiv.org/abs/2605.02661v1http://arxiv.org/abs/2605.02661v1Junjie Yu et al. — arxiv:2605.02661 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentBeyond State Machines: Executing Network Procedures with Agentic Tool-Calling Sequenceshttp://arxiv.org/abs/2605.02584v1http://arxiv.org/abs/2605.02584v1Purna Sai Garigipati et al. — arxiv:2605.02584 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentIteRate: Autonomous AI Synthesis of In-Kernel eBPF Wi-Fi Rate Control Algorithmshttp://arxiv.org/abs/2605.02542v1http://arxiv.org/abs/2605.02542v1James Lynch et al. — arxiv:2605.02542 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentFrom Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Modelhttp://arxiv.org/abs/2605.02491v1http://arxiv.org/abs/2605.02491v1Altan Cakir et al. — arxiv:2605.02491 — Multi-AgentMon, 04 May 2026 00:00:00 GMTMulti-AgentBenchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Studyhttp://arxiv.org/abs/2605.02520v1http://arxiv.org/abs/2605.02520v1Devi Prasad Bal et al. — arxiv:2605.02520 — RAGMon, 04 May 2026 00:00:00 GMTRAGFight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Traininghttp://arxiv.org/abs/2605.02374v1http://arxiv.org/abs/2605.02374v1Wenjing Duan et al. — arxiv:2605.02374 — RAGMon, 04 May 2026 00:00:00 GMTRAGARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiringhttp://arxiv.org/abs/2605.02200v1http://arxiv.org/abs/2605.02200v1Deyi Ji et al. — arxiv:2605.02200 — RAGMon, 04 May 2026 00:00:00 GMTRAGDocSync: Agentic Documentation Maintenance via Critic-Guided Reflexionhttp://arxiv.org/abs/2605.02163v1http://arxiv.org/abs/2605.02163v1Sidhesh Badrinarayan et al. 
— arxiv:2605.02163 — RAGMon, 04 May 2026 00:00:00 GMTRAGSemantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approachhttp://arxiv.org/abs/2605.02862v1http://arxiv.org/abs/2605.02862v1Hamza Ahmed Durrani et al. — arxiv:2605.02862 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningBolek: A Multimodal Language Model for Molecular Reasoninghttp://arxiv.org/abs/2605.02745v1http://arxiv.org/abs/2605.02745v1Frederic Grabowski et al. — arxiv:2605.02745 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningVisual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMshttp://arxiv.org/abs/2605.02735v1http://arxiv.org/abs/2605.02735v1Xin Zhang et al. — arxiv:2605.02735 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningAccurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudicationhttp://arxiv.org/abs/2605.02472v1http://arxiv.org/abs/2605.02472v1Stanisław Sójka et al. — arxiv:2605.02472 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningPosition: How can Graphs Help Large Language Models?http://arxiv.org/abs/2605.02452v1http://arxiv.org/abs/2605.02452v1Xiyuan Wang et al. — arxiv:2605.02452 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningEnhancing Multimodal In-Context Learning via Inductive-Deductive Reasoninghttp://arxiv.org/abs/2605.02378v1http://arxiv.org/abs/2605.02378v1Haoyu Wang et al. — arxiv:2605.02378 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningSOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mindhttp://arxiv.org/abs/2605.02307v1http://arxiv.org/abs/2605.02307v1Yashwanth YS et al. — arxiv:2605.02307 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningDistilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decodinghttp://arxiv.org/abs/2605.02290v1http://arxiv.org/abs/2605.02290v1Taewon Yun et al. 
— arxiv:2605.02290 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningTowards Understanding Specification Gaming in Reasoning Modelshttp://arxiv.org/abs/2605.02269v1http://arxiv.org/abs/2605.02269v1Kei Nishimura-Gasparian et al. — arxiv:2605.02269 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiringhttp://arxiv.org/abs/2605.02200v1http://arxiv.org/abs/2605.02200v1Deyi Ji et al. — arxiv:2605.02200 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningMolmoAct2: Action Reasoning Models for Real-world Deploymenthttp://arxiv.org/abs/2605.02881v1http://arxiv.org/abs/2605.02881v1Haoquan Fang et al. — arxiv:2605.02881 — ReasoningMon, 04 May 2026 00:00:00 GMTReasoningTool Use as Action: Towards Agentic Control in Mobile Core Networkshttp://arxiv.org/abs/2605.02811v1http://arxiv.org/abs/2605.02811v1Purna Sai Garigipati et al. — arxiv:2605.02811 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseReinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traceshttp://arxiv.org/abs/2605.02801v1http://arxiv.org/abs/2605.02801v1Chenchen Zhang et al. — arxiv:2605.02801 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseThe Design and Composition of Structural Causal Decision Processeshttp://arxiv.org/abs/2605.02681v1http://arxiv.org/abs/2605.02681v1Sebastian Benthall et al. — arxiv:2605.02681 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseHeavySkill: Heavy Thinking as the Inner Skill in Agentic Harnesshttp://arxiv.org/abs/2605.02396v1http://arxiv.org/abs/2605.02396v1Jianing Wang et al. — arxiv:2605.02396 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseHow to benchmark: the Measure-Explain-Test-Improve loophttp://arxiv.org/abs/2605.02233v1http://arxiv.org/abs/2605.02233v1Gabriel Scherer et al. — arxiv:2605.02233 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UsePlanner Matters! 
An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planninghttp://arxiv.org/abs/2605.02168v1http://arxiv.org/abs/2605.02168v1Wenyi Wu et al. — arxiv:2605.02168 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseFrom Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistryhttp://arxiv.org/abs/2605.03205v1http://arxiv.org/abs/2605.03205v1Aritra Roy et al. — arxiv:2605.03205 — Tool UseMon, 04 May 2026 00:00:00 GMTTool UseVideoNet: A Large-Scale Dataset for Domain-Specific Action Recognitionhttp://arxiv.org/abs/2605.02834v1http://arxiv.org/abs/2605.02834v1Tanush Yadav et al. — arxiv:2605.02834 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMLatent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Model Inferencehttp://arxiv.org/abs/2605.02739v1http://arxiv.org/abs/2605.02739v1Yudong Liu et al. — arxiv:2605.02739 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMVisual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMshttp://arxiv.org/abs/2605.02735v1http://arxiv.org/abs/2605.02735v1Xin Zhang et al. — arxiv:2605.02735 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMPerceptual Flow Network for Visually Grounded Reasoninghttp://arxiv.org/abs/2605.02730v1http://arxiv.org/abs/2605.02730v1Yangfu Li et al. — arxiv:2605.02730 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMPubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literaturehttp://arxiv.org/abs/2605.02720v1http://arxiv.org/abs/2605.02720v1Verena Jasmin Hallitschke et al. — arxiv:2605.02720 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMAutoFocus: Uncertainty-Aware Active Visual Search for GUI Groundinghttp://arxiv.org/abs/2605.02630v1http://arxiv.org/abs/2605.02630v1Ruilin Yao et al. 
— arxiv:2605.02630 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMRetrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrievalhttp://arxiv.org/abs/2605.02623v1http://arxiv.org/abs/2605.02623v1Yiming Ding et al. — arxiv:2605.02623 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMRethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Modelhttp://arxiv.org/abs/2605.02604v1http://arxiv.org/abs/2605.02604v1Zhou Bingtao et al. — arxiv:2605.02604 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMCoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulationhttp://arxiv.org/abs/2605.02600v1http://arxiv.org/abs/2605.02600v1Berk Çiçek et al. — arxiv:2605.02600 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMA Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memoryhttp://arxiv.org/abs/2605.02525v1http://arxiv.org/abs/2605.02525v1Bogdan Felician Abaza et al. — arxiv:2605.02525 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMAlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completionhttp://arxiv.org/abs/2605.02892v1http://arxiv.org/abs/2605.02892v1Yu-Ju Tsai et al. — arxiv:2605.02892 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMMolmoAct2: Action Reasoning Models for Real-world Deploymenthttp://arxiv.org/abs/2605.02881v1http://arxiv.org/abs/2605.02881v1Haoquan Fang et al. — arxiv:2605.02881 — Multimodal LLMMon, 04 May 2026 00:00:00 GMTMultimodal LLMAutonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defensehttp://arxiv.org/abs/2605.02812v1http://arxiv.org/abs/2605.02812v1Mingming Zha et al. 
— arxiv:2605.02812 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextTriple Spectral Fusion for Sensor-based Human Activity Recognitionhttp://arxiv.org/abs/2605.02743v1http://arxiv.org/abs/2605.02743v1Ye Zhang et al. — arxiv:2605.02743 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextMSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecastinghttp://arxiv.org/abs/2605.02689v1http://arxiv.org/abs/2605.02689v1Ahmed Cherif et al. — arxiv:2605.02689 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextThe 2026 ACII Dyadic Conversations (DaiKon) Workshop & Challengehttp://arxiv.org/abs/2605.02672v1http://arxiv.org/abs/2605.02672v1Panagiotis Tzirakis et al. — arxiv:2605.02672 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextM\textsuperscript{4}Fuse: Lightweight State-Space MoE with a Cross-Scale Gating Bridge for Brain Tumor Segmentationhttp://arxiv.org/abs/2605.02444v1http://arxiv.org/abs/2605.02444v1Meihua Zhou et al. — arxiv:2605.02444 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextThe Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agentshttp://arxiv.org/abs/2605.02244v1http://arxiv.org/abs/2605.02244v1Yelin Kim et al. — arxiv:2605.02244 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextRetrieval and Multi-Hop Reasoning in 1M-Token Context Windows: Evaluating LLMs on Classical Chinese Texthttp://arxiv.org/abs/2605.02173v1http://arxiv.org/abs/2605.02173v1Eric H. C. Chow et al. — arxiv:2605.02173 — Long ContextMon, 04 May 2026 00:00:00 GMTLong ContextStanding on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross--Language Code Clone Detectionhttp://arxiv.org/abs/2605.02860v1http://arxiv.org/abs/2605.02860v1Mohamad Khajezade et al. 
— arxiv:2605.02860 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyTrust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoringhttp://arxiv.org/abs/2605.02853v1http://arxiv.org/abs/2605.02853v1Arian Eamaz et al. — arxiv:2605.02853 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyCompress Then Adapt? No, Do It Together via Task-aware Union of Subspaceshttp://arxiv.org/abs/2605.02829v1http://arxiv.org/abs/2605.02829v1Jingze Ge et al. — arxiv:2605.02829 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyWhen Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognitionhttp://arxiv.org/abs/2605.02782v1http://arxiv.org/abs/2605.02782v1Pehuén Moure et al. — arxiv:2605.02782 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyProbing the Valley-Selective Tunneling Density of States in Monolayer MoS2 based Resonant Tunneling Deviceshttp://arxiv.org/abs/2605.02646v1http://arxiv.org/abs/2605.02646v1Abir Mukherjee et al. — arxiv:2605.02646 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyVertMark: A Unified Training-Free Robust Watermarking Framework for Vertical Domain Pre-trained Language Modelshttp://arxiv.org/abs/2605.02557v1http://arxiv.org/abs/2605.02557v1Cong Kong et al. — arxiv:2605.02557 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyReduced-Feedback Hybrid Precoding for Wideband mmWave MIMO-OFDM Systemshttp://arxiv.org/abs/2605.02418v1http://arxiv.org/abs/2605.02418v1Po-Heng Chou et al. — arxiv:2605.02418 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyStatistically-Lossless Quantization of Large Language Modelshttp://arxiv.org/abs/2605.02404v1http://arxiv.org/abs/2605.02404v1Michael Helcig et al. 
— arxiv:2605.02404 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyDescription and error analysis of quantum alghorithms in the projection evolution model -- the Deutsch algorithm casehttp://arxiv.org/abs/2605.02293v1http://arxiv.org/abs/2605.02293v1Krzysztof Lider et al. — arxiv:2605.02293 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyEdgeLPR: On the Deep Neural Network trade-off between Precision and Performance in LiDAR Place Recognitionhttp://arxiv.org/abs/2605.02275v1http://arxiv.org/abs/2605.02275v1Pierpaolo Serio et al. — arxiv:2605.02275 — LLM EfficiencyMon, 04 May 2026 00:00:00 GMTLLM EfficiencyGradient-Gated DPO: Stabilizing Preference Optimization in Language Modelshttp://arxiv.org/abs/2605.02626v1http://arxiv.org/abs/2605.02626v1Inoussa Mouiche et al. — arxiv:2605.02626 — AlignmentMon, 04 May 2026 00:00:00 GMTAlignmentA Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memoryhttp://arxiv.org/abs/2605.02525v1http://arxiv.org/abs/2605.02525v1Bogdan Felician Abaza et al. — arxiv:2605.02525 — AlignmentMon, 04 May 2026 00:00:00 GMTAlignmentEfficient Preference Poisoning Attack on Offline RLHFhttp://arxiv.org/abs/2605.02495v1http://arxiv.org/abs/2605.02495v1Chenye Yang et al. — arxiv:2605.02495 — AlignmentMon, 04 May 2026 00:00:00 GMTAlignmentAnomaly-Preference Image Generationhttp://arxiv.org/abs/2605.02439v1http://arxiv.org/abs/2605.02439v1Fuyun Wang et al. — arxiv:2605.02439 — AlignmentMon, 04 May 2026 00:00:00 GMTAlignment"I Don't Have Faith in the Developers to Use My Feedback": Understanding Player Values and Expectancy for Reporting Systems in Video Gameshttp://arxiv.org/abs/2605.02842v1http://arxiv.org/abs/2605.02842v1Michael Yin et al. 
— arxiv:2605.02842 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationThe classification of almost periodic flows on the hyperfinite type ${\rm II_1}$ factorhttp://arxiv.org/abs/2605.02781v1http://arxiv.org/abs/2605.02781v1Cyril Houdayer et al. — arxiv:2605.02781 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationPerceptual Flow Network for Visually Grounded Reasoninghttp://arxiv.org/abs/2605.02730v1http://arxiv.org/abs/2605.02730v1Yangfu Li et al. — arxiv:2605.02730 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationFoundation-Model-Based Agents in Industrial Automation: Purposes, Capabilities, and Open Challengeshttp://arxiv.org/abs/2605.02592v1http://arxiv.org/abs/2605.02592v1Vincent Henkel et al. — arxiv:2605.02592 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationBenchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Studyhttp://arxiv.org/abs/2605.02520v1http://arxiv.org/abs/2605.02520v1Devi Prasad Bal et al. — arxiv:2605.02520 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationA multilingual hallucination benchmark: MultiWikiQHalluAhttp://arxiv.org/abs/2605.02504v1http://arxiv.org/abs/2605.02504v1Freja Thoresen et al. — arxiv:2605.02504 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstructionhttp://arxiv.org/abs/2605.02464v1http://arxiv.org/abs/2605.02464v1Aoyu Liu et al. — arxiv:2605.02464 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationPosition: How can Graphs Help Large Language Models?http://arxiv.org/abs/2605.02452v1http://arxiv.org/abs/2605.02452v1Xiyuan Wang et al. — arxiv:2605.02452 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationHalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMshttp://arxiv.org/abs/2605.02443v1http://arxiv.org/abs/2605.02443v1Ahmed Cherif et al. 
— arxiv:2605.02443 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationMeasuring AI Reasoning: A Guide for Researchershttp://arxiv.org/abs/2605.02442v1http://arxiv.org/abs/2605.02442v1Munachiso Samuel Nwadike et al. — arxiv:2605.02442 — HallucinationMon, 04 May 2026 00:00:00 GMTHallucinationContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priminghttp://arxiv.org/abs/2605.02647v1http://arxiv.org/abs/2605.02647v1Mario Rodríguez Béjar et al. — arxiv:2605.02647 — LLM SafetyMon, 04 May 2026 00:00:00 GMTLLM SafetySelf-Mined Hardness for Safety Fine-Tuninghttp://arxiv.org/abs/2605.03226v1http://arxiv.org/abs/2605.03226v1Prakhar Gupta et al. — arxiv:2605.03226 — LLM SafetyMon, 04 May 2026 00:00:00 GMTLLM SafetyRevisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenseshttp://arxiv.org/abs/2605.03095v1http://arxiv.org/abs/2605.03095v1Kemal Derya et al. — arxiv:2605.03095 — LLM SafetyMon, 04 May 2026 00:00:00 GMTLLM SafetyNeuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablationhttp://arxiv.org/abs/2605.03058v1http://arxiv.org/abs/2605.03058v1Francesco Sovrano et al. — arxiv:2605.03058 — LLM SafetyMon, 04 May 2026 00:00:00 GMTLLM SafetyU-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planninghttp://arxiv.org/abs/2605.02765v1http://arxiv.org/abs/2605.02765v1Christine P Lee et al. — arxiv:2605.02765 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priminghttp://arxiv.org/abs/2605.02647v1http://arxiv.org/abs/2605.02647v1Mario Rodríguez Béjar et al. — arxiv:2605.02647 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationWhen Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systemshttp://arxiv.org/abs/2605.02463v1http://arxiv.org/abs/2605.02463v1Jose Manuel de la Chica et al. 
— arxiv:2605.02463 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationDecoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generationhttp://arxiv.org/abs/2605.02348v1http://arxiv.org/abs/2605.02348v1Muneeb Ur Raheem Khan et al. — arxiv:2605.02348 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationDocSync: Agentic Documentation Maintenance via Critic-Guided Reflexionhttp://arxiv.org/abs/2605.02163v1http://arxiv.org/abs/2605.02163v1Sidhesh Badrinarayan et al. — arxiv:2605.02163 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationTerminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?http://arxiv.org/abs/2605.03195v1http://arxiv.org/abs/2605.03195v1Spandan Garg et al. — arxiv:2605.03195 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationPIIGuard: Mitigating PII Harvesting under Adversarial Sanitizationhttp://arxiv.org/abs/2605.03129v1http://arxiv.org/abs/2605.03129v1Mingshuo Liu et al. — arxiv:2605.03129 — LLM EvaluationMon, 04 May 2026 00:00:00 GMTLLM EvaluationEvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphshttp://arxiv.org/abs/2605.02868v1http://arxiv.org/abs/2605.02868v1Ruichao Liang et al. — arxiv:2605.02868 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMAI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Developmenthttp://arxiv.org/abs/2605.02741v1http://arxiv.org/abs/2605.02741v1Yuecai Zhu et al. — arxiv:2605.02741 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMLLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineeringhttp://arxiv.org/abs/2605.02455v1http://arxiv.org/abs/2605.02455v1Shuzhao Feng et al. — arxiv:2605.02455 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generationhttp://arxiv.org/abs/2605.02431v1http://arxiv.org/abs/2605.02431v1Minnan Wei et al. 
— arxiv:2605.02431 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMMolViBench: Evaluating LLMs on Molecular Vibe Codinghttp://arxiv.org/abs/2605.02351v1http://arxiv.org/abs/2605.02351v1Jiatong Li et al. — arxiv:2605.02351 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMEngiAgent: Fully Connected Coordination of LLM Agents for Solving Open-ended Engineering Problems with Feasible Solutionshttp://arxiv.org/abs/2605.02289v1http://arxiv.org/abs/2605.02289v1Xiyuan Zhou et al. — arxiv:2605.02289 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMExact Higher-Order Derivatives for SE(3) via Analytical/AD Methodshttp://arxiv.org/abs/2605.02252v1http://arxiv.org/abs/2605.02252v1Frank O. Kuehnel et al. — arxiv:2605.02252 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMA Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Promptshttp://arxiv.org/abs/2605.03179v1http://arxiv.org/abs/2605.03179v1Richard J. Young et al. — arxiv:2605.03179 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMLearning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agentshttp://arxiv.org/abs/2605.03159v1http://arxiv.org/abs/2605.03159v1Reshabh K Sharma et al. — arxiv:2605.03159 — Code LLMMon, 04 May 2026 00:00:00 GMTCode LLMAccurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudicationhttp://arxiv.org/abs/2605.02472v1http://arxiv.org/abs/2605.02472v1Stanisław Sójka et al. — arxiv:2605.02472 — Legal NLPMon, 04 May 2026 00:00:00 GMTLegal NLPStructural Dilemmas and Developmental Pathways of Legal Argument Mining in the Era of Artificial Intelligencehttp://arxiv.org/abs/2605.02308v1http://arxiv.org/abs/2605.02308v1Xianglei Liao et al. 
— arxiv:2605.02308 — Legal NLP
Mon, 04 May 2026 00:00:00 GMT

Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages
http://arxiv.org/abs/2605.02608v1
Kevin Guan et al. — arxiv:2605.02608 — Multilingual NLP
Mon, 04 May 2026 00:00:00 GMT

SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
http://arxiv.org/abs/2605.02601v1
Nedjma Ousidhoum et al. — arxiv:2605.02601 — Multilingual NLP
Mon, 04 May 2026 00:00:00 GMT

Tibetan-TTS:Low-Resource Tibetan Speech Synthesis with Large Model Adaptation
http://arxiv.org/abs/2605.02496v1
Jiaxu He et al. — arxiv:2605.02496 — Multilingual NLP
Mon, 04 May 2026 00:00:00 GMT

Reliability-Oriented Multilingual Orthopedic Diagnosis: A Domain-Adaptive Modeling and a Conceptual Validation Framework
http://arxiv.org/abs/2605.02266v1
Danish Ali et al. — arxiv:2605.02266 — Multilingual NLP
Mon, 04 May 2026 00:00:00 GMT

Adaptive Gait Generation for Multi-Terrain Exoskeletons via Constrained Kernelized Movement Primitives
http://arxiv.org/abs/2605.02513v1
Edoardo Trombin et al. — arxiv:2605.02513 — Information Extraction
Mon, 04 May 2026 00:00:00 GMT

SCRIBE: Practical Static Binary Patching via Binary-Aware Recompilation of Decompiled Code
http://arxiv.org/abs/2605.02121v1
Han Dai et al. — arxiv:2605.02121 — Information Extraction
Mon, 04 May 2026 00:00:00 GMT

MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports
http://arxiv.org/abs/2605.03103v1
Yingyun Li et al. — arxiv:2605.03103 — Information Extraction
Mon, 04 May 2026 00:00:00 GMT

mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection
http://arxiv.org/abs/2605.02712v1
Dominik Macko et al. — arxiv:2605.02712 — Text Classification
Mon, 04 May 2026 00:00:00 GMT

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
http://arxiv.org/abs/2605.02834v1
Tanush Yadav et al. — arxiv:2605.02834 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces
http://arxiv.org/abs/2605.02829v1
Jingze Ge et al. — arxiv:2605.02829 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
http://arxiv.org/abs/2605.02819v1
Jiujiu Chen et al. — arxiv:2605.02819 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study
http://arxiv.org/abs/2605.02520v1
Devi Prasad Bal et al. — arxiv:2605.02520 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

SkillCom: Decomposing LLM-based Semantic Communication into Task and Channel Aware Skills
http://arxiv.org/abs/2605.02333v1
Jingwen Fu et al. — arxiv:2605.02333 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models
http://arxiv.org/abs/2605.02202v1
Ji Guo et al. — arxiv:2605.02202 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing
http://arxiv.org/abs/2605.02199v1
Nishant Bhargava et al. — arxiv:2605.02199 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning
http://arxiv.org/abs/2605.02178v1
Haixin Wang et al. — arxiv:2605.02178 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse
http://arxiv.org/abs/2605.02170v1
Nawar Turk et al. — arxiv:2605.02170 — Question Answering
Mon, 04 May 2026 00:00:00 GMT

Semantically Enriching Investor Micro-blogs for Opinion-Aware Emotion Analysis: A Practical Approach
http://arxiv.org/abs/2605.03092v1
Gaurav Negi et al. — arxiv:2605.03092 — Sentiment Analysis
Mon, 04 May 2026 00:00:00 GMT

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs
http://arxiv.org/abs/2605.02868v1
Ruichao Liang et al. — arxiv:2605.02868 — Knowledge Graph
Mon, 04 May 2026 00:00:00 GMT

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
http://arxiv.org/abs/2605.02819v1
Jiujiu Chen et al. — arxiv:2605.02819 — Knowledge Graph
Mon, 04 May 2026 00:00:00 GMT

Fine-Grained Graph Generation through Latent Mixture Scheduling
http://arxiv.org/abs/2605.02780v1
Nidhi Vakil et al. — arxiv:2605.02780 — Knowledge Graph
Mon, 04 May 2026 00:00:00 GMT

Position: How can Graphs Help Large Language Models?
http://arxiv.org/abs/2605.02452v1
Xiyuan Wang et al. — arxiv:2605.02452 — Knowledge Graph
Mon, 04 May 2026 00:00:00 GMT

EditPropBench: Measuring Factual Edit Propagation in Scientific Manuscripts
http://arxiv.org/abs/2605.02083v1
Garvin Kruthof et al. — arxiv:2605.02083 — NLP
Sun, 03 May 2026 00:00:00 GMT

Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models
http://arxiv.org/abs/2605.01870v1
Nikolaos Giarelis et al. — arxiv:2605.01870 — NLP
Sun, 03 May 2026 00:00:00 GMT

The Cylindrical Representation Hypothesis for Language Model Steering
http://arxiv.org/abs/2605.01844v1
Lang Gao et al. — arxiv:2605.01844 — NLP
Sun, 03 May 2026 00:00:00 GMT

Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization
http://arxiv.org/abs/2605.02011v1
Weihang Su et al. — arxiv:2605.02011 — RAG
Sun, 03 May 2026 00:00:00 GMT

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
http://arxiv.org/abs/2605.01970v1
Debeshee Das et al. — arxiv:2605.01970 — RAG
Sun, 03 May 2026 00:00:00 GMT

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence
http://arxiv.org/abs/2605.01782v1
Huining Cui et al. — arxiv:2605.01782 — RAG
Sun, 03 May 2026 00:00:00 GMT

TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation
http://arxiv.org/abs/2605.01700v1
Yiyao Wang et al. — arxiv:2605.01700 — RAG
Sun, 03 May 2026 00:00:00 GMT

A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation
http://arxiv.org/abs/2605.01664v1
Fariba Afrin Irany et al. — arxiv:2605.01664 — RAG
Sun, 03 May 2026 00:00:00 GMT

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
http://arxiv.org/abs/2605.01970v1
Debeshee Das et al. — arxiv:2605.01970 — Long Context
Sun, 03 May 2026 00:00:00 GMT

Stochastic Sparse Attention for Memory-Bound Inference
http://arxiv.org/abs/2605.01910v1
Kyle Lee et al. — arxiv:2605.01910 — Long Context
Sun, 03 May 2026 00:00:00 GMT

Long Sync Word Frame Synchronization for Future Wireless Networks
http://arxiv.org/abs/2605.01890v1
Dimitris Nikolaidis et al. — arxiv:2605.01890 — Long Context
Sun, 03 May 2026 00:00:00 GMT

12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation
http://arxiv.org/abs/2605.01986v1
Ahmet Bahaddin Ersoz et al. — arxiv:2605.01986 — Alignment
Sun, 03 May 2026 00:00:00 GMT

The Compliance Gap: Why AI Systems Promise to Follow Process Instructions but Don't
http://arxiv.org/abs/2605.01771v1
Kwan Soo Shin et al. — arxiv:2605.01771 — Alignment
Sun, 03 May 2026 00:00:00 GMT

Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
http://arxiv.org/abs/2605.01647v1
Priyadarshan Narayanasamy et al. — arxiv:2605.01647 — Alignment
Sun, 03 May 2026 00:00:00 GMT

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
http://arxiv.org/abs/2605.01970v1
Debeshee Das et al. — arxiv:2605.01970 — LLM Safety
Sun, 03 May 2026 00:00:00 GMT

Disentangling Intent from Role: Adversarial Self-Play for Persona-Invariant Safety Alignment
http://arxiv.org/abs/2605.01899v1
Jiajia Li et al. — arxiv:2605.01899 — LLM Safety
Sun, 03 May 2026 00:00:00 GMT

TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks
http://arxiv.org/abs/2605.01761v1
Quanchen Zou et al. — arxiv:2605.01761 — LLM Safety
Sun, 03 May 2026 00:00:00 GMT

Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems
http://arxiv.org/abs/2605.01758v1
Yue Ma et al. — arxiv:2605.01758 — LLM Safety
Sun, 03 May 2026 00:00:00 GMT

MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety
http://arxiv.org/abs/2605.01687v1
Jialin Song et al. — arxiv:2605.01687 — LLM Safety
Sun, 03 May 2026 00:00:00 GMT

A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation
http://arxiv.org/abs/2605.02035v1
Jingheng Pan et al. — arxiv:2605.02035 — LLM Evaluation
Sun, 03 May 2026 00:00:00 GMT

Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization
http://arxiv.org/abs/2605.02011v1
Weihang Su et al. — arxiv:2605.02011 — LLM Evaluation
Sun, 03 May 2026 00:00:00 GMT

12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation
http://arxiv.org/abs/2605.01986v1
Ahmet Bahaddin Ersoz et al. — arxiv:2605.01986 — LLM Evaluation
Sun, 03 May 2026 00:00:00 GMT

SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?
http://arxiv.org/abs/2605.01911v1
Jongmin Shin et al. — arxiv:2605.01911 — LLM Evaluation
Sun, 03 May 2026 00:00:00 GMT

Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models
http://arxiv.org/abs/2605.01870v1
Nikolaos Giarelis et al. — arxiv:2605.01870 — LLM Evaluation
Sun, 03 May 2026 00:00:00 GMT

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns
http://arxiv.org/abs/2605.01769v1
Jia Li et al. — arxiv:2605.01769 — Code LLM
Sun, 03 May 2026 00:00:00 GMT

A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation
http://arxiv.org/abs/2605.01664v1
Fariba Afrin Irany et al. — arxiv:2605.01664 — Medical NLP
Sun, 03 May 2026 00:00:00 GMT

Multilingual Safety Alignment via Self-Distillation
http://arxiv.org/abs/2605.02971v1
Ruiyang Qin et al. — arxiv:2605.02971 — Multilingual NLP
Sun, 03 May 2026 00:00:00 GMT

TIJERE: A Novel Threat Intelligence Joint Extraction Model Based on Analyst Expert Knowledge
http://arxiv.org/abs/2605.02041v1
Inoussa Mouiche et al. — arxiv:2605.02041 — Named Entity Recognition
Sun, 03 May 2026 00:00:00 GMT

BIM Information Extraction Through LLM-based Adaptive Exploration
http://arxiv.org/abs/2605.01698v1
Sylvain Hellin et al. — arxiv:2605.01698 — Information Extraction
Sun, 03 May 2026 00:00:00 GMT

Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks
http://arxiv.org/abs/2605.01959v1
Zongqian Li et al. — arxiv:2605.01959 — Question Answering
Sun, 03 May 2026 00:00:00 GMT

TIJERE: A Novel Threat Intelligence Joint Extraction Model Based on Analyst Expert Knowledge
http://arxiv.org/abs/2605.02041v1
Inoussa Mouiche et al. — arxiv:2605.02041 — Knowledge Graph
Sun, 03 May 2026 00:00:00 GMT

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models
http://arxiv.org/abs/2605.01591v1
Amin Bigdeli et al. — arxiv:2605.01591 — RAG
Sat, 02 May 2026 00:00:00 GMT

Strong light-matter interactions in hybrid polaritonic systems
http://arxiv.org/abs/2605.01583v1
Ben Johns et al. — arxiv:2605.01583 — Tool Use
Sat, 02 May 2026 00:00:00 GMT

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents
http://arxiv.org/abs/2605.01250v1
Sai Ma et al. — arxiv:2605.01250 — Tool Use
Sat, 02 May 2026 00:00:00 GMT

S^3-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data
http://arxiv.org/abs/2605.01248v1
Harsh Goel et al. — arxiv:2605.01248 — Tool Use
Sat, 02 May 2026 00:00:00 GMT

A Theory of Generalization in Deep Learning
http://arxiv.org/abs/2605.01172v1
Elon Litman et al. — arxiv:2605.01172 — Alignment
Sat, 02 May 2026 00:00:00 GMT

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models
http://arxiv.org/abs/2605.01449v1
Pang Liu et al. — arxiv:2605.01449 — LLM Safety
Sat, 02 May 2026 00:00:00 GMT

Asymmetric Invertible Threat: Learning Reversible Privacy Defense for Face Recognition
http://arxiv.org/abs/2605.01217v1
Jiabei Zhang et al. — arxiv:2605.01217 — LLM Safety
Sat, 02 May 2026 00:00:00 GMT

Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey
http://arxiv.org/abs/2605.01392v1
Yifei Wang et al. — arxiv:2605.01392 — Code LLM
Sat, 02 May 2026 00:00:00 GMT

MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
http://arxiv.org/abs/2605.01347v1
Jianze Wang et al. — arxiv:2605.01347 — Code LLM
Sat, 02 May 2026 00:00:00 GMT

Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
http://arxiv.org/abs/2605.01597v1
Yi-Cheng Lin et al. — arxiv:2605.01597 — Speech LLM
Sat, 02 May 2026 00:00:00 GMT

Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations
http://arxiv.org/abs/2605.01609v1
Pratyush Acharya et al. — arxiv:2605.01609 — Multilingual NLP
Sat, 02 May 2026 00:00:00 GMT

Auditing demographic bias in AI-based emergency police dispatch: a cross-lingual evaluation of eleven large language models
http://arxiv.org/abs/2605.01451v1
William Guey et al. — arxiv:2605.01451 — Multilingual NLP
Sat, 02 May 2026 00:00:00 GMT

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs
http://arxiv.org/abs/2605.01224v1
Anjishnu Mukherjee et al. — arxiv:2605.01224 — Multilingual NLP
Sat, 02 May 2026 00:00:00 GMT

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks
http://arxiv.org/abs/2605.01417v1
Benjamin Warner et al. — arxiv:2605.01417 — Information Extraction
Sat, 02 May 2026 00:00:00 GMT

Addressing Data Scarcity in Bangla Fake News Detection: An LLM-Based Dataset Augmentation Approach
http://arxiv.org/abs/2605.01292v1
Ahmed Alfey Sani et al. — arxiv:2605.01292 — Text Classification
Sat, 02 May 2026 00:00:00 GMT

Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
http://arxiv.org/abs/2605.01226v1
Keyan Chen et al. — arxiv:2605.01226 — Text Classification
Sat, 02 May 2026 00:00:00 GMT

Benchmarking LightGBM and BiLSTM for Sentiment Analysis on Indonesian E-Commerce Reviews
http://arxiv.org/abs/2605.01322v1
Lidia Natasyah Marpaung et al. — arxiv:2605.01322 — Sentiment Analysis
Sat, 02 May 2026 00:00:00 GMT

Sentiment Analysis of Mobile Legends App Reviews Using Machine Learning and LSTM-Based Deep Learning Models
http://arxiv.org/abs/2605.01317v1
Vira Putri Maharani et al. — arxiv:2605.01317 — Sentiment Analysis
Sat, 02 May 2026 00:00:00 GMT

Enhancing Game Review Sentiment Classification on Steam Platform with Attention-Based BiLSTM
http://arxiv.org/abs/2605.01315v1
Abit Ahmad Oktarian et al. — arxiv:2605.01315 — Sentiment Analysis
Sat, 02 May 2026 00:00:00 GMT

KG-First, LLM-Fallback: A Hybrid Microservice for Grounded Skill Search and Explanation
http://arxiv.org/abs/2605.01582v1
Ngoc Luyen Le et al. — arxiv:2605.01582 — Knowledge Graph
Sat, 02 May 2026 00:00:00 GMT

Actionable Understanding: Action Units for Bridging the Knowledge-Action Gap in Post-FAIR Knowledge Infrastructures
http://arxiv.org/abs/2605.01564v1
Lars Vogt et al. — arxiv:2605.01564 — Knowledge Graph
Sat, 02 May 2026 00:00:00 GMT

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning
http://arxiv.org/abs/2605.01489v1
Tianshi Zheng et al. — arxiv:2605.01489 — Knowledge Graph
Sat, 02 May 2026 00:00:00 GMT

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media
http://arxiv.org/abs/2605.00776v1
Scott Friedman et al. — arxiv:2605.00776 — NLP
Fri, 01 May 2026 00:00:00 GMT

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
http://arxiv.org/abs/2605.00422v1
Zhixiong Zhao et al. — arxiv:2605.00422 — NLP
Fri, 01 May 2026 00:00:00 GMT

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
http://arxiv.org/abs/2605.00817v1
Sailesh Panda et al. — arxiv:2605.00817 — LLM
Fri, 01 May 2026 00:00:00 GMT

Let ViT Speak: Generative Language-Image Pre-training
http://arxiv.org/abs/2605.00809v1
Yan Fang et al. — arxiv:2605.00809 — LLM
Fri, 01 May 2026 00:00:00 GMT

Can Coding Agents Reproduce Findings in Computational Materials Science?
http://arxiv.org/abs/2605.00803v1
Ziyang Huang et al. — arxiv:2605.00803 — LLM
Fri, 01 May 2026 00:00:00 GMT

Generating Statistical Charts with Validation-Driven LLM Workflows
http://arxiv.org/abs/2605.00800v1
Pavlin G. Poličar et al. — arxiv:2605.00800 — LLM
Fri, 01 May 2026 00:00:00 GMT

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
http://arxiv.org/abs/2605.00798v1
Arunabh Srivastava et al. — arxiv:2605.00798 — LLM
Fri, 01 May 2026 00:00:00 GMT

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI
http://arxiv.org/abs/2605.00796v1
Alfredo Madrid-García et al. — arxiv:2605.00796 — LLM
Fri, 01 May 2026 00:00:00 GMT

Make Your LVLM KV Cache More Lightweight
http://arxiv.org/abs/2605.00789v1
Xihao Chen et al. — arxiv:2605.00789 — LLM
Fri, 01 May 2026 00:00:00 GMT

GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair
http://arxiv.org/abs/2605.00782v1
Yinhao Xiao et al. — arxiv:2605.00782 — LLM
Fri, 01 May 2026 00:00:00 GMT

Position: agentic AI orchestration should be Bayes-consistent
http://arxiv.org/abs/2605.00742v1
Theodore Papamarkou et al. — arxiv:2605.00742 — LLM
Fri, 01 May 2026 00:00:00 GMT

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
http://arxiv.org/abs/2605.00741v1
Saeid Jamshidi et al. — arxiv:2605.00741 — LLM
Fri, 01 May 2026 00:00:00 GMT

Can Coding Agents Reproduce Findings in Computational Materials Science?
http://arxiv.org/abs/2605.00803v1
Ziyang Huang et al. — arxiv:2605.00803 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
http://arxiv.org/abs/2605.00798v1
Arunabh Srivastava et al. — arxiv:2605.00798 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Simpson's paradox explains the ubiquity of nonlinear, threshold, and complex contagions
http://arxiv.org/abs/2605.00791v1
Laurent Hébert-Dufresne et al. — arxiv:2605.00791 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Penalized Likelihood for Dyadic Network Formation Models with Degree Heterogeneity
http://arxiv.org/abs/2605.00771v1
Zizhong Yan et al. — arxiv:2605.00771 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
http://arxiv.org/abs/2605.00762v1
Shradha Sharma et al. — arxiv:2605.00762 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
http://arxiv.org/abs/2605.00751v1
Sizhe Tang et al. — arxiv:2605.00751 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Position: agentic AI orchestration should be Bayes-consistent
http://arxiv.org/abs/2605.00742v1
Theodore Papamarkou et al. — arxiv:2605.00742 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
http://arxiv.org/abs/2605.00741v1
Saeid Jamshidi et al. — arxiv:2605.00741 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling
http://arxiv.org/abs/2605.00737v1
Qinyuan Wu et al. — arxiv:2605.00737 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

Decentralized Proximal Stochastic Gradient Langevin Dynamics
http://arxiv.org/abs/2605.00723v1
Mohammad Rafiqul Islam et al. — arxiv:2605.00723 — LLM Agent
Fri, 01 May 2026 00:00:00 GMT

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
http://arxiv.org/abs/2605.00798v1
Arunabh Srivastava et al. — arxiv:2605.00798 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
http://arxiv.org/abs/2605.00762v1
Shradha Sharma et al. — arxiv:2605.00762 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
http://arxiv.org/abs/2605.00751v1
Sizhe Tang et al. — arxiv:2605.00751 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
http://arxiv.org/abs/2605.00741v1
Saeid Jamshidi et al. — arxiv:2605.00741 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
http://arxiv.org/abs/2605.00702v1
Derong Xu et al. — arxiv:2605.00702 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization
http://arxiv.org/abs/2605.00691v1
Zi-Bo Qin et al. — arxiv:2605.00691 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

DySRec: Dynamic Context-Aware Psychometric Scale Recommendation via Multi-Agent Collaboration
http://arxiv.org/abs/2605.00574v1
Yanzeng Li et al. — arxiv:2605.00574 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
http://arxiv.org/abs/2605.00529v1
Ziwen Zhao et al. — arxiv:2605.00529 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
http://arxiv.org/abs/2605.00528v1
Dongxin Guo et al. — arxiv:2605.00528 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
http://arxiv.org/abs/2605.00444v1
Kerui Chen et al. — arxiv:2605.00444 — Multi-Agent
Fri, 01 May 2026 00:00:00 GMT

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI
http://arxiv.org/abs/2605.00796v1
Alfredo Madrid-García et al. — arxiv:2605.00796 — RAG
Fri, 01 May 2026 00:00:00 GMT

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
http://arxiv.org/abs/2605.00632v1
Massimo Rondelli et al. — arxiv:2605.00632 — RAG
Fri, 01 May 2026 00:00:00 GMT

H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations
http://arxiv.org/abs/2605.00631v1
Passant Elchafei et al. — arxiv:2605.00631 — RAG
Fri, 01 May 2026 00:00:00 GMT

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
http://arxiv.org/abs/2605.00529v1
Ziwen Zhao et al. — arxiv:2605.00529 — RAG
Fri, 01 May 2026 00:00:00 GMT

LLM-Oriented Information Retrieval: A Denoising-First Perspective
http://arxiv.org/abs/2605.00505v1
Lu Dai et al. — arxiv:2605.00505 — RAG
Fri, 01 May 2026 00:00:00 GMT

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases
http://arxiv.org/abs/2605.00460v1
Weifei Jin et al. — arxiv:2605.00460 — RAG
Fri, 01 May 2026 00:00:00 GMT

Agentic AI for Substance Use Education: Integrating Regulatory and Scientific Knowledge Sources
http://arxiv.org/abs/2605.00383v1
Kosar Haghani et al. — arxiv:2605.00383 — RAG
Fri, 01 May 2026 00:00:00 GMT

Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
http://arxiv.org/abs/2605.00318v1
Pooja Guttal et al. — arxiv:2605.00318 — RAG
Fri, 01 May 2026 00:00:00 GMT

Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game
http://arxiv.org/abs/2605.00677v1
Lixing Li et al. — arxiv:2605.00677 — Reasoning
Fri, 01 May 2026 00:00:00 GMT

Thinking in Text and Images: Interleaved Vision--Language Reasoning Traces for Long-Horizon Robot Manipulation
http://arxiv.org/abs/2605.00438v1
Jinkun Liu et al. — arxiv:2605.00438 — Reasoning
Fri, 01 May 2026 00:00:00 GMT

Social Bias in LLM-Generated Code: Benchmark and Mitigation
http://arxiv.org/abs/2605.00382v1
Fazle Rabbi et al. — arxiv:2605.00382 — Reasoning
Fri, 01 May 2026 00:00:00 GMT

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
http://arxiv.org/abs/2605.00380v1
Zihan Lin et al. — arxiv:2605.00380 — Reasoning
Fri, 01 May 2026 00:00:00 GMT

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling
http://arxiv.org/abs/2605.00737v1
Qinyuan Wu et al. — arxiv:2605.00737 — Tool Use
Fri, 01 May 2026 00:00:00 GMT

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
http://arxiv.org/abs/2605.00380v1
Zihan Lin et al. — arxiv:2605.00380 — Tool Use
Fri, 01 May 2026 00:00:00 GMT

AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?
http://arxiv.org/abs/2605.00334v1
Ranit Karmakar et al. — arxiv:2605.00334 — Tool Use
Fri, 01 May 2026 00:00:00 GMT

On Aubry's completeness conjecture
http://arxiv.org/abs/2605.00305v1
Tianqi Shi et al. — arxiv:2605.00305 — Tool Use
Fri, 01 May 2026 00:00:00 GMT

A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents
http://arxiv.org/abs/2605.01143v1
Sheldon Yu et al. — arxiv:2605.01143 — Tool Use
Fri, 01 May 2026 00:00:00 GMT

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
http://arxiv.org/abs/2605.00814v1
Siyuan Huang et al. — arxiv:2605.00814 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Let ViT Speak: Generative Language-Image Pre-training
http://arxiv.org/abs/2605.00809v1
Yan Fang et al. — arxiv:2605.00809 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Generating Statistical Charts with Validation-Driven LLM Workflows
http://arxiv.org/abs/2605.00800v1
Pavlin G. Poličar et al. — arxiv:2605.00800 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Make Your LVLM KV Cache More Lightweight
http://arxiv.org/abs/2605.00789v1
Xihao Chen et al. — arxiv:2605.00789 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack
http://arxiv.org/abs/2605.00699v1
Xutao Mao et al. — arxiv:2605.00699 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models
http://arxiv.org/abs/2605.00591v1
Jiayu Li et al. — arxiv:2605.00591 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Jailbreaking Vision-Language Models Through the Visual Modality
http://arxiv.org/abs/2605.00583v1
Aharon Azulay et al. — arxiv:2605.00583 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Leveraging Vision-Language Models as Weak Annotators in Active Learning
http://arxiv.org/abs/2605.00480v1
Phuong Ngoc Nguyen et al. — arxiv:2605.00480 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
http://arxiv.org/abs/2605.00444v1
Kerui Chen et al. — arxiv:2605.00444 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Thinking in Text and Images: Interleaved Vision--Language Reasoning Traces for Long-Horizon Robot Manipulation
http://arxiv.org/abs/2605.00438v1
Jinkun Liu et al. — arxiv:2605.00438 — Multimodal LLM
Fri, 01 May 2026 00:00:00 GMT

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
http://arxiv.org/abs/2605.00702v1
Derong Xu et al. — arxiv:2605.00702 — Long Context
Fri, 01 May 2026 00:00:00 GMT

Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
http://arxiv.org/abs/2605.00444v1
Kerui Chen et al. — arxiv:2605.00444 — Long Context
Fri, 01 May 2026 00:00:00 GMT

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
http://arxiv.org/abs/2605.00356v1
Tianyu Hu et al. — arxiv:2605.00356 — Long Context
Fri, 01 May 2026 00:00:00 GMT

Budget-Aware Routing for Long Clinical Text
http://arxiv.org/abs/2605.00336v1
Khizar Qureshi et al. — arxiv:2605.00336 — Long Context
Fri, 01 May 2026 00:00:00 GMT

The structure of gauge invariant Gaussian quantum operations on finite Fermion systems
http://arxiv.org/abs/2605.00784v1
Eric A. Carlen et al. — arxiv:2605.00784 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
http://arxiv.org/abs/2605.00658v1
Houyuan Chen et al. — arxiv:2605.00658 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

Budget Constraints as Riemannian Manifolds
http://arxiv.org/abs/2605.00649v1
Michael Helcig et al. — arxiv:2605.00649 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors
http://arxiv.org/abs/2605.00605v1
Hao Wei et al. — arxiv:2605.00605 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

Fast and Exact: Asymptotically Linear KL-Optimal Frequency Normalization
http://arxiv.org/abs/2605.00579v1
Kamila Szewczyk et al. — arxiv:2605.00579 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

Quantum corrections to the Josephson dynamics: a population-imbalance approach
http://arxiv.org/abs/2605.00571v1
Oliver Hideg et al. — arxiv:2605.00571 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
http://arxiv.org/abs/2605.00539v1
Wenxiang Lin et al. — arxiv:2605.00539 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference
http://arxiv.org/abs/2605.00519v1
Allan Kazakov et al. — arxiv:2605.00519 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
http://arxiv.org/abs/2605.00422v1
Zhixiong Zhao et al. — arxiv:2605.00422 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI
http://arxiv.org/abs/2605.00421v1
Pankaj Gupta et al. — arxiv:2605.00421 — LLM Efficiency
Fri, 01 May 2026 00:00:00 GMT

DynamicPO: Dynamic Preference Optimization for Recommendation
http://arxiv.org/abs/2605.00327v1
Xingyu Hu et al. — arxiv:2605.00327 — Alignment
Fri, 01 May 2026 00:00:00 GMT

Iterative Finetuning is Mostly Idempotent
http://arxiv.org/abs/2605.01130v1
Zephaniah Roe et al. — arxiv:2605.01130 — Alignment
Fri, 01 May 2026 00:00:00 GMT

PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs
http://arxiv.org/abs/2605.01123v1
Ravi Ranjan et al. — arxiv:2605.01123 — Alignment
Fri, 01 May 2026 00:00:00 GMT

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
http://arxiv.org/abs/2605.00817v1
Sailesh Panda et al. — arxiv:2605.00817 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

A Geometric Interpretation of Generalized Hurwitz--Radon Numbers Defined by Kannaka--Tojo
http://arxiv.org/abs/2605.00704v1
Muneto Miyaji et al. — arxiv:2605.00704 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting
http://arxiv.org/abs/2605.00645v1
Alireza Namazi et al. — arxiv:2605.00645 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

Class Angular Distortion Index for Dimensionality Reduction
http://arxiv.org/abs/2605.00637v1
Kaviru Gunaratne et al. — arxiv:2605.00637 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations
http://arxiv.org/abs/2605.00631v1
Passant Elchafei et al. — arxiv:2605.00631 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors
http://arxiv.org/abs/2605.00605v1
Hao Wei et al. — arxiv:2605.00605 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

LLM-Oriented Information Retrieval: A Denoising-First Perspective
http://arxiv.org/abs/2605.00505v1
Lu Dai et al. — arxiv:2605.00505 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
http://arxiv.org/abs/2605.00474v1
Yearim Kim et al. — arxiv:2605.00474 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
http://arxiv.org/abs/2605.00468v1
Joey Chan et al. — arxiv:2605.00468 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations
http://arxiv.org/abs/2605.00434v1
Huangbiao Xu et al. — arxiv:2605.00434 — Hallucination
Fri, 01 May 2026 00:00:00 GMT

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
http://arxiv.org/abs/2605.00706v1
Yutao Hou et al. — arxiv:2605.00706 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack
http://arxiv.org/abs/2605.00699v1
Xutao Mao et al. — arxiv:2605.00699 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

Jailbreaking Vision-Language Models Through the Visual Modality
http://arxiv.org/abs/2605.00583v1
Aharon Azulay et al. — arxiv:2605.00583 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
http://arxiv.org/abs/2605.00553v1
Minchan Kwon et al. — arxiv:2605.00553 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation
http://arxiv.org/abs/2605.01113v1
Chi Zhang et al. — arxiv:2605.01113 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking
http://arxiv.org/abs/2605.00974v1
Jindong Li et al. — arxiv:2605.00974 — LLM Safety
Fri, 01 May 2026 00:00:00 GMT

Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines
http://arxiv.org/abs/2605.00410v1
Aninda Ray et al. — arxiv:2605.00410 — LLM Evaluation
Fri, 01 May 2026 00:00:00 GMT

Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
http://arxiv.org/abs/2605.00353v1
Eva Agapaki et al. — arxiv:2605.00353 — LLM Evaluation
Fri, 01 May 2026 00:00:00 GMT

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
http://arxiv.org/abs/2605.00798v1
Arunabh Srivastava et al.
— arxiv:2605.00798 — Code LLMFri, 01 May 2026 00:00:00 GMTCode LLMThemis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoringhttp://arxiv.org/abs/2605.00754v1http://arxiv.org/abs/2605.00754v1Indraneil Paul et al. — arxiv:2605.00754 — Code LLMFri, 01 May 2026 00:00:00 GMTCode LLMImproving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learninghttp://arxiv.org/abs/2605.00433v1http://arxiv.org/abs/2605.00433v1Shouyu Yin et al. — arxiv:2605.00433 — Code LLMFri, 01 May 2026 00:00:00 GMTCode LLMSocial Bias in LLM-Generated Code: Benchmark and Mitigationhttp://arxiv.org/abs/2605.00382v1http://arxiv.org/abs/2605.00382v1Fazle Rabbi et al. — arxiv:2605.00382 — Code LLMFri, 01 May 2026 00:00:00 GMTCode LLMML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Modelshttp://arxiv.org/abs/2605.00689v1http://arxiv.org/abs/2605.00689v1Yunhan Zhao et al. — arxiv:2605.00689 — Legal NLPFri, 01 May 2026 00:00:00 GMTLegal NLPTeaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelineshttp://arxiv.org/abs/2605.01077v1http://arxiv.org/abs/2605.01077v1Hugo Abonizio et al. — arxiv:2605.01077 — Medical NLPFri, 01 May 2026 00:00:00 GMTMedical NLPThemis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoringhttp://arxiv.org/abs/2605.00754v2http://arxiv.org/abs/2605.00754v2Indraneil Paul et al. — arxiv:2605.00754 — Multilingual NLPFri, 01 May 2026 00:00:00 GMTMultilingual NLPSC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Modelshttp://arxiv.org/abs/2605.00620v1http://arxiv.org/abs/2605.00620v1Shiqiang Cai et al. — arxiv:2605.00620 — Multilingual NLPFri, 01 May 2026 00:00:00 GMTMultilingual NLPMMAudio-LABEL: Audio Event Labeling via Audio Generation for Silent Videohttp://arxiv.org/abs/2605.00495v1http://arxiv.org/abs/2605.00495v1Kazuya Tateishi et al. 
— arxiv:2605.00495 — Text ClassificationFri, 01 May 2026 00:00:00 GMTText ClassificationA Sentence Relation-Based Approach to Sanitizing Malicious Instructionshttp://arxiv.org/abs/2605.01078v1http://arxiv.org/abs/2605.01078v1Soumil Datta et al. — arxiv:2605.01078 — Text ClassificationFri, 01 May 2026 00:00:00 GMTText ClassificationGenerating Statistical Charts with Validation-Driven LLM Workflowshttp://arxiv.org/abs/2605.00800v1http://arxiv.org/abs/2605.00800v1Pavlin G. Poličar et al. — arxiv:2605.00800 — Question AnsweringFri, 01 May 2026 00:00:00 GMTQuestion AnsweringHierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generationhttp://arxiv.org/abs/2605.00529v1http://arxiv.org/abs/2605.00529v1Ziwen Zhao et al. — arxiv:2605.00529 — Question AnsweringFri, 01 May 2026 00:00:00 GMTQuestion AnsweringThe Power of Order: Fooling LLMs with Adversarial Table Permutationshttp://arxiv.org/abs/2605.00445v1http://arxiv.org/abs/2605.00445v1Xinshuai Dong et al. — arxiv:2605.00445 — Question AnsweringFri, 01 May 2026 00:00:00 GMTQuestion AnsweringMemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agentshttp://arxiv.org/abs/2605.00356v1http://arxiv.org/abs/2605.00356v1Tianyu Hu et al. — arxiv:2605.00356 — Question AnsweringFri, 01 May 2026 00:00:00 GMTQuestion AnsweringDirected Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Mediahttp://arxiv.org/abs/2605.00776v1http://arxiv.org/abs/2605.00776v1Scott Friedman et al. — arxiv:2605.00776 — Sentiment AnalysisFri, 01 May 2026 00:00:00 GMTSentiment AnalysisARIS: Agentic and Relationship Intelligence System for Social Robotshttp://arxiv.org/abs/2605.00943v1http://arxiv.org/abs/2605.00943v1Stavya Datta et al. — arxiv:2605.00943 — Knowledge GraphFri, 01 May 2026 00:00:00 GMTKnowledge GraphNLPOpt-Net: A Learning Method for Nonlinear Optimization with Feasibility Guaranteeshttp://arxiv.org/abs/2605.00260v1http://arxiv.org/abs/2605.00260v1Bimol Nath Roy et al. 
— arxiv:2605.00260 — NLPThu, 30 Apr 2026 00:00:00 GMTNLPEstimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theoryhttp://arxiv.org/abs/2605.00238v1http://arxiv.org/abs/2605.00238v1Longwei Cong et al. — arxiv:2605.00238 — NLPThu, 30 Apr 2026 00:00:00 GMTNLPNorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpushttp://arxiv.org/abs/2605.00086v1http://arxiv.org/abs/2605.00086v1Enzo S. N. Silva et al. — arxiv:2605.00086 — NLPThu, 30 Apr 2026 00:00:00 GMTNLPRetrieval-Augmented Reasoning for Chartered Accountancyhttp://arxiv.org/abs/2605.00257v1http://arxiv.org/abs/2605.00257v1Jatin Gupta et al. — arxiv:2605.00257 — RAGThu, 30 Apr 2026 00:00:00 GMTRAGRetrieval-Augmented Reasoning for Chartered Accountancyhttp://arxiv.org/abs/2605.00257v1http://arxiv.org/abs/2605.00257v1Jatin Gupta et al. — arxiv:2605.00257 — ReasoningThu, 30 Apr 2026 00:00:00 GMTReasoningTUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimizationhttp://arxiv.org/abs/2605.00224v1http://arxiv.org/abs/2605.00224v1Abdulhady Abas Abdullah et al. — arxiv:2605.00224 — ReasoningThu, 30 Apr 2026 00:00:00 GMTReasoningAre Tools All We Need? Unveiling the Tool-Use Tax in LLM Agentshttp://arxiv.org/abs/2605.00136v1http://arxiv.org/abs/2605.00136v1Kaituo Zhang et al. — arxiv:2605.00136 — Tool UseThu, 30 Apr 2026 00:00:00 GMTTool UseTADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Datahttp://arxiv.org/abs/2605.00060v1http://arxiv.org/abs/2605.00060v1Rong Lu et al. — arxiv:2605.00060 — Tool UseThu, 30 Apr 2026 00:00:00 GMTTool UseTUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimizationhttp://arxiv.org/abs/2605.00224v1http://arxiv.org/abs/2605.00224v1Abdulhady Abas Abdullah et al. 
— arxiv:2605.00224 — Long ContextThu, 30 Apr 2026 00:00:00 GMTLong ContextNorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpushttp://arxiv.org/abs/2605.00086v1http://arxiv.org/abs/2605.00086v1Enzo S. N. Silva et al. — arxiv:2605.00086 — Long ContextThu, 30 Apr 2026 00:00:00 GMTLong ContextAttention Is Where You Attackhttp://arxiv.org/abs/2605.00236v1http://arxiv.org/abs/2605.00236v1Aviral Srivastava et al. — arxiv:2605.00236 — AlignmentThu, 30 Apr 2026 00:00:00 GMTAlignmentTUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimizationhttp://arxiv.org/abs/2605.00224v1http://arxiv.org/abs/2605.00224v1Abdulhady Abas Abdullah et al. — arxiv:2605.00224 — AlignmentThu, 30 Apr 2026 00:00:00 GMTAlignmentWasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedbackhttp://arxiv.org/abs/2605.00155v1http://arxiv.org/abs/2605.00155v1Yikai Wang et al. — arxiv:2605.00155 — AlignmentThu, 30 Apr 2026 00:00:00 GMTAlignmentHow Language Models Process Out-of-Distribution Inputs: A Two-Pathway Frameworkhttp://arxiv.org/abs/2605.00269v1http://arxiv.org/abs/2605.00269v1Hamidreza Saghir et al. — arxiv:2605.00269 — LLM SafetyThu, 30 Apr 2026 00:00:00 GMTLLM SafetyJailbroken Frontier Models Retain Their Capabilitieshttp://arxiv.org/abs/2605.00267v1http://arxiv.org/abs/2605.00267v1Daniel Zhu et al. — arxiv:2605.00267 — LLM SafetyThu, 30 Apr 2026 00:00:00 GMTLLM SafetyAttention Is Where You Attackhttp://arxiv.org/abs/2605.00236v1http://arxiv.org/abs/2605.00236v1Aviral Srivastava et al. — arxiv:2605.00236 — LLM SafetyThu, 30 Apr 2026 00:00:00 GMTLLM SafetyMinimal, Local, Causal Explanations for Jailbreak Success in Large Language Modelshttp://arxiv.org/abs/2605.00123v1http://arxiv.org/abs/2605.00123v1Shubham Kumar et al. 
— arxiv:2605.00123 — LLM SafetyThu, 30 Apr 2026 00:00:00 GMTLLM SafetyARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contextshttp://arxiv.org/abs/2605.00245v1http://arxiv.org/abs/2605.00245v1Sydney Johns et al. — arxiv:2605.00245 — LLM EvaluationThu, 30 Apr 2026 00:00:00 GMTLLM EvaluationTUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimizationhttp://arxiv.org/abs/2605.00224v1http://arxiv.org/abs/2605.00224v1Abdulhady Abas Abdullah et al. — arxiv:2605.00224 — LLM EvaluationThu, 30 Apr 2026 00:00:00 GMTLLM EvaluationHow Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responseshttp://arxiv.org/abs/2605.00113v1http://arxiv.org/abs/2605.00113v1Ishan Gupta et al. — arxiv:2605.00113 — LLM EvaluationThu, 30 Apr 2026 00:00:00 GMTLLM EvaluationCRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shifthttp://arxiv.org/abs/2605.00074v1http://arxiv.org/abs/2605.00074v1Najmul Hasan et al. — arxiv:2605.00074 — LLM EvaluationThu, 30 Apr 2026 00:00:00 GMTLLM EvaluationViLegalNLI: Natural Language Inference for Vietnamese Legal Textshttp://arxiv.org/abs/2605.00116v1http://arxiv.org/abs/2605.00116v1Nhung Thi-Hong Duong et al. — arxiv:2605.00116 — Legal NLPThu, 30 Apr 2026 00:00:00 GMTLegal NLPSequential Measurements as a Resource for Quantum Metrologyhttp://arxiv.org/abs/2605.00287v1http://arxiv.org/abs/2605.00287v1Koray Mentesoglu et al. — arxiv:2605.00287 — Information ExtractionThu, 30 Apr 2026 00:00:00 GMTInformation ExtractionViLegalNLI: Natural Language Inference for Vietnamese Legal Textshttp://arxiv.org/abs/2605.00116v1http://arxiv.org/abs/2605.00116v1Nhung Thi-Hong Duong et al. 
— arxiv:2605.00116 — Text ClassificationThu, 30 Apr 2026 00:00:00 GMTText ClassificationTUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimizationhttp://arxiv.org/abs/2605.00224v1http://arxiv.org/abs/2605.00224v1Abdulhady Abas Abdullah et al. — arxiv:2605.00224 — Question AnsweringThu, 30 Apr 2026 00:00:00 GMTQuestion Answering