arXiv cs.AI·

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in MLLMs. The framework transfers the model's privileged perception on evidence-centered crops to its full-image policy by minimizing the KL divergence between their token distributions. It reports competitive results on fine-grained visual understanding benchmarks without relying on external models or ground-truth labels.

Vision, Reinforcement learning, Benchmarks
SIG 72, HYP 18
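
The KL-based transfer described in the Vision-OPD summary can be illustrated with a minimal sketch. It assumes the crop-conditioned pass acts as the teacher and the full-image pass as the student, and that the loss is the mean per-token KL(teacher || student); the direction of the divergence, the function names, and the list-of-logits input format are assumptions for illustration, not details taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student).

    teacher_logits: per-token logits from the (hypothetical) crop-conditioned pass.
    student_logits: per-token logits from the (hypothetical) full-image pass.
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Two tokens over a three-word vocabulary (made-up numbers).
teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
student = [[1.0, 1.0, 0.1], [0.2, 1.4, 0.4]]
loss = distillation_loss(teacher, student)
```

The loss is zero when both passes produce identical token distributions and grows as the full-image distribution drifts from the crop-conditioned one.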
arXiv cs.CL·

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

A multilingual OCR-aware fine-tuning framework for MLLMs that combines synthetic OCR-to-translation data generation, LoRA-based SFT, and structured visual chain-of-thought reasoning. The authors report significantly improved extraction of small, blurred, or occluded text on receipts, menus, and documents under degraded visual conditions, and claim it outperforms GPT-5 and Gemini on OCR grounding and hallucination reduction.

Vision, Reasoning, Fine-tuning
SIG 72, HYP 28
arXiv cs.AI·

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

VISAFF is a framework for Emotion Recognition in Conversation (ERC) using vision-language models. It combines two stages: speaker-centered affective grounding and reliability-guided affective complementation. The tuning-free approach leverages frozen VLMs' reasoning capabilities, integrating visual, textual, and acoustic signals to improve accuracy without expensive fine-tuning.

Vision, Multi-agent, Papers
SIG 72, HYP 25
arXiv cs.AI·

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

DiagEval is a trajectory-conditioned diagnostic evaluation protocol for GUI agents that test LLM-generated interactive software. It reuses failed trajectories to determine whether a failure stems from the evaluator or from the software itself. On WebDevJudge-Unit and RealDevBench, DiagEval recovers 45.6-62.1% of false negatives and improves accuracy from 69.9% to 78.3% and from 65.0% to 81.6%, respectively.

AI Agents, Evals, Code generation
SIG 72, HYP 18
arXiv cs.AI·

Progressive Generalization Augmentation with Deeply Coupled RND-PPO and Domain-Prioritized Noise Injection for Robust Crop Management Reinforcement Learning

Introduces Progressive Generalization Augmentation (PGA) to improve the robustness of agricultural RL systems, combining a coupled RND-PPO architecture with hierarchical noise injection. Reported results: +8.43% yield and +16.42% nitrogen use efficiency versus BERT-DQN in Florida, with 94.4% performance retention under combined perturbations.

Reinforcement learning, Papers, Benchmarks
SIG 72, HYP 28
arXiv cs.AI·

FLAG: Foundation model representation with Latent diffusion Alignment via Graph for spatial gene expression prediction

FLAG is a latent diffusion framework for predicting spatial gene expression from H&E images. It integrates a spatial graph encoder and Gene Foundation Model alignment to address the Gene Dimension Curse and preserve biological relationships (gene coordination, spatial distribution). It also introduces two structural evaluation metrics, GSC and SSC.

Papers, Vision, Reasoning
SIG 72, HYP 18
arXiv cs.AI·

Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

Proposes HRC (Hybrid Reward-Cyclic), a reward model that explicitly decomposes human preferences into transitive (scalar) and cyclic (vector) components via game theory, and introduces DSPPO (Dynamic Self-Play Preference Optimization) for alignment. Reported results: +1.23% on RewardBench 2 versus GPM and a 44.75% win rate on AlpacaEval 2.0 with Gemma-2B-it.

Reinforcement learning, Alignment, Papers
SIG 72, HYP 25
arXiv cs.AI·

Learning Higher-Order Structure from Incomplete Spatiotemporal Data: Multi-Scale Hypergraph Laplacians with Neural Refinement

Multi-Scale Hypergraph Laplacians (MSHL) is a two-stage framework for imputing incomplete spatiotemporal sensor-network data. It first discovers higher-order structure via multi-scale hypergraphs, then refines the imputation with a hypergraph-conditioned residual network. The paper provides theoretical guarantees and an evaluation on real traffic networks with structured outages.

Papers, Benchmarks, Infrastructure
SIG 72, HYP 15
arXiv cs.AI·

MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation

MetaCogAgent is a multi-agent LLM framework in which each agent evaluates task-capability alignment via a Metacognitive Self-Assessment Unit before execution. The system combines verbalized uncertainty with historical capability profiles to route each task to the best-suited agent. On the MetaCog-Eval benchmark (700 tasks), it achieves 82.4% accuracy (+8.7% vs baselines) with 5% fewer API calls than AutoGen.

Multi-agent, AI Agents, Reasoning
SIG 72, HYP 28
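
The routing idea in the MetaCogAgent summary (combine an agent's verbalized uncertainty with its historical capability profile, then pick the best-suited agent) might look roughly like the sketch below. The linear blend, the `alpha` weight, and all class and field names are hypothetical; the summary does not specify how the two signals are combined.

```python
from dataclasses import dataclass


@dataclass
class AgentAssessment:
    """Hypothetical self-assessment record for one agent on one task."""
    name: str
    confidence: float         # verbalized self-confidence, in [0, 1]
    past_success_rate: float  # historical success on similar tasks, in [0, 1]


def suitability(assessment, alpha=0.5):
    """Blend self-reported confidence with track record (assumed linear mix)."""
    return alpha * assessment.confidence + (1 - alpha) * assessment.past_success_rate


def route(assessments, alpha=0.5):
    """Return the name of the agent with the highest suitability score."""
    return max(assessments, key=lambda a: suitability(a, alpha)).name


# Made-up numbers: one agent is overconfident, the other has the better record.
candidates = [
    AgentAssessment("coder", confidence=0.9, past_success_rate=0.4),
    AgentAssessment("analyst", confidence=0.7, past_success_rate=0.8),
]
chosen = route(candidates)
```

With equal weighting, the agent with the stronger track record wins despite lower stated confidence; setting `alpha=1.0` would route on verbalized confidence alone.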
arXiv cs.CL·

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Investigates extrinsic gender bias in Bangla pretrained language models. The authors construct four manually annotated, task-specific benchmark datasets (sentiment analysis, toxicity detection, hate speech detection, and sarcasm detection) with minimal-pair gender perturbations, and propose RandSymKL, a debiasing strategy that combines symmetric KL divergence with cross-entropy loss. The implementation and datasets are publicly released.

Benchmarks, AI safety, Alignment
SIG 72, HYP 15
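
A loss that combines symmetric KL divergence with cross-entropy, as the Bangla gender-bias summary describes, could be sketched as follows. This assumes the symmetric KL term compares the classifier's predictions on an original sentence and on its gender-perturbed minimal pair, and that the two terms are summed with a hypothetical `weight`; the randomization implied by the "Rand" in RandSymKL is not modeled here because the summary does not describe it.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete class distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); unchanged if p and q are swapped."""
    return 0.5 * (kl(p, q) + kl(q, p))


def cross_entropy(p, label, eps=1e-12):
    """Negative log-probability assigned to the gold class index."""
    return -math.log(p[label] + eps)


def debias_loss(p_original, p_perturbed, label, weight=1.0):
    """Task loss on the original input plus a consistency penalty (assumed form).

    p_original:  class probabilities for the original sentence.
    p_perturbed: class probabilities for the gender-swapped minimal pair.
    weight:      hypothetical trade-off between accuracy and consistency.
    """
    return cross_entropy(p_original, label) + weight * symmetric_kl(p_original, p_perturbed)


# Made-up binary sentiment predictions for a minimal pair.
p_orig = [0.8, 0.2]
p_swap = [0.6, 0.4]
loss = debias_loss(p_orig, p_swap, label=0)
```

The penalty vanishes when the model predicts the same distribution for both members of the pair, so only prediction shifts caused by the gender perturbation are punished.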
arXiv cs.AI·

Multi-Paradigm Agent Interaction in Practice: A Systematic Analysis of Generator-Evaluator, ReAct Loop, and Adversarial Evaluation in the buddyMe Framework

buddyMe is an open-source multi-model framework that integrates three agent interaction paradigms: multi-agent orchestration (Generator-Evaluator), ReAct loops, and memory-augmented interaction. Its five-stage pipeline was tested on 4 real cases (museum guides, weather, tour planning). Reported results: 20% requirement omission detection, 30% redundant tool invocations, and adversarial consensus within 2-3 rounds in 70% of scenarios.

AI Agents, Multi-agent, Reasoning
SIG 72, HYP 28
arXiv cs.AI·

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models

Alice is an online executable world-model learning system that discovers environment dynamics without rule descriptions or reward signals. The agent induces transition laws from interaction alone, treating preservation conflicts as a structural signal for refining its hypothesis classes. Evaluation on Baba in Wonderland shows substantial improvement under prior misalignment.

Reasoning, Reinforcement learning, Papers
SIG 72, HYP 15