arXiv cs.CL

AI Slop or AI-enhancement? Student perceptions of AI-generated media for an English for Academic Purposes course

An implementation study of Google NotebookLM used to generate videos, podcasts, and infographics for an English for Academic Purposes course (106 students, Hong Kong). Students rated perceived usefulness and ease of use highly and preferred visual/multimodal content. Video preference correlated positively with academic performance, while higher cognitive load was negatively associated with grades.

RAG · Tools · Evals
SIG 72 · HYP 25

arXiv cs.AI

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

PluRule is a multimodal, multilingual benchmark for moderating pluralistic communities on social media. It covers 13,371 rule violations across 1,989 Reddit communities and 2,885 rules in 9 languages. State-of-the-art vision-language models, including GPT-4.5 with advanced reasoning, only marginally outperform a trivial baseline, revealing that pluralistic moderation remains a fundamental challenge.

Benchmarks · Vision · AI safety
SIG 72 · HYP 25

arXiv cs.AI

HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

HAAS is a framework for adaptive task allocation between humans and AI systems in software engineering and manufacturing. It combines rule-based governance constraints with contextual-bandit learning. Results show governance is not binary but a tunable design variable: moderate governance improves operational performance and reduces fatigue in manufacturing while remaining competitive as the learner gains experience.

AI Agents · Multi-agent · Reinforcement learning
SIG 72 · HYP 18

arXiv cs.AI

Visual Timelines of Police Encounters in Body-Worn Camera Footage: Operational Context and Activity Cataloging for Training and Analysis in OpenBWC

An approach that splits body-worn camera (BWC) video into 10-second windows labeled by operational context and motion intensity. Models using CLIP and optical flow reach 78.75% accuracy for context and 88.33% for activity. The protocol is privacy-conscious and is intended to speed up incident review and officer training workflows.

Vision · Benchmarks · AI safety
SIG 72 · HYP 15

arXiv cs.AI

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

Behavior Foundation Models (BFMs) enable scalable imitation learning but fail under dynamics shifts (friction, actuation, noise). This paper formulates BFM task inference as a robust minimax optimization, enabling adaptation to worst-case dynamics perturbations without retraining. The framework outperforms standard BFM and robust offline imitation learning (IL) baselines under dynamics shifts.

Reinforcement learning · Papers · Evals
SIG 72 · HYP 18

arXiv cs.AI

Agents for Experiments, Experiments for Agents: A Design Grammar for AI-Enabled Experimental Science

SEED is a framework that represents experimental conditions as typed actor-flow graphs for studying multi-agent systems and human-AI workflows. It supports describing conditions, evaluating structural novelty, and generating candidate designs under constraints. An empirical test on a medical-triage task shows that SEED-guided designs state interaction changes, assumptions, and governance checks more clearly.

AI Agents · Multi-agent · Evals
SIG 72 · HYP 18

arXiv cs.AI

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

SAFE-SVD is a compression method for physics foundation models (PFMs) that preserves physical fidelity. It models layer sensitivity in the output function space, avoiding the severe performance degradation that conventional methods cause. Experiments show substantially higher compression ratios with accuracy maintained across multiple models and datasets.

Papers · Benchmarks · Infrastructure
SIG 72 · HYP 28