Meta-learning for wrestling

OpenAI demonstrates that a meta-learning agent can quickly learn to defeat a stronger non-meta-learning opponent in simulated robot wrestling and adapt to physical malfunctions.

Reinforcement learning Robotics Papers

OpenAI Blog·Sep 13

Learning with opponent-learning awareness

OpenAI introduces a reinforcement learning method where agents model opponent learning to improve strategy. Tested in multi-agent environments, this approach enables models to adapt behavior by anticipating adversary changes.

OpenAI Reinforcement learning Multi-agent

OpenAI Blog·Jul 1

Teacher–student curriculum learning

OpenAI introduces a teacher-student curriculum learning approach where a teacher model generates progressively harder tasks to train a student model. The method improves learning efficiency by adapting training example difficulty to the student model's skill level.

OpenAI Reinforcement learning

OpenAI Blog·Jun 8

Learning to cooperate, compete, and communicate

OpenAI explores multi-agent environments in which agents compete for resources, framing them as stepping stones toward AGI. These environments provide a natural curriculum (difficulty matched to competitor skill) and have no stable equilibrium, creating constant pressure for improvement.

Multi-agent AI Agents Reinforcement learning

OpenAI Blog·Apr 10

Stochastic Neural Networks for hierarchical reinforcement learning

OpenAI publishes research on stochastic neural networks for hierarchical reinforcement learning. The method improves agents' ability to decompose complex tasks into sub-objectives.

OpenAI Reinforcement learning Papers

OpenAI Blog·Apr 1

Spam detection in the physical world

OpenAI has developed a spam-detection AI system trained entirely in simulation and deployed on a physical robot. It is presented as the first application of its kind capable of operating in the real world.

OpenAI Robotics Reinforcement learning

OpenAI Blog·Mar 21

One-shot imitation learning

OpenAI introduces one-shot imitation learning, enabling models to learn from a single demonstration without additional training. The method applies to robotics and control tasks.

OpenAI Reinforcement learning Robotics

OpenAI Blog·Mar 16

Learning to communicate

OpenAI publishes research on agents developing their own language. Agents learn to communicate with each other through an emergent protocol without explicit human supervision.

OpenAI AI Agents Multi-agent

OpenAI Blog·Mar 12

Prediction and control with temporal segment models

OpenAI introduces temporal segment models (TSM), models capable of predicting and controlling complex temporal sequences. These models segment data into temporal intervals to improve prediction and control in dynamic environments.

OpenAI Reasoning Benchmarks

OpenAI Blog·Feb 24

Attacking machine learning with adversarial examples

OpenAI explores adversarial examples, inputs intentionally designed to fool ML models. The post demonstrates how they work across different mediums and discusses challenges in securing systems against such attacks.

AI safety Alignment OpenAI

OpenAI Blog·Feb 8

Adversarial attacks on neural network policies

OpenAI publishes research on adversarial attacks against neural network policies. The study examines how AI models can be manipulated by malicious inputs and proposes defense methods.

OpenAI AI safety Papers

OpenAI Blog·Dec 21

Faulty reward functions in the wild

OpenAI analyzes failures of reward functions in reinforcement learning. The article explores how misspecifying the reward function can cause unexpected and counterintuitive behaviors in RL algorithms.

Reinforcement learning Alignment AI safety

OpenAI Blog·Nov 15

OpenAI and Microsoft

OpenAI and Microsoft expand their partnership: OpenAI will now run most of its large-scale experiments on Microsoft's Azure infrastructure.

OpenAI Infrastructure

OpenAI Blog·Oct 11

Transfer from simulation to real world through learning deep inverse dynamics model

OpenAI develops a deep inverse dynamics model learning approach to transfer simulation-trained policies to real-world robots. The method reduces real-world data requirements by learning to predict actions from observations, improving generalization of simulation-trained policies.

Robotics Reinforcement learning Papers

OpenAI Blog·May 25

Adversarial training methods for semi-supervised text classification

OpenAI presents adversarial training methods for semi-supervised text classification. The approach combines labeled and unlabeled data to improve model robustness against adversarial perturbations.

OpenAI AI safety Papers

arXiv cs.AI·2d ago

Severity-Aware Curriculum Learning with Multi-Model Response Selection for Medical Text Generation

Multi-model framework with severity-aware curriculum learning for medical text generation. Three-stage progressive training (mild → moderate → critical cases) across 5 LLMs, relevance-based response selection at inference. MAQA dataset evaluation: 86.71% baseline, 90.30% after fine-tuning (BERTScore).

Fine-tuning Reinforcement learning Benchmarks

Reddit r/LocalLLaMA·2d ago

A lightweight agent embedded in your terminal

agent-sh is a shell with an embedded lightweight AI agent, accessible via the > key. It provides contextual awareness for quick terminal problems (rsync flags, diagnostics) without overhead. A new command-suggest extension helps generate commands. Installs via npm and works with local models.

AI Agents Code generation Tools

arXiv cs.CL·3d ago

MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text Summarization

Multi-model adaptive framework for abstractive text summarization. Integrates multiple fine-tuned transformers on CNN/DailyMail, selects best summary via automatic metrics (BERTScore 88.63%). Outperforms GPT3-D2, Falcon-7b, Mpt-7b.

Benchmarks Fine-tuning Evals

arXiv cs.CL·3d ago

Generic Triple-Latent Compression with Gated Associative Retrieval

Study of triple-latent sequence models maintaining running token state and compressed pair-memory pathway to capture higher-order token interactions. Improvements on byte-level WikiText-2 and MiniMind benchmark, with gated associative retrieval extension improving recall but remaining seed-sensitive and slow.

Papers Benchmarks Reasoning

Reddit r/LocalLLaMA·3d ago

Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 %

Custom quantization experiment on Qwen 3.6 27B: BF16→Q8_0 conversion targeting high-variance layers. The Q8-CC model (30.47 GiB) achieves a same-top-p score of 98.358% versus 97.426% for UD Q8_K_XL (33.31 GiB) on wiki.test.raw. Mean KLD: 0.011324 vs 0.012100. Results are preliminary, without real-world performance benchmarks.

Qwen Open source Benchmarks

Reddit r/MachineLearning·3d ago

Faithful uncertainty in LLM agents: calibration vs utility tradeoff in practice [D]

Researcher tests uncertainty calibration in LLM agents using planning + verification pipeline. Verification catches 60% of hallucinated tool calls before execution, but reduces easy correct answers by half. Solution: flag low-confidence tasks for human review, auto-execute high-confidence ones.

AI Agents Reasoning AI safety

arXiv cs.LG·5d ago

Multi-Modal Machine Learning for Breast Cancer Recurrence Prediction

arXiv study on breast cancer recurrence prediction using multi-modal machine learning. Integrates treatment records, pathology reports, and clinician notes. Uses regex-based extraction and conflict reconciliation to recover tumor characteristics from free-text narratives. Shows multi-modal integration consistently improves predictive accuracy over single-modal methods.

Benchmarks Vision

arXiv cs.LG·5d ago

CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning

CL-DMDF introduces a dynamic multimodal data fusion model that uses contrastive learning to handle missing or uncertain modalities. It features a dual-dimension attention mechanism (over features and modalities) and an entity-centroid contrastive learning module for enhanced discrimination. Validated across three datasets.

Embeddings Papers

Reddit r/LocalLLaMA·May 31

Experiment: MTP models just as t/s efficient as non-MTP models?

Benchmark on a 9070XT GPU: Qwen 35B A3B with MTP achieves 43.74 T/s vs 38.07 T/s in standard mode, a ~15% throughput gain despite multi-token prediction overhead. Test conditions were identical (prompt, 8192 context, Q4_K_XL quantization).

Qwen Benchmarks Code generation

arXiv cs.AI·May 28

Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems

OADA is a governance framework for high-stakes AI systems that translates fairness metric instability, threshold sensitivity, and operational uncertainty into deployment-oriented assurance decisions. Tested on facial recognition and healthcare, it introduces Deployment Assurance Scores, escalation states, and Threshold Stability Zones to actively govern deployment readiness rather than rely on post-hoc auditing.

AI safety Alignment Evals

arXiv cs.LG·May 27

Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention

Two complementary mechanisms improve transformer attention: Energy-Gated Attention (EGA) selects informative tokens via linear projection; Morlet Positional Encoding (MoPE) replaces sinusoidal encodings with learned Gaussian wavelets. On TinyShakespeare, their combination achieves +0.119 validation loss improvement, exceeding the sum of individual parts.

Papers Reasoning

arXiv cs.CL·May 26

Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers

Novel sparse attention approach that uses grammatical roles (POS tags) to reduce the quadratic complexity of Transformers. Two masking strategies were tested on SST-2 with DistilBERT: the hard mask (0.8200) matches and the soft mask (0.8165) nearly matches full-attention performance (0.8200) while reducing computational overhead.

Reasoning Evals Papers

arXiv cs.LG·May 22

Temporal Contrastive Transformer for Financial Crime Detection: Self-Supervised Sequence Embeddings via Predictive Contrastive Coding

Temporal Contrastive Transformer (TCT) is a self-supervised representation learning framework for financial fraud detection via transaction sequence embeddings. AUC is 0.8644 standalone and 0.9245 when combined with engineered features. The embeddings capture temporal structure but yield no additive gain over the baseline.

Papers Embeddings Reinforcement learning

arXiv cs.LG·May 22

TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes

TBP-mHC proposes Birkhoff polytope parameterizations for manifold-constrained Hyper-Connections. The method constructs exactly doubly stochastic mixing matrices with (n-1)² degrees of freedom, avoiding iterative normalization and combinatorial explosion. Competitive results on language model pre-training with improved stability and scalability.

Papers Reasoning

Reddit r/MachineLearning·May 21

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

RPS is a two-stage post-training method inspired by neuroplasticity: easy data at a high learning rate, then hard data with the learning rate reduced by 90%. On Qwen3-8b, RPS achieves 4% on ARC-AGI 1 and 1145/1200 error-free program executions, versus 2.4% and 870/1200 for EPS (equal rate).

Qwen Fine-tuning Code generation

arXiv cs.CL·May 21

Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues

Novel Pseudo-Siamese architecture (FF-BPSN) for planning dialogue paths toward predefined targets. Uses two bidirectional transformer decoders with forward-focused module. Tested on DuRecDial and DuRecDial 2.0, significantly improves target-oriented proactive dialogue systems.

AI Agents Reasoning Benchmarks

arXiv cs.CL·May 21

Leveraging Large Language Models for Sentiment Analysis: Multi-Modal Analysis of Decentraland's MANA Token

Study using BERT to analyze Decentraland Discord community sentiment and forecast MANA token price. Multi-modal LSTM model integrating sentiment, trading volume, and market cap significantly outperforms price-only baseline. Community sentiment predominantly neutral with positive skew.

Papers Benchmarks

arXiv cs.LG·May 20

Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

AMSGA extends the Forward-Forward algorithm with multi-scale goodness aggregation, adaptive curriculum, and layer-dependent thresholds. Tests on MNIST and Fashion-MNIST show +1.45% and +1.50% improvement without significant computational overhead.

Papers Benchmarks Reasoning

arXiv cs.AI·May 19

A Machine Learning Framework for EEG-Based Prediction of Treatment Efficacy in Chronic Neck Pain

ML framework using EEG to predict treatment efficacy in chronic neck pain patients. Rigorous preprocessing pipeline (baseline removal, ICA, spectral analysis) applied to resting-state and motor EEG. Systematic review of 763 studies (16 patient, 47 healthy-control studies) to inform post-processing strategy.

Evals Papers

arXiv cs.AI·May 19

Nested Spatio-Temporal Time Series Forecasting

Nested spatio-temporal forecasting framework coupling macro-level regional trends with micro-level historical observations. Uses spectral clustering to construct semantically coherent regions, filtering systematic noise while preserving trends. Progressive coarse-to-fine predictor integrates features to anticipate dynamic anomalies. Outperforms state-of-the-art baselines on high-dimensional datasets.

Benchmarks Papers

arXiv cs.AI·May 19

SAS: Semantic-aware Sampling for Generative Dataset Distillation

SAS introduces semantic-aware dataset distillation leveraging CLIP as a semantic prior to improve compressed dataset quality. Three scoring functions evaluate class relevance, inter-class separability, and intra-set diversity. A two-stage strategy filters discriminative samples then dynamically selects to reduce redundancy while preserving semantic coverage.

Embeddings Vision Benchmarks

arXiv cs.AI·May 19

UNR-Explainer: Counterfactual Explanations for Unsupervised Node Representation Learning Models

UNR-Explainer generates counterfactual explanations for unsupervised node representation learning models (GNNs). The method identifies critical subgraphs that alter k-nearest neighbors of a node in embedding space using Monte Carlo Tree Search (MCTS). Evaluated on GraphSAGE and DGI.

Papers Reasoning Evals

arXiv cs.AI·May 19

Towards Robust Argumentative Essay Understanding via TIDE: An Interactive Framework with Trial and Debate

TIDE is a prompt optimization framework using a Trial and Debate mechanism to improve argumentative essay understanding. Evaluated on three tasks (Automated Essay Scoring, Argument Component Detection, Argument Relation Identification), it mitigates noisy training data impact and enhances optimization stability.

Prompt engineering Reasoning Evals

arXiv cs.AI·May 19

GPU-Accelerated Deep Learning for Heatwave Prediction and Urban Heat Risk Assessment

GPU-accelerated deep learning framework for next-day urban thermal prediction and heatwave risk assessment. ConvLSTM with mixed loss function achieves MAE=0.2293, RMSE=0.3089, R²=0.8877 using MODIS and Open-Meteo data in Sarajevo. Generates city heat risk maps.

Benchmarks Vision Infrastructure

arXiv cs.AI·May 19

Learning Displacement-Aware WiFi Representations for Weakly Supervised Relative Localization

Relative WiFi localization without dense coordinate annotations. Intersection Pathway aligns WiFi fingerprint traces and inertial motion vectors in a shared additive latent space, enabling direct relative-displacement inference. Validated on synthesized dataset from real measurements.

Reinforcement learning Embeddings
