Papers

Trending research and the full catalog - each paper linked to the benchmarks, methods, and models it introduces.

$λ$-PSD: Scalable Approximate SNR-Optimised Polynomial Stein Discrepancies

25 Jun 2026

Polynomial Stein discrepancies (PSD) provide a scalable alternative to kernel Stein methods for measuring sample quality and goodness-of-fit testing, but their statistical properties remain poorly understood.

Adaptive Evaluation of Out-of-Band Defenses Against Prompt Injection in LLM Agents

25 Jun 2026

Recent work (2024 to 2026) has converged on a strategy for defending tool-using LLM agents against indirect prompt injection: rather than training the model to refuse malicious instructions, enforce security outside the model with a deterministic policy that mediates the agent's…

Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models

25 Jun 2026

Adversarial evaluation of AI systems has matured along four largely disconnected tracks: diffusion-based attacks on text and large language models (LLMs), diffusion-based attacks on image classifiers, jailbreak pipelines against vision-language models, and diffusion-based input…

AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems

25 Jun 2026

Recommendation algorithm iteration is moving from an artisanal, engineer-bound process toward an industrialized research loop, but this transition remains blocked by a structural execution bottleneck: the idea-to-launch cycle still depends on human engineers to generate…

AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing

25 Jun 2026

Traditional dynamic pricing models in large-scale e-commerce suffer from limited interpretability, poor utilization of unstructured information, and misalignment with long-term business objectives such as cumulative Gross Merchandise Value (GMV), Return on Investment (ROI) and…

All you need is log

25 Jun 2026

Comparing two probability distributions is a basic building block of statistics and machine learning, and the right family is well understood: the Rényi divergences of order $α\in[0,\infty]$ are the unique family monotone under data processing and additive on independent…

Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement

25 Jun 2026

Evaluating LLM outputs remains a major bottleneck in NLP: human evaluation is expensive and slow, lexical metrics correlate poorly with human judgments on open-ended generation, and holistic LLM judges often produce opaque scores that are hard to debug.

Assessing Post-Reform Changes in Risk Disclosure Quality with a Multidimensional Text Analysis Approach

25 Jun 2026

While corporate narrative disclosures provide crucial information to capital markets, comprehensively evaluating their qualitative changes over time remains challenging.

Asymptotically Optimal Learning for Parametric Prophet Inequalities

25 Jun 2026

We study learning in prophet inequalities with i.i.d. rewards drawn from an exponential-type parametric family with an unknown parameter $θ$, a class that includes exponential, Pareto, and bounded-support power-family distributions.

Auditing Framing-Sensitive Behavioral Instability in Large Language Models for Mental Health Interactions

25 Jun 2026

Large language models (LLMs) are increasingly being integrated into mental health support tools and other psychologically sensitive conversational applications. In such settings, behavioral stability and consistency are important for trustworthy human-AI interaction.

Beyond Global Divergences: A Local-Mass Perspective on Bayesian Inference

25 Jun 2026

Global objectives, such as KL divergence and ELBO, are widely used in Bayesian inference for measuring distributional discrepancy. This paper studies local-mass behaviour that is not directly captured by such objectives.

$\text{DT}^2$: Decision-Targeted Digital Twins

24 Jun 2026

A digital twin (DT) is a virtual model of a real-world system that can assist decision-making by simulating scenarios induced by different policies. However, typical machine learning-based DTs do not optimise for this use case.

A 3D-Printable Dataset for Fair Testing and Comparisons of Tactile Sensors

24 Jun 2026

Existing texture datasets for tactile sensing primarily consist of sensor readings from a specific sensor interacting with available surfaces/objects rather than describing the textures themselves, limiting fair comparison between tactile sensors and hindering reproducible…

A cross-process welding penetration status prediction algorithm based on unsupervised domain adaptation in laser and TIG welding

24 Jun 2026

Supervised deep learning has been widely used for weld penetration state classification; however, its performance often degrades significantly under domain shift, such as when transferring models between welding processes with distinct physical mechanisms: for instance, from…

A functional central limit theorem for kernel gradient flow and infinitesimal gradient boosting

24 Jun 2026

Building on the large-sample analysis of infinitesimal gradient boosting (Dombry and Duchamps, 2024b), we study the fluctuations of the process around its deterministic limit and establish a functional central limit theorem: the rescaled deviations converge in distribution to a…

A probabilistic framework for online test-time adaptation

24 Jun 2026

This paper presents a probabilistic framework for online test-time adaptation problems. In such problems, a model is trained on labeled data but must adapt to unlabeled data at test time under the assumption that training and test distributions potentially differ, that is, there might…

A Red Teaming Framework for Large Language Models: A Case Study on Faithfulness Evaluation

24 Jun 2026

Large language models (LLMs) have demonstrated remarkable performance across natural language processing tasks, yet their deployment in high-stakes applications raises critical concerns regarding reliability, safety, and trustworthiness.

A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

24 Jun 2026

Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for multilingual LLMs.

A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks

24 Jun 2026

Full penetration in laser welding is critically important, as it is one of the fundamental factors in achieving defect-free welded joints. Accurate prediction of the penetration state is therefore essential for ensuring weld quality.

Adaptive Oscillatory Inductive Bias for Modeling Sharp Prosodic Dynamics in Diffusion-Based TTS

24 Jun 2026

Diffusion-based text-to-speech (TTS) models have achieved significant improvements in speech quality. However, modeling sharp prosodic transitions and rapid pitch variations in expressive speech remains challenging.

Agentic evolution of physically constrained foundation models

24 Jun 2026

Artificial intelligence increasingly drives automated scientific discovery, yet contemporary generalist agents lack physical grounding, frequently hallucinating hardware-incompatible designs.

Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games

24 Jun 2026

Assessing financial literacy during gameplay without disrupting the learning experience remains a key challenge in serious games for education. We present the Agentic BKT pipeline, a multi-agent large language model architecture for stealth assessment of financial competencies…

Agentic System as Compressor: Quantifying System Intelligence in Bits

24 Jun 2026

Large language models are turning from isolated predictors into agentic systems: they call tools, retrieve evidence, obey environment constraints, use verifiers, and complete tasks through search and multi-turn interaction.

AI Coaching for Accelerating Human Skill Development with Reinforcement Learning

24 Jun 2026

AI copilots can substantially boost human performance through shared control, but excessive assistance can induce over-reliance and skill atrophy. This paper studies how an embodied AI agent can act as a coach that accelerates human motor-skill development.