A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives

Published: arXiv:2509.10393v1
Authors

Clémentine Chazal, Heishiro Kanagawa, Zheyang Shen, Anna Korba, Chris J. Oates

Abstract

Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel gradient discrepancy' (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point, several novel algorithms are proposed, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and prediction-centric uncertainty quantification presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
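For orientation, the 'variational gradient' reading of KSD mentioned in the abstract can be recalled in standard notation; the display below is a generic sketch, not taken from the paper. Perturbing q along a vector field f from the unit ball of a (vector-valued) reproducing kernel Hilbert space changes KL(q || p) at a rate given by the Langevin Stein operator, and KSD is the largest such rate.

```latex
% Generic identities (illustrative notation, not the paper's): perturb q by the
% map T_\varepsilon(x) = x + \varepsilon f(x) and differentiate KL(. || p).
\[
\frac{\mathrm{d}}{\mathrm{d}\varepsilon}\,
  \mathrm{KL}\!\left( (T_\varepsilon)_{\#} q \,\middle\|\, p \right)\Big|_{\varepsilon = 0}
  \;=\; -\,\mathbb{E}_{x \sim q}\!\left[ \mathcal{A}_p f(x) \right],
\qquad
\mathcal{A}_p f(x) \;=\; \nabla \log p(x)^{\top} f(x) + \nabla \cdot f(x),
\]
\[
\mathrm{KSD}(q \,\|\, p)
  \;=\; \sup_{\| f \|_{\mathcal{H}^d} \le 1}
        \mathbb{E}_{x \sim q}\!\left[ \mathcal{A}_p f(x) \right].
\]
```

In this reading, KSD is the norm of the first variation (the "gradient") of the KL objective in the RKHS geometry; the KGD of the paper plays the analogous role for general entropy-regularised objectives whose minimiser lacks an explicit unnormalised density.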

Paper Summary

Problem
The paper addresses how to quantify suboptimality for entropy-regularised variational objectives in a way that can actually be computed. This matters for emerging post-Bayesian methods, which define their target distribution as the minimiser of such an objective rather than through Bayes' rule: the added flexibility means there is generally no explicit unnormalised density for the target, so standard density-based diagnostics cannot be used to assess how close an approximation is to optimal. A generic example of such an objective is sketched below.
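To make "entropy-regularised variational objective" concrete, a typical objective of this kind (written in generic notation that may differ from the paper's) trades an expected loss off against a Kullback-Leibler, i.e. relative-entropy, penalty towards a reference distribution:

```latex
% Illustrative entropy-regularised objective (generic notation, not the paper's):
% \ell is a loss, \pi a reference distribution, \lambda > 0 a regularisation weight.
\[
\mathcal{E}(q) \;=\; \mathbb{E}_{\theta \sim q}\!\left[ \ell(\theta) \right]
  \;+\; \lambda \, \mathrm{KL}(q \,\|\, \pi).
\]
```

When \ell is a negative log-likelihood and \lambda = 1, the minimiser is the ordinary Bayesian posterior, whose unnormalised density exp(-\ell(\theta))\pi(\theta) is explicit; for more general losses or regularisers the minimiser usually admits no such closed form, which is exactly the obstacle described above.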
Key Innovation
The key innovation is a new measure of suboptimality, the gradient discrepancy (GD), together with a kernelised version, the kernel gradient discrepancy (KGD), that can be explicitly computed. In the standard Bayesian setting the KGD coincides with the kernel Stein discrepancy (KSD), yielding a new characterisation of KSD as the size of a variational gradient; beyond that setting, the KGD makes it possible to develop and compare sampling algorithms even when no unnormalised density for the target is available. A sketch of the computation in the Bayesian special case follows below.
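Because KGD coincides with KSD in the standard Bayesian case, the "explicitly computable" claim can be illustrated with the familiar KSD estimator. The sketch below is a generic V-statistic estimate with an inverse multiquadric kernel, not code from the paper; the function name, kernel choice, and defaults are illustrative. It needs only samples from q and the score ∇log p of an unnormalised target.

```python
import numpy as np

def imq_ksd(x, score, c=1.0, beta=0.5):
    """V-statistic estimate of the squared kernel Stein discrepancy.

    x     : (n, d) array of samples from the candidate distribution q.
    score : (n, d) array of grad log p evaluated at x; p need only be known
            up to a normalising constant.
    Uses the inverse multiquadric kernel k(x, y) = (c^2 + ||x - y||^2)^(-beta).
    """
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d), x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)                 # squared pairwise distances
    base = c ** 2 + r2
    k = base ** (-beta)                             # kernel matrix
    grad_x_k = -2.0 * beta * base[..., None] ** (-beta - 1) * diff  # d k / d x_i
    grad_y_k = -grad_x_k                                            # d k / d x_j
    trace_term = (-4.0 * beta * (beta + 1) * base ** (-beta - 2) * r2
                  + 2.0 * beta * d * base ** (-beta - 1))           # div_x div_y k
    s_i = score[:, None, :]                         # score at x_i
    s_j = score[None, :, :]                         # score at x_j
    kp = (np.sum(s_i * s_j, axis=-1) * k            # Stein kernel k_p(x_i, x_j)
          + np.sum(s_i * grad_y_k, axis=-1)
          + np.sum(s_j * grad_x_k, axis=-1)
          + trace_term)
    return kp.mean()                                # squared KSD (V-statistic)
```

For example, with a standard Gaussian target the score is simply -x, and the returned value shrinks towards zero as the samples better represent the target.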
Practical Impact
A computable suboptimality measure for entropy-regularised objectives gives practitioners in machine learning and statistics a way to monitor, compare, and tune post-Bayesian inference algorithms in settings where density-based diagnostics are unavailable. Concretely, the paper uses the KGD to derive new sampling algorithms, including a generalisation of Stein variational gradient descent (the classical version is sketched below), with applications to mean-field neural networks and prediction-centric uncertainty quantification, and it establishes sufficient conditions under which the KGD is continuous and provides convergence control.
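For context on the sampling side, here is a minimal sketch of one update of classical Stein variational gradient descent (after Liu and Wang, 2016), the algorithm the paper generalises. This is the standard Bayesian version, not the paper's generalisation; the median-heuristic bandwidth and step size are illustrative choices.

```python
import numpy as np

def svgd_step(x, score_fn, step=1e-2):
    """One update of classical Stein variational gradient descent.

    x        : (n, d) particle locations approximating the target p.
    score_fn : callable returning grad log p (up to a constant) at given points.
    """
    n, d = x.shape
    score = score_fn(x)                                  # (n, d)
    diff = x[:, None, :] - x[None, :, :]                 # (n, n, d), x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)
    h2 = np.median(r2) / (2.0 * np.log(n + 1.0)) + 1e-12  # median-heuristic bandwidth^2
    k = np.exp(-r2 / (2.0 * h2))                          # RBF kernel matrix
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    attract = k @ score                                   # pulls particles to high density
    repulse = np.sum(diff / h2 * k[..., None], axis=1)    # keeps particles spread out
    return x + step * (attract + repulse) / n
```

Each particle is pulled towards regions favoured by the score while the kernel-gradient term pushes particles apart, preventing collapse onto a single mode; iterating svgd_step drives the particle set towards the Bayesian target. The paper's generalisation replaces this Bayesian-specific construction with one driven by the more general entropy-regularised objective.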
Analogy / Intuitive Explanation
Imagine trying to navigate a complex landscape without a map. The KGD is like a GPS readout that tells you how far you still are from the target distribution, even though no explicit map (no unnormalised density) is available. Being able to compute this reading is what makes it possible to steer sampling algorithms towards the target and to compare different methods on an equal footing.
Paper Information
Categories: stat.CO, stat.ML
arXiv ID: 2509.10393v1
