BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

Explainable & Ethical AI
Published: arXiv: 2511.16815v1
Authors

Kyla D. Jones, Alexander W. Dowling

Abstract

We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework to emulate latent components in hybrid physical systems. BITS for GAPS supports serial hybrid modeling, where known physics governs part of the system and residual dynamics are represented as a latent function inferred from data. A Gaussian process prior is placed over the latent function, with hierarchical priors on its hyperparameters to encode physically meaningful structure in the predictive posterior. To guide data acquisition, we derive entropy-based acquisition functions that quantify expected information gain from candidate input locations, identifying samples most informative for training the surrogate. Specifically, we obtain a closed-form expression for the differential entropy of the predictive posterior and establish a tractable lower bound for efficient evaluation. These derivations approximate the predictive posterior as a finite, uniformly weighted mixture of Gaussian processes. We demonstrate the framework's utility by modeling activity coefficients in vapor-liquid equilibrium systems, embedding the surrogate into extended Raoult's law for distillation design. Numerical results show that entropy-guided sampling improves sample efficiency by targeting regions of high uncertainty and potential information gain. This accelerates surrogate convergence, enhances predictive accuracy in non-ideal regimes, and preserves physical consistency. Overall, BITS for GAPS provides an efficient, interpretable, and uncertainty-aware framework for hybrid modeling of complex physical systems.
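The entropy-based acquisition step can be illustrated with a minimal NumPy sketch. It assumes a 1-D input, a squared-exponential kernel, a Gaussian likelihood, and a small hypothetical set of hyperparameter samples that stand in for draws from the hierarchical posterior. Each sample yields one GP predictive, and the uniformly weighted mixture of those predictives approximates the full predictive posterior. For the entropy, it uses a standard Jensen-inequality lower bound for Gaussian mixtures, H >= -sum_i w_i log sum_j w_j N(mu_i; mu_j, v_i + v_j), which may not be the bound derived in the paper. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, ell, sf):
    """Squared-exponential kernel between 1-D input arrays a (n,) and b (m,)."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(x_train, y_train, x_test, ell, sf, sn):
    """Exact GP predictive mean and variance (including noise sn^2) for ONE hyperparameter set."""
    K = rbf_kernel(x_train, x_train, ell, sf) + sn**2 * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    Ks = rbf_kernel(x_train, x_test, ell, sf)          # (n, m)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)                         # (n, m)
    var = sf**2 + sn**2 - np.sum(v**2, axis=0)
    return mean, var

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    amax = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(amax, axis=axis) + np.log(np.sum(np.exp(a - amax), axis=axis))

def mixture_entropy_lower_bound(means, variances):
    """Jensen lower bound on the entropy of an equally weighted 1-D Gaussian mixture.

    means, variances: (S, m) arrays, S components evaluated at m candidate inputs.
    Returns an (m,) array of -(1/S) sum_i log[(1/S) sum_j N(mu_i; mu_j, v_i + v_j)].
    """
    S = means.shape[0]
    diff = means[:, None, :] - means[None, :, :]           # (S, S, m)
    vsum = variances[:, None, :] + variances[None, :, :]   # (S, S, m)
    log_z = -0.5 * (np.log(2.0 * np.pi * vsum) + diff**2 / vsum)
    log_inner = _logsumexp(log_z, axis=1) - np.log(S)      # (S, m)
    return -np.mean(log_inner, axis=0)

def select_next(x_train, y_train, candidates, hyper_samples):
    """Pick the candidate whose mixture predictive has the largest entropy lower bound."""
    preds = [gp_predict(x_train, y_train, candidates, *h) for h in hyper_samples]
    means = np.array([m for m, _ in preds])
    variances = np.array([v for _, v in preds])
    lb = mixture_entropy_lower_bound(means, variances)
    return candidates[np.argmax(lb)], lb

# Usage with toy data.
x_train = np.array([0.1, 0.2, 0.3])
y_train = np.sin(2 * np.pi * x_train)
candidates = np.linspace(0.0, 1.0, 101)
# Hypothetical (lengthscale, signal std, noise std) samples standing in for
# draws from the hierarchical hyperparameter posterior.
hyper_samples = [(0.1, 1.0, 0.05), (0.2, 1.0, 0.05), (0.3, 0.8, 0.05)]
x_next, lb = select_next(x_train, y_train, candidates, hyper_samples)
```

The candidate with the largest bound is chosen. In a full loop one would evaluate the system there, append the observation, refresh the hyperparameter samples, and repeat. The bound needs O(S^2) Gaussian evaluations per candidate, so it stays cheap when the mixture has few components.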

Paper Summary

Problem
Complex systems in science and engineering often require modeling that combines theoretical knowledge with empirical evidence. Hybrid modeling, which integrates first-principles and data-driven elements, is a powerful paradigm for describing these systems. However, the effectiveness of hybrid modeling depends critically on the availability and quality of data, which is often limited by time, cost, and computational resources.
Key Innovation
This paper introduces the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework, which addresses the problem of data acquisition in hybrid modeling. The framework supports serial hybrid modeling, where known physics governs part of the system and residual dynamics are represented as a latent function inferred from data. BITS for GAPS places a Gaussian process prior over this latent function, with hierarchical priors on its hyperparameters that encode physically meaningful structure in the predictive posterior. It then derives entropy-based acquisition functions, including a closed-form expression for the differential entropy of the predictive posterior and a tractable lower bound, to identify the candidate inputs expected to be most informative.
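The serial hybrid structure can be written compactly. The block below is an illustrative sketch only: it assumes that the latent function is the log activity coefficient in the extended Raoult's law setting used in the paper's case study, that the GP has zero mean, and that z_* denotes a candidate input such as (x, T). The paper's exact parameterization may differ.

```latex
% Known physics: extended Raoult's law, with a latent (learned) activity coefficient
y_i P = x_i \, \gamma_i(\mathbf{x}, T) \, P_i^{\mathrm{sat}}(T),
\qquad \ln \gamma_i(\mathbf{x}, T) = f_i(\mathbf{x}, T)

% GP prior on the latent function, hierarchical prior on its hyperparameters
f_i \mid \boldsymbol{\theta} \sim \mathcal{GP}\big(0,\, k_{\boldsymbol{\theta}}(\cdot, \cdot)\big),
\qquad \boldsymbol{\theta} \sim p(\boldsymbol{\theta} \mid \boldsymbol{\eta})

% Predictive posterior approximated as a uniformly weighted mixture of S GPs
p(f_* \mid \mathcal{D}, \mathbf{z}_*) \approx \frac{1}{S} \sum_{s=1}^{S}
\mathcal{N}\big(f_* ;\, \mu_s(\mathbf{z}_*),\, \sigma_s^2(\mathbf{z}_*)\big),
\qquad \boldsymbol{\theta}_s \sim p(\boldsymbol{\theta} \mid \mathcal{D})
```

Each mixture component is an ordinary GP posterior conditioned on one hyperparameter sample, which is why the entropy of the mixture, rather than of a single Gaussian, drives the acquisition.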
Practical Impact
The BITS for GAPS framework has several practical implications. First, it provides efficient, principled strategies for data acquisition in hybrid modeling, which are essential for advancing the practical use of hybrid models. Second, it offers a flexible and interpretable framework for modeling complex physical systems; the paper demonstrates it on vapor-liquid equilibrium in chemical engineering, and the approach could potentially extend to fields such as materials science and aerospace engineering. Finally, it can improve the accuracy and uncertainty calibration of surrogate models, which is critical for downstream design tasks such as phase envelope construction and theoretical stage count estimation.
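To illustrate how surrogate uncertainty carries into a downstream task such as phase envelope construction, the sketch below pushes samples of a latent activity-coefficient parameter through extended Raoult's law for a binary mixture at fixed temperature and reports a percentile band on the bubble-point pressure. The one-parameter Margules model, the saturation pressures (100 and 40 kPa), and the normal distribution of the Margules parameter are all hypothetical stand-ins for the paper's GP surrogate and data.

```python
import numpy as np

def bubble_pressure(x1, p1_sat, p2_sat, a12):
    """Extended Raoult's law, binary mixture, fixed T: P = x1*g1*P1sat + x2*g2*P2sat.

    Activity coefficients use a one-parameter Margules model,
    ln g1 = a12 * x2**2 and ln g2 = a12 * x1**2, as a hypothetical stand-in
    for the learned surrogate.
    """
    x2 = 1.0 - x1
    g1 = np.exp(a12 * x2**2)
    g2 = np.exp(a12 * x1**2)
    p = x1 * g1 * p1_sat + x2 * g2 * p2_sat
    y1 = x1 * g1 * p1_sat / p          # vapor mole fraction of component 1
    return p, y1

def bubble_pressure_band(x1, p1_sat, p2_sat, a12_samples, q=(5, 50, 95)):
    """Propagate surrogate samples through the physics; return percentiles of P at each x1."""
    curves = np.array([bubble_pressure(x1, p1_sat, p2_sat, a)[0] for a in a12_samples])
    return np.percentile(curves, q, axis=0)   # shape (len(q), len(x1))

rng = np.random.default_rng(0)
a12_samples = rng.normal(loc=0.8, scale=0.15, size=500)   # hypothetical posterior draws
x1 = np.linspace(0.0, 1.0, 11)
p_lo, p_med, p_hi = bubble_pressure_band(x1, 100.0, 40.0, a12_samples)  # kPa, hypothetical
```

The band collapses to zero width at the pure-component ends, where extended Raoult's law reduces to the saturation pressures regardless of the activity coefficients, and opens up in the mixture interior, where non-ideality matters.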
Analogy / Intuitive Explanation
Imagine you are trying to understand a complex machine, such as a car engine, using a combination of theoretical knowledge and empirical evidence. You understand some components well, such as the engine block and cylinder head, but you are not sure how they interact. Hybrid modeling is like describing the engine's behavior by combining theory for the well-understood components with data from experiments or simulations for the poorly understood interactions. BITS for GAPS is like a tool that tells you which experiment to run next, by identifying the test conditions whose measurements would most reduce your uncertainty about those interactions.
Paper Information
Categories: stat.ML, cs.LG
arXiv ID: 2511.16815v1
