Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph

Published: arXiv: 2509.16180v1
Authors

Gautam Kamath, Alireza F. Pour, Matthew Regehr, David P. Woodruff

Abstract

We propose an algorithm with improved query-complexity for the problem of hypothesis selection under local differential privacy constraints. Given a set of $k$ probability distributions $Q$, we describe an algorithm that satisfies local differential privacy, performs $\tilde{O}(k^{3/2})$ non-adaptive queries to individuals who each have samples from a probability distribution $p$, and outputs a probability distribution from the set $Q$ which is nearly the closest to $p$. Previous algorithms required either $\Omega(k^2)$ queries or many rounds of interactive queries. Technically, we introduce a new object we dub the Scheffé graph, which captures structure of the differences between distributions in $Q$, and may be of more broad interest for hypothesis selection tasks.

Paper Summary

Problem
The paper addresses hypothesis selection under local differential privacy. Given a set $Q$ of $k$ candidate distributions and individuals who each hold samples from an unknown distribution $p$, the goal is to output a member of $Q$ that is nearly the closest to $p$. Under local differential privacy, each individual randomizes their own response before sending it, so no raw data point is ever revealed to the party running the analysis.
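The privacy constraint and the goal described above can be written down formally. The block below gives the standard definition of $\varepsilon$-local differential privacy and the usual shape of a hypothesis selection guarantee. The symbols $\mathcal{R}$, $\varepsilon$, $C$, $\alpha$, and $\hat{q}$ are notation assumed here for illustration, and measuring closeness in total variation distance is also an assumption; this summary does not state the paper's exact guarantee or constants.

```latex
% Local differential privacy: each individual applies a randomizer R to
% their own data, and no single input changes the output distribution much.
\Pr[\mathcal{R}(x) = y] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{R}(x') = y]
\quad \text{for all inputs } x, x' \text{ and all outputs } y.

% Typical form of the selection guarantee (C and alpha are placeholders):
d_{\mathrm{TV}}(p, \hat{q}) \;\le\; C \cdot \min_{q \in Q} d_{\mathrm{TV}}(p, q) + \alpha,
\qquad \hat{q} \in Q.
```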
Key Innovation
The key innovation is a new object, the Scheffé graph, which captures the structure of the differences between the distributions in $Q$. Using it, the authors obtain a locally private algorithm that makes $\tilde{O}(k^{3/2})$ non-adaptive queries to individuals holding samples from $p$, whereas previous algorithms required either $\Omega(k^2)$ queries or many rounds of interactive queries.
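For context, the sketch below shows the classic building block that the name refers to: a single pairwise Scheffé comparison, made locally private with binary randomized response. It is not the paper's algorithm and does not construct the Scheffé graph. It assumes a finite domain, distributions given as lists of probabilities, and one sample per individual; all function names are hypothetical. Running such a comparison on every pair of candidates takes on the order of $k^2$ comparisons, which is the kind of cost the paper aims to reduce.

```python
import math
import random


def scheffe_set(q_i, q_j):
    """Domain points where q_i puts strictly more mass than q_j."""
    return {x for x in range(len(q_i)) if q_i[x] > q_j[x]}


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    This binary mechanism satisfies epsilon-local differential privacy.
    """
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit


def private_scheffe_winner(q_i, q_j, samples, epsilon, rng):
    """Return 0 if q_i wins the comparison, 1 if q_j wins.

    Each entry of `samples` is one individual's data point; each individual
    answers a single yes/no query ("is your point in the Scheffe set?")
    through randomized response.
    """
    A = scheffe_set(q_i, q_j)
    reports = [randomized_response(int(x in A), epsilon, rng) for x in samples]

    # Debias the noisy mean: E[report] = (2*keep - 1) * p(A) + (1 - keep).
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    p_hat = (sum(reports) / len(reports) - (1.0 - keep)) / (2.0 * keep - 1.0)

    # The winner is the candidate whose mass on A is closer to the estimate.
    mass_i = sum(q_i[x] for x in A)
    mass_j = sum(q_j[x] for x in A)
    return 0 if abs(mass_i - p_hat) <= abs(mass_j - p_hat) else 1
```

The comparison only ever sees one noisy bit per individual, which is why the number of such queries is the natural cost measure in this setting.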
Practical Impact
Lower query complexity reduces how many privatized responses must be collected from individuals to select a hypothesis, and because the queries are non-adaptive they can all be issued in a single round rather than waiting on earlier answers. This matters most in applications where data is sensitive or regulated, such as healthcare or finance, and raw records cannot simply be gathered in one place.
Analogy / Intuitive Explanation
Think of hypothesis selection as choosing, from a set of candidate recipes (the distributions in Q), the one that comes closest to reproducing a cake that people have tasted (the true distribution p). The Scheffé graph is like a map of how the recipes differ from one another. With this map, we can find the best recipe by asking the tasters fewer questions, and each taster gives only a deliberately noisy answer, so little is revealed about any individual's data.
Paper Information
Categories: cs.DS, cs.LG, stat.ML
arXiv ID: 2509.16180v1
