TPDP 2024 will take place on August 20 and 21 at the Harvard University Science and Engineering Complex (SEC). TPDP is co-located with the 2024 OpenDP Community Meeting, happening on August 22 and 23. We hope you will attend both!
Registration: Registration for TPDP 2024 is free. Please click here or use the link in the sidebar to register for the workshop, and don't forget to register for the OpenDP Community Meeting (also free) as well!
Logistics: The workshop will be held at the Harvard University Science and Engineering Complex (SEC). The OpenDP Community Meeting page has a helpful list of nearby hotels. To park in the nearby Harvard University lots, you may purchase and print a parking pass.
August 20

9:00-9:05
Welcome

9:05-9:50
Local Differential Privacy (LDP) has become increasingly popular in recent years. However, with its widespread adoption, it is crucial to examine its potential vulnerabilities. The distributed nature of LDP exposes it to poisoning attacks, in which an adversary can realistically inject fake clients that submit poisoned or malformed data. In this talk, we will explore solutions that provide provable robustness against such attacks. Specifically, we will analyze how LDP protocols possess a unique characteristic that distinguishes them from non-private ones: the clear separation between the input and the final response (obtained after randomization). This separation gives adversaries two distinct opportunities to tamper with the data. We will discuss strategies to mitigate both types of tampering, applying them in real-world settings and exploring the associated challenges.
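To make the input/output distinction concrete, here is a minimal, hypothetical sketch (not the speaker's specific protocols or defenses) using binary randomized response: a fake client can either feed a chosen input into the honest randomizer (input poisoning) or bypass randomization entirely and submit a chosen response (output poisoning), which skews the aggregate estimate more strongly per fake client.

```python
import math
import random

def randomize(bit: int, eps: float) -> int:
    """Binary randomized response: report the true bit with probability e^eps / (e^eps + 1)."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p else 1 - bit

def estimate_frequency(reports, eps: float) -> float:
    """Debias the randomized reports to estimate the fraction of 1s."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)

eps = 1.0
honest = [randomize(b, eps) for b in [1] * 300 + [0] * 700]       # true frequency is 0.3

# Input poisoning: fake clients still run the honest randomizer, but on a chosen input.
input_poisoned = honest + [randomize(1, eps) for _ in range(200)]

# Output poisoning: fake clients skip randomization and submit the target response directly.
output_poisoned = honest + [1] * 200

print(estimate_frequency(honest, eps))           # close to 0.3
print(estimate_frequency(input_poisoned, eps))   # pulled upward, but attenuated by the randomizer
print(estimate_frequency(output_poisoned, eps))  # typically pulled further, since responses bypass randomization
```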

9:50-10:35
We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or numerically. On the other hand, Poisson-subsampling-based DP-SGD is challenging to implement scalably, but has a well-understood privacy analysis, with multiple open-source, numerically tight privacy accountants available. This has led to the common practice of training with shuffling-based DP-SGD while reporting the privacy analysis for the corresponding Poisson-subsampled version. Our result shows that there can be a substantial gap between the privacy analyses under the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.

We study differential privacy under continual observation and prove new lower bounds for private online counters. Specifically, we show that, over a stream of length T, any (ε, δ)-DP counter must incur an additive error of Ω(log T), even if the stream contains at most log T increments. For the offline version of the problem (i.e., releasing prefix sums of a given array), there is an upper bound of O(log* T) additive error for sparse streams under approximate DP. Our result thus separates the online and offline versions of the problem. Based on the new proof, we derive several consequences for other online tasks, including a new lower bound for answering online threshold queries and a lower bound for differentially private online learning.

In the task of differentially private (DP) continual counting, we receive a stream of increments, and our goal is to output an approximate running total of these increments without revealing too much about any specific increment. Despite its simplicity, DP continual counting has attracted significant attention both in theory and in practice. Existing algorithms for the problem are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility. The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and require only logarithmic or polylogarithmic space (and time).
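As a loose illustration of the matrix-factorization view described in the last abstract above (and not the specific space-efficient constructions it refers to), the sketch below releases noisy prefix sums by factoring the lower-triangular all-ones matrix A as A = B C with B = C = A^{1/2}, a known Toeplitz square-root factorization; noise calibration is simplified to a generic noise multiplier.

```python
import numpy as np

def sqrt_factor(T: int) -> np.ndarray:
    """Lower-triangular Toeplitz square root of the prefix-sum matrix.
    Coefficients: c_0 = 1, c_k = c_{k-1} * (2k - 1) / (2k)."""
    c = np.ones(T)
    for k in range(1, T):
        c[k] = c[k - 1] * (2 * k - 1) / (2 * k)
    M = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            M[i, j] = c[i - j]
    return M

def private_prefix_sums(x: np.ndarray, noise_multiplier: float, seed: int = 0) -> np.ndarray:
    """Release noisy prefix sums A x via A = B C: publish B (C x + z) with Gaussian z.
    The noise scale is proportional to the L2 sensitivity of C x, i.e. the largest column norm of C."""
    T = len(x)
    C = sqrt_factor(T)
    B = C                                           # B @ C equals the all-ones lower-triangular matrix
    sensitivity = np.linalg.norm(C, axis=0).max()
    z = np.random.default_rng(seed).normal(0.0, noise_multiplier * sensitivity, size=T)
    return B @ (C @ x + z)

x = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
print(np.cumsum(x))                                 # exact prefix sums
print(private_prefix_sums(x, noise_multiplier=1.0))
```

Materializing B and C as dense T-by-T matrices, as done here, is exactly the space cost that the abstract's logarithmic- and polylogarithmic-space approaches avoid.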

10:35-11:00
Break. Thanks to Silver-level sponsors dpella and Tumult Labs!

11:00-12:30
Poster Session #1

12:30-1:30
Lunch (provided). Thanks to Platinum-level sponsors Apple and Google!

1:30-2:45
Panel Discussion: Practical concerns and open challenges in real-world deployments of differential privacy.
Panelists: Michael Hay (Tumult Labs), Shlomi Hod (Boston University), James Honaker (Mozilla Anonym), Andreas Terzis (Google Research)

2:45-3:15
Break. Thanks to Silver-level sponsors dpella and Tumult Labs!

3:15-4:45
Poster Session #2

August 21

9:00-9:45
This talk will cover three research directions pertinent to the DP community and relevant in modern machine learning. First, I will discuss training data attribution and its high-level connections to DP and membership inference. Next, I will cover data curation and potential intersections with differential privacy. Finally, I will discuss recent work on alternative privacy semantics tailored to modern use cases. In covering these areas, the primary focus will be on introducing the general directions to encourage interest and future work. To this end, I will discuss my own work as well as others' when appropriate. Notes for the talk can be found here.

9:45-10:30
Differential privacy and sublinear algorithms are both rapidly emerging algorithmic themes in the era of big data analysis. Although recent works have shown the existence of differentially private sublinear algorithms for many problems, including graph parameter estimation and clustering, little is known about hardness results for such algorithms. In this paper, we initiate the study of lower bounds for problems that seek both differentially private and sublinear-time algorithms. Our main result is the incompatibility of the two desiderata in the general case. In particular, we prove that a simple problem based on one-way marginals admits both a differentially private algorithm and a sublinear-time algorithm, but does not admit a "strictly" sublinear-time algorithm that is also differentially private.

In this work, we study density estimation in the Wasserstein distance (an appropriate metric in many practical settings, such as estimating population density). We move beyond (pessimistic) worst-case bounds for this problem and instead approach it through an 'instance optimality' lens, asking that algorithms adapt to easy instances. We define strong notions of instance optimality, characterize the corresponding instance-optimal rates, and show that variants of practical algorithms achieve these rates (up to polylogarithmic factors).

We study the problem of hypothesis selection under the constraint of local differential privacy (ε-LDP). Our main contribution is the first sample-optimal algorithm for this problem in the high-privacy regime: the dependence of the proposed algorithm on the number of hypotheses is linear. The algorithm uses log log k rounds of (sequential) interaction when choosing from k hypotheses. Our result demonstrates the power of interaction in LDP hypothesis selection, as any non-interactive algorithm requires at least Ω(k log k) samples. To prove our results, we define the notion of critical queries for a Statistical Query Algorithm (SQA), which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of the queries it asks. We then design an LDP algorithm that uses a small number of critical queries.

10:30-11:00
Break. Thanks to Silver-level sponsors dpella and Tumult Labs!

11:00-12:30
Poster Session #3

12:30-1:30
Lunch (provided). Thanks to Platinum-level sponsors Apple and Google!

1:30-2:15
While the path to effective deployment of differential privacy (DP) may depend on quality research, there are a number of non-research factors that can derail would-be implementers. This keynote will cover current challenges organizations are facing in their efforts to deploy DP, ranging from knowledge gaps and people problems to risk calculations and more.

2:15-3:00
Mechanisms for generating differentially private synthetic data based on marginals and graphical models have been successful in a wide range of settings. However, one limitation of these methods is their inability to incorporate public data. Initializing a data-generating model by pre-training on public data has been shown to improve the quality of synthetic data, but this technique is not applicable when the model structure is not determined a priori. We develop the mechanism JAM-PGM, which expands the adaptive measurements framework to jointly select between measuring public data and private data. This technique allows public data to be included in a graphical-model-based mechanism. We show that JAM-PGM is able to outperform both publicly assisted and non-publicly-assisted synthetic data generation mechanisms, even when the public data distribution is biased.

DP-SGD is the workhorse algorithm for private deep learning, but it has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a framework for privatizing zeroth-order optimization methods. Unlike DP-SGD methods such as DP-LoRA, DP-ZO does not require implementing the cumbersome per-sample gradient clipping operation, because the update information from each sample is a scalar. DP-ZO methods are competitive in utility with DP-SGD while using far less memory, and we contend that DP-ZO methods may be superior to DP-SGD for fine-tuning models at the foundation scale. We outline a number of future directions for improving the analysis and implementation of DP-ZO methods.

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP, and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as scaling, data imputation, quantization, deduplication, and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
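As a rough sketch of the zeroth-order idea in the DP-ZO abstract above (all names and design choices here, such as the single shared direction, two-point difference, and Gaussian noise, are illustrative assumptions rather than the paper's exact method), note that each example's contribution is a single scalar, so it can be clipped and noised directly:

```python
import numpy as np

def dp_zo_step(theta, loss_fn, batch, lr, smoothing, clip, noise_multiplier, rng):
    """One illustrative DP zeroth-order step: each example contributes one clipped scalar,
    so no per-sample gradients (or per-sample gradient clipping) are needed."""
    u = rng.normal(size=theta.shape)                    # shared random direction
    u /= np.linalg.norm(u)
    scalars = []
    for example in batch:
        # Two-point finite-difference estimate of the directional derivative for this example.
        diff = loss_fn(theta + smoothing * u, example) - loss_fn(theta - smoothing * u, example)
        scalars.append(float(np.clip(diff / (2.0 * smoothing), -clip, clip)))
    # Gaussian mechanism on the scalar sum; the clip bound controls each example's contribution.
    noisy_sum = sum(scalars) + rng.normal(0.0, noise_multiplier * clip)
    return theta - lr * (noisy_sum / len(batch)) * u

# Toy usage: a linear model with squared loss on (x, y) examples.
def loss_fn(theta, example):
    x, y = example
    return (x @ theta - y) ** 2

rng = np.random.default_rng(0)
theta = np.zeros(3)
batch = [(np.array([1.0, 2.0, 0.5]), 1.0), (np.array([0.0, 1.0, 1.0]), -1.0)]
for _ in range(200):
    theta = dp_zo_step(theta, loss_fn, batch, lr=0.05, smoothing=1e-3,
                       clip=1.0, noise_multiplier=0.5, rng=rng)
print(theta)
```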

3:00-6:00
TPDP/OpenDP Community Meeting Reception. Refreshments provided. Thanks to Diamond-level sponsors Capital One, Microsoft, and Oblivious!
Differential privacy (DP) is the leading framework for data analysis with rigorous privacy guarantees. In the last 18 years, it has transitioned from the realm of pure theory to large scale, real world deployments.
Differential privacy is an inherently interdisciplinary field, drawing researchers from a variety of academic communities including machine learning, statistics, security, theoretical computer science, databases, and law. The combined effort across a broad spectrum of computer science is essential for differential privacy to realize its full potential. To this end, this workshop aims to stimulate discussion among participants about both the state-of-the-art in differential privacy and the future challenges that must be addressed to make differential privacy more practical.
Specific topics of interest for the workshop include (but are not limited to):
Submissions: Authors are invited to submit a short abstract of new work or of work published since July 2023 (the most recent TPDP submission deadline). Submissions must be at most 4 pages, not including references. Submissions may also include appendices, but these are read only at the reviewers' discretion. There is no prescribed style file, but authors should ensure a minimum of 1-inch margins and 10pt font. Submissions are not anonymized and should include author names and affiliations.
Submissions will undergo a lightweight review process and will be judged on originality, relevance, interest, and clarity. Based on the volume of submissions to TPDP 2023 and the workshop's capacity constraints, we expect that the review process will be somewhat more competitive than in years past. Accepted abstracts will be presented at the workshop either as a talk or a poster.
The workshop will not have formal proceedings and is not intended to preclude later publication at another venue. In-person attendance is encouraged, though authors of accepted abstracts who cannot attend in person will be invited to submit a short video to be linked on the TPDP website.
Authors of selected papers from the workshop will be invited to submit a full version of their work for publication in a special issue of the Journal of Privacy and Confidentiality.
We are very grateful to our sponsors whose generosity has been critical to the continued success of the workshop. For information about sponsorship opportunities, please contact us at tpdp.chairs@gmail.com.
For concerns regarding submissions, please contact tpdp.chairs@gmail.com