Workshop Information

TPDP 2026 will take place on June 1 and 2 at the Northeastern University Curry Student Center (CSC). TPDP is co-located with the Differential Privacy for Health and Genomics workshop (June 2-3) and the Foundations of Responsible Computing conference (June 3-5). We hope you will attend!

Registration: Please fill out this registration form by May 16.

Invitation Letter: If you require an invitation letter for your visa application, please fill out this request form.

Note: The 2026 FIFA World Cup will be partially hosted in Boston, with matches beginning on June 13. We encourage attendees to book accommodations well in advance.

Program (Tentative)

Monday, June 1

8:30-9:00 Breakfast
9:00-9:05 Welcome note from chairs!
9:05-9:50 Keynote #1

For several years, on-device federated learning (FL) was the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) foundation models have become far too large to train on client devices, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. To address these problems, we study a pipeline in which models are trained at a central server on differentially-private synthetic data from client devices. We show how a recent algorithm called Private Evolution can outperform traditional federated learning baselines in utility and cost. We provide early theoretical analysis of its properties, including distributional convergence guarantees. Finally, we show how the Private Evolution algorithm can be reformulated as a preference optimization problem, thereby significantly improving the performance of private synthetic data relative to on-device baselines and prior synthetic data baselines.

Giulia Fanti's website


9:50-10:45 Contributed Talks: The Theory of Everything
Channeling a bit of Stephen Hawking, this session probes the deepest laws of the differential privacy cosmos--where epsilon is small, lower bounds are inevitable and proofs curve spacetime.

Private Evolution (PE) is a differentially private algorithm for synthetic data generation. While it can be viewed as a Wasserstein learning algorithm, it performs much better in practice than worst-case Wasserstein analyses would predict. We recast PE as generative model-augmented Wasserstein learning. We show theoretically that when we take into account the use of a generative model that is able to capture something about the true distribution, we can obtain much better performance bounds. For example, if the generator gives samples in the same low-dimensional space as the distribution, then sample complexity depends on intrinsic, not ambient, dimension. We also show that standard variants of PE can fail to converge on simple well-clustered instances, and propose a new geometry-aware version of PE with provable convergence on such instances. Experimentally, we show that our new algorithm has consistent empirical gains over standard baselines.

Bilevel optimization, in which one optimization problem is nested inside another, underlies many machine learning applications with a hierarchical structure -- such as meta-learning and hyperparameter optimization. Such applications often involve sensitive training data, raising pressing concerns about individual privacy. Motivated by this, we study differentially private bilevel optimization. We first focus on settings where the outer-level objective is convex, and provide novel upper and lower bounds on the excess empirical risk for both pure and approximate differential privacy. These bounds are nearly tight and essentially match the optimal rates for standard single-level differentially private ERM, up to additional terms that capture the intrinsic complexity of the nested bilevel structure. We also provide population loss bounds for bilevel stochastic optimization. The bounds are achieved in polynomial time via efficient implementations of the exponential and regularized exponential mechanisms. A key technical contribution is a new method and analysis of log-concave sampling under inexact function evaluations, which may be of independent interest. In the non-convex setting, we develop novel algorithms with state-of-the-art rates for privately finding approximate stationary points. Notably, our bounds do not depend on the dimension of the inner problem.

We study the computational cost of differential privacy in terms of memory efficiency. While the trade-off between accuracy and differential privacy is well-understood, the inherent cost of privacy regarding memory use remains largely unexplored. This paper establishes for the first time an unconditional space lower bound for user-level differential privacy by introducing a novel proof technique based on a multi-player communication game.

We study differentially private continual release of the number of distinct items in a turnstile stream, where items may be both inserted and deleted. We show that existing polynomial lower bounds on the additive error required for privacy can be circumvented when some multiplicative error is also allowed. We give algorithms for continual estimation of the number of distinct elements or the F2 moment of the stream with polylogarithmic additive error at the cost of a small multiplicative error.


10:45-11:00 Break

11:00-12:00 Poster Session A (2nd Floor Suites)
12:00-1:30 Lunch (on your own)
1:30-3:00 Contributed Talks: A.I. Artificial Intelligence
As exciting as the Steven Spielberg blockbuster, this session brings together AI and differential privacy--no robot children were harmed during the making of these papers.

The widespread adoption of AI assistants has prompted the development of privacy-aware platforms designed to extract insights from real-world usage. Their privacy protections primarily rely on layering multiple heuristic techniques, such as PII redaction, clustering, aggregation, and LLM-based privacy auditing. In this paper, we put their privacy claims to the test by presenting CLIOPATRA, the first attack against "privacy-preserving" LLM-based insights systems.

Differential privacy (DP) has a wide range of applications for protecting data privacy, but designing and verifying DP algorithms requires expert-level reasoning, creating a high barrier for non-expert practitioners. Prior works either rely on specialized verification languages that demand substantial domain expertise or remain semi-automated and require human-in-the-loop guidance. In this work, we investigate whether large language models (LLMs) can automate DP reasoning. We introduce DPrivBench, a benchmark in which each instance asks whether a function or algorithm satisfies a stated DP guarantee under specified assumptions. The benchmark is carefully designed to cover a broad range of DP topics, span diverse difficulty levels, and resist shortcut reasoning through trivial pattern matching. Experiments show that while the strongest models handle textbook mechanisms well, all models struggle with advanced algorithms, revealing substantial gaps in current DP reasoning capabilities. Our benchmark provides a solid foundation for developing and evaluating such methods, and complements existing benchmarks for mathematical reasoning.

Differentially private (DP) text synthesis promises to unlock sensitive corpora for model training, but it remains unclear whether DP synthetic data transmits genuinely new knowledge and capabilities present only in those corpora. This is because existing evaluations rely on tasks that are nearly solvable without training, so strong benchmark performance does not establish that DP synthesis can substitute for original data access. Thus, we introduce ContinuousBench, a continuously and automatically regenerated benchmark that measures capability gain from DP synthetic text. Each quarter, a new release pairs a never-before-seen training corpus with a derived QA set, constructed to be: (1) unsolvable sans-corpus; and (2) learnable under DP, as the tested knowledge is supported by hundreds of independent records. Researchers produce DP synthetic data from the training corpus and run our standardized training and evaluation harness on their synthetic data to measure gains. We instantiate two tracks: Geminon, a procedurally-generated dataset about fictional creatures; and News, a stream of newly scraped public news articles. Although standard benchmarks are nearly saturated, on ContinuousBench we find that non-private synthesis transfers substantial knowledge from the original corpus, while state-of-the-art DP synthesis methods generally fail to do so, even at ε = 100. ContinuousBench is available at https://huggingface.co/ContinuousBench

Are there any conditions under which a generative model's outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak (ICML 2023). They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection -- foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copyright protection that we dub being tainted. Then, we introduce our blameless copyright protection framework for defining meaningful guarantees, and instantiate it with clean-room copyright protection. Clean-room copyright protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual "clean-room setting." Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copyright protection when the dataset is golden, a copyright deduplication requirement.

We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy preferences. We study a fully decentralized data sharing mechanism where each agent holds a sensitive data point and decides whether to participate in a data-sharing coalition and how much noise to add to their data. Privacy choices induce a fundamental trade-off: higher privacy reduces individual data-sharing costs but degrades data utility and statistical accuracy for the coalition. These choices generate externalities across agents, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across a range of privacy-cost regimes, from decreasing costs (privacy amplification from pooling data) to increasing costs (greater exposure to privacy attacks in larger coalitions), characterizing: i) which regimes offer non-trivial improvements in accuracy and social cost; and ii) the efficiency gap between the centralized and decentralized mechanisms. The main insight is that full decentralization is often highly inefficient, primarily due to players being risk-averse and selfishly choosing highly stringent privacy levels for themselves.


3:00-4:00 Poster Session B (2nd Floor Suites)
4:00-4:20 Break
4:20-5:30 Contributed Talks: The Practice
This session shares more than just the Boston ZIP code with the legal drama--we promise the same thrill as a courtroom showdown as differential privacy becomes the star witness in practical deployment.

While differential privacy provides strong mathematical guarantees, practical implementations often suffer from subtle bugs that invalidate these theoretical protections. To address this, we introduce a novel auditing framework, Re:cord-play, that inspects the internal states of DP algorithms to overcome the limitations of standard black-box testing. By isolating the privacy mechanisms, our approach can deterministically catch data leaks into data-independent logic or flag sensitivity miscalculations. In this presentation, we will detail the framework's methodology and showcase real-world vulnerabilities discovered through audits of popular open-source DP libraries, revealing actionable privacy violations.

JAX-Privacy is a library designed to simplify the deployment of robust and performant mechanisms for differentially private machine learning. Guided by design principles of usability, flexibility, and efficiency, JAX-Privacy serves both researchers requiring deep customization and practitioners who want a more out-of-the-box experience. The library provides verified, modular primitives for the critical components of mechanism design, including batch selection, gradient clipping, noise addition, accounting, and auditing, and brings together a large body of recent research on differentially private ML.

Differential Privacy (DP) bounds the privacy leakage of a mechanism against worst-case membership inference, but the precise tradeoff between complex adversarial models and DP protections remains poorly understood. In this paper, we present a unified framework that generalizes the patchwork of existing bounds across membership inference, attribute inference, and data reconstruction attacks.

There is a need in the community of privacy practitioners for a trustworthy, collaborative shared database of differentially private deployments, to help foster norms about best practices. We propose a set of guidelines aimed at groups seeking to develop such a database. Such a governance resource would (1) help industry reach consensus on best practices, (2) provide public snapshots of the privacy landscape so that regulators can judge new deployments in context and shape guidance accordingly, and (3) incentivize industry to make their choices public. We describe an initial schema to systematize this information and an editorial and governance process to ensure this information is reliable, and demonstrate a prototype interface.

The 2020 United States Census adopted differential privacy to protect individual confidentiality, adding calibrated noise to billions of demographic measurements spanning six geographic levels from nation to census block. The deployed post-processing method, TopDown, uses a series of heuristic optimizations to reconcile these noisy measurements into self-consistent population tables. We introduce BlueDown, a new post-processing algorithm that improves accuracy while achieving the same privacy guarantee. BlueDown is derived by constraining the best linear unbiased estimator (BLUE), which is efficiently computed across all geographic levels by exploiting the symmetries and block-hierarchical structure of the measurement queries. On 2020 Census data, BlueDown reduces estimation error by 8–45% for queries at the county and tract levels while satisfying all structural constraints. These gains come at no cost to confidentiality and could directly improve the downstream analyses used to guide the distribution of over $1.5 trillion in annual spending across hundreds of federal programs that rely on census data.

5:30-7:00 Job Market Session and Social Hour

Tuesday, June 2

8:30-9:00 Breakfast
9:00-9:45 Keynote #2

Algorithmic predictions are increasingly used to target benefits to individuals, particularly in low- and middle-income countries where traditional data sources are limited. In these settings, practitioners often utilize non-traditional data -- such as mobile phone metadata and other remotely sensed signals -- across a range of welfare-enhancing programs, including emergency response, anti-poverty targeting, and expanding financial inclusion. In this talk, we revisit a real-world anti-poverty program that used machine learning models trained on mobile phone metadata to allocate aid to over 800,000 individuals, and examine it through the lens of differential privacy. We show that the operational and policy constraints of this setting motivate two application-specific adaptations of standard differential privacy definitions to enable accurate targeting while providing formal privacy guarantees. We then characterize the resulting privacy-program effectiveness tradeoffs, highlighting how design choices shape both statistical performance and welfare outcomes. We conclude with concrete recommendations for structuring targeting programs to support privacy-preserving deployment in practice.

Nitin Kohli's website


9:45-10:00 Break
10:00-11:00 Panel: 20 Years of DP!
Panelists: Cynthia Dwork (Harvard), Salil Vadhan (Harvard), Sofya Raskhodnikova (Boston University), Adam Smith (Boston University), Jonathan Ullman (Northeastern University)

11:00-12:00 Poster Session C
12:00-1:00 Lunch (on your own)
1:00-1:30 Level Setting: DP for Health (hosted by OpenDP)
1:30-2:30 OpenDP Talk Session
2:30-3:00 Break
3:00-4:00 Panel Discussion: DP for Healthcare (OpenDP)
4:00-4:30 Break
4:30-5:25 DP for Health Talks: Grey’s Anatomy
This session is as binge-worthy as the medical drama--but fortunately, no code blues, only rigorous privacy guarantees.

Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but practical implementation necessitates balancing privacy protection and the utility of data. We demonstrate the use of DP to protect individuals in a real behavioral health study, while making the data publicly available and retaining high utility for downstream users of the data. We use the Adaptive Iterative Mechanism (AIM) to generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS). The LEMURS dataset comprises physiological measurements from wearable devices (Oura rings) and self-reported survey data from first-year college students. We evaluate the synthetic datasets across a range of privacy budgets, ε = 1 to 100, focusing on the trade-off between privacy and utility.

In statistical applications it has become increasingly common to encounter data structures that live on non-linear spaces such as manifolds. Classical linear regression, one of the most fundamental methodologies of statistical learning, captures the relationship between an independent variable and a response variable, both of which are assumed to live in Euclidean space. Thus, geodesic regression emerged as an extension where the response variable lives on a Riemannian manifold. The parameters of geodesic regression, as with linear regression, capture the relationship of sensitive data and hence one should consider the privacy protection practices of said parameters. We consider releasing Differentially Private (DP) parameters of geodesic regression via the K-Norm Gradient (KNG) mechanism for Riemannian manifolds. We derive theoretical bounds for the sensitivity of the parameters showing they are tied to their respective Jacobi fields and hence the curvature of the space. This corroborates, and extends, recent findings of differential privacy for the Fréchet mean. We demonstrate the efficacy of our methodology on the sphere, $S^2\subset\mathbb{R}^3$, the space of symmetric positive definite matrices, and Kendall's planar shape space. Our methodology is general to any Riemannian manifold, and thus it is suitable for data in domains such as medical imaging and computer vision.

Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows where each record corresponds to a unique individual. This perspective neglects the temporal complexity in longitudinal datasets, such as electronic health records, where a user contributes an entire (sub)table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient. Flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and reduces state transition errors by nearly 50% compared to leading marginal mechanisms while achieving similar marginal fidelity.

De-identification is still the standard approach to privacy in biomedical data sharing, despite well-known demonstrations of its vulnerabilities. Differential privacy is rarely adopted in practice, in part because its standard (ε, δ) parameterization does not map directly to the concrete inference risks discussed by data-protection guidelines, and when these parameters are mapped to risks, we need high values of ε to achieve reasonable utility. This paper presents a unifying threat modeling framework grounded in f-DP for bounding membership inference, re-identification, attribute inference, and data reconstruction risks using a single trade-off curve, and enables direct calibration of differentially private mechanisms to a target level of inference risk as in standard guidelines such as those from ISO and the European Medicines Agency. We demonstrate the framework on clinical language modeling tasks and show that we can preserve reasonable utility.


5:25-5:30 Thank you note from chairs!
5:30-7:00 Reception

Accepted Papers

Poster Session A

Poster Session B

Poster Session C

Virtual Poster Presentations

Call for Papers

Differential privacy (DP) is the leading framework for data analysis with rigorous privacy guarantees. In the last two decades, it has transitioned from the realm of pure theory to large-scale, real-world deployments.

Differential privacy is an inherently interdisciplinary field, drawing researchers from a variety of academic communities including machine learning, statistics, security, theoretical computer science, databases, and law. The combined effort across a broad spectrum of computer science is essential for differential privacy to realize its full potential. To this end, this workshop aims to stimulate discussion among participants about both the state-of-the-art in differential privacy and the future challenges that must be addressed to make differential privacy more practical.

New this year! We will be hosting a special session on "Differential Privacy for Health" and are especially encouraging submissions aligned with this theme.

Specific topics of interest for the workshop include (but are not limited to):

Submissions: Authors are invited to submit a short abstract of new work or work published since June 2025 (the most recent TPDP submission deadline). Submissions must be 4 pages maximum, not including references. Submissions may also include appendices, but these are read only at the reviewers' discretion. There is no prescribed style file, but authors should ensure a minimum of 1-inch margins and 10pt font. Submissions are not anonymized, and should include author names and affiliations.

Submissions will undergo a lightweight review process and will be judged on originality, relevance, interest, and clarity. Based on the volume of submissions to TPDP 2025 and the workshop's capacity constraints, we expect that the review process will be somewhat more competitive than in years past. Accepted abstracts will be presented at the workshop either as a talk or a poster.

The workshop will not have formal proceedings and is not intended to preclude later publication at another venue. In-person attendance is encouraged, though authors of accepted abstracts who cannot attend in person will be invited to submit a short video to be linked on the TPDP website.

Selected papers from the workshop will be invited to submit a full version of their work for publication in a special issue of the Journal of Privacy and Confidentiality.

Important Dates

Abstract Submission: February 18, 2026 (AoE)
Notification: April 2, 2026
Workshop: June 1-2, 2026

Sponsors

We are very grateful to our sponsors whose generosity has been critical to the continued success of the workshop. For information about sponsorship opportunities, please contact us at tpdp.chairs@gmail.com.

Submission website

https://tpdp26.cs.uchicago.edu

For concerns regarding submissions, please contact tpdp.chairs@gmail.com.

Organizing and Program Committee