The Sudden Surge of Shuffling in the Privacy Literature

Within the past year, there has been a deluge of papers in the shuffled model. This page is designed to guide a curious onlooker to the recent developments.

Overview of the Model

In a shuffled protocol, each user applies a randomized function on their sensitive data. The output is some vector of messages. A service or party called the shuffler concatenates all message vectors and then applies a uniformly random permutation. A shuffled protocol is differentially private when this permuted vector is differentially private with respect to a change any one user's data. We can hope for better "privacy-utility trade-offs" than the local model because of the following intuition: if we uniformly permute the messages generated by a locally private protocol, then an adversary who wishes to recover a particular user's data must break through the noise from the other users' messages in addition to the noise in the target's message. Amplification lemmas formalize this local-to-shuffled intuition.

Note that such amplification lemmas always map a local protocol to a shuffled protocol whose message vectors have length exactly 1, due to the structure of the local model. Such shuffled protocols are called single-message, in contrast to the general multi-message model; in the timeline below, we shall see that single-message protocols are provably weaker than multi-message protocols. When stating bounds, we will ignore \(\log(1/\delta)\) terms.

Phase 0: Beginnings

Ishai et al's "Cryptography from Anonymity" (author initials spell IKOS). Introduces...

  • a rigorous definition of an anonymization primitive. This version allows interactivity: a party can respond to a message without knowing its sender.
  • a single-server PIR protocol that leverages this "anonymous response" operation to have nontrivial communication complexity.
  • a multi-message protocol to compute sums securely via shares. The security definition is not differential privacy but rather indistinguishability between datasets that have the same sum.

Bittau et al's 2017 "PROCHLO." Introduces the Prochlo system and the intuition for shuffling in the context of differential privacy.

Phase 1: Formalization of the Model and Baseline Results

Cheu et al's "Distributed Differential Privacy via Shuffling" (author initials spell CSUZZ). Introduces...

  • The formalized shuffled model.
  • A single-message protocol for binary sums. Built on randomized response.
  • Applications of the binary sum protocol. Their protocol for d-value histograms achieves \(O(\sqrt{\log(d)}/\varepsilon)\) error but demands \(d\) messages per user. Their protocol for real-valued sums achieves \(O(1/\varepsilon)\) error but demands \(\sqrt{n}\) messages per user.
  • A generic lemma to obtain lower bounds for single-message protocols: if a single-message protocol is \((\varepsilon,\delta)\) private, then its randomizer is \((\varepsilon + \ln(n), \delta) \) private

Erlingsson et al's "Amplification by Shuffling." Introduces the first amplification lemma, but only for small eps (driven down to \(\approx \varepsilon/\sqrt{n}\) ). We note that the model they analyze is slightly different than what we've described: here, users are shuffled and then the authors consider sequentially interactive local protocols that are run on the anonymized users. The amplification holds in our model when the interactivity is not taken advantage of.

Balle et al's "The Privacy Blanket of the Shuffle Model." Introduces...

  • A single-message protocol for real-valued sums with \(O(n^{1/6})\) error
  • A matching lower bound. Taken together with CSUZZ's protocol, this polynomially separates the single-message model from the multi-message model.
  • Another amplification lemma that allows for larger \(\varepsilon\)

Balle et al's "Differentially Private Summation with Multi-message Shuffling." Introduces a way to use IKOS to get Laplace-like error bounds for real-valued sums, but it is still approx. d.p. Uses \(O(\log(n))\) messages. Note that Ghazi et al's "Scalable and D.P. Distributed Aggregation in the Shuffled Model" independently came up with the same idea.

Phase 2: Improving Communication Complexity

Ghazi et al's "Private Heavy Hitters and Range Queries in the Shuffled Model" Introduces multi-message protocols to do the titular tasks. Notably, their histogram protocols use exponentially fewer messages than that of CSUZZ. One of these protocols utilizes shared randomness, which is a first in the shuffled literature. Folded into "On the Power of Multiple Anonymous Messages".

Balle et al's "Improved Summation from Shuffling." Improves the analysis of the IKOS protocol to cut down on the number of messages. It is \(O(1)\) for reasonable parameter regimes. In turn, real-sum is computable in the shuffled model under d.p. with \(O(1)\) messages

Phase 3: Protocols from New Techniques

Balcer & Cheu's "Separating Local and Shuffled Differential Privacy via Histograms." Introduces...

  • A multi-message protocol to estimate histograms whose error does not depend on the number of bins
  • A reduction from the pointer-chasing problem to histograms. Combined with the above protocol and a lower bound in (Joseph et al., 2019), this implies the shuffled model requires arbitrarily fewer samples than the (non-interactive) local model.
  • A proof that single-message protocols that satisfy pure differential privacy are equivalent to local protocols.

Ghazi et al's "Pure Differentially Private Summation from Anonymous Messages." Introduces the first...

  • Lower bounds for the multi-message model (but limited to pure differential privacy and bounded communication)
  • A nontrivial shuffled protocol for binary sums under pure differential privacy

Phase 4: Lower Bounds

Balcer et al.'s "Connecting Robust Shuffle Privacy and Pan-Privacy." Introduces...

  • A formal notion of robust privacy in the shuffled model: if only a \(\gamma\) fraction of users run the randomizer, the privacy parameters should degrade gracefully with \(\gamma\).
  • A robustly shuffle private uniformity tester
  • A proof that robustly shuffle private uniformity testing implies pan-private uniformity testing
  • Similar results for the distinct elements problem
The reductions imply separations between central privacy and robust shuffle privacy. In the case of distinct elements, the lower bound is \(\Omega(\sqrt{k/\varepsilon})\) while the central model upper bound is \(O(1/\varepsilon)\).

Chen et al.'s "On Distributed Differential Privacy and Counting Distinct Elements." Introduces...

  • A multi-message protocol for counting distinct elements with constant expected message complexity.
  • Bespoke lower bounds for local and single-message distinct elements protocols.
  • The concepts of "dominated" and "pseudo-locally private" protocols.
  • A generic lemma to obtain lower bounds: if a \(k\)-message shuffled protocol satisfies \(\varepsilon,\delta\)-privacy, then its randomizer must be "\(\varepsilon+k(1+\ln n),\delta\)-dominated." The authors then give lower bounds for dominated protocols.

Beimel et al.'s "On the Round Complexity of the Shuffle Model." Introduces...

  • A way to instantiate MPC in two rounds of communication in the shuffle model.
  • A generic lemma to obtain lower bounds for one-round and constant-message protocols. It essentially bounds the mutual information of the randomizer's output.