On Distributed Differential Privacy and Counting Distinct Elements


This presentation covers the shuffle model of differential privacy, lower bounds for selection and learning parity, and techniques such as moment-matching and dominated protocols. It defines differential privacy in detail, emphasizing the importance of privacy in algorithms that process data; explores the distributed setting, where each user holds their own data and the balance between utility and privacy is crucial; and highlights the contrast between the central and local models of data privacy.

  • Differential Privacy
  • Counting Distinct Elements
  • Data Privacy
  • Algorithms
  • Privacy Protection


Presentation Transcript


  1. On Distributed Differential Privacy and Counting Distinct Elements. Lijie Chen (MIT), Badih Ghazi, Ravi Kumar, Pasin Manurangsi (Google Research).

  2. Today's Plan
  Part I (Introduction):
  • The Shuffle Model of Differential Privacy
  • The CountDistinct Problem and Our Results for It
  • Lower Bounds for Selection and Learning Parity in the Shuffle Model
  Part II (Techniques):
  • Lower Bounds via Moment-Matching
  • Lower Bounds via Dominated Protocols
  • Conclusion and Open Problems

  3. Differential Privacy: Definition
  An algorithm $A$ runs on a dataset $X = (x_i)_{i \in [n]}$ and produces a probabilistic output $A(X)$. We say two datasets are neighboring if they differ in at most one element; for example, $\{3,5,7,9,11\}$ and $\{3,5,7,13,11\}$ are neighboring. An algorithm $A$ is $\varepsilon$-differentially private if for all pairs of neighboring datasets $X_1$ and $X_2$, and all possible outputs $o$,
  $\Pr[A(X_1) = o] \le e^\varepsilon \cdot \Pr[A(X_2) = o]$ and $\Pr[A(X_2) = o] \le e^\varepsilon \cdot \Pr[A(X_1) = o]$,
  where $e^\varepsilon \approx 1 + \varepsilon$ for small $\varepsilon$. Interpretation: an attacker who knows $X_{-i} = X \setminus \{x_i\}$ and sees an output $o$ from $A(X)$ learns little, since the likelihood of $x_i$ being $a$, namely $\Pr[o \mid x_i = a]$, and the likelihood of $x_i$ being $b$, namely $\Pr[o \mid x_i = b]$, correspond to neighboring datasets and are therefore within an $e^\varepsilon$ factor of each other.

  4. Differential Privacy: Definition (Approximate DP)
  With the same setup, an algorithm $A$ is $(\varepsilon, \delta)$-differentially private if for all pairs of neighboring datasets $X_1$ and $X_2$, and all possible events $E$,
  $\Pr[A(X_1) \in E] \le e^\varepsilon \cdot \Pr[A(X_2) \in E] + \delta$ and $\Pr[A(X_2) \in E] \le e^\varepsilon \cdot \Pr[A(X_1) \in E] + \delta$.
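To make the definition concrete, here is a minimal sketch (ours, not from the talk) of the standard Laplace mechanism on a counting query; the function name is an assumption. Adding Laplace noise of scale $1/\varepsilon$ to a sensitivity-1 count gives $\varepsilon$-DP, and the worst-case density ratio between neighboring datasets is exactly $e^\varepsilon$.

```python
import numpy as np

def laplace_count(bits, epsilon, rng=np.random.default_rng()):
    # The counting query sum(bits) has sensitivity 1 on neighboring
    # datasets, so Laplace noise of scale 1/epsilon gives epsilon-DP.
    return bits.sum() + rng.laplace(scale=1.0 / epsilon)

# Neighboring datasets: they differ in one element.
x1 = np.array([1, 0, 1, 1, 0])
x2 = np.array([1, 0, 1, 0, 0])
eps = 1.0
# Output densities are proportional to exp(-eps * |o - sum|); their
# ratio is at most exp(eps * |sum(x1) - sum(x2)|) = e^eps everywhere.
ratio_bound = np.exp(eps * abs(int(x1.sum()) - int(x2.sum())))
print(laplace_count(x1, eps), ratio_bound)
```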

  5. Differential Privacy in the Distributed Setting
  Now imagine each user holds their own data $x_i$ initially, and an algorithm $A$ runs on the dataset $X = (x_i)_{i \in [n]}$ to produce a probabilistic output $A(X)$. How do we ensure the privacy of each user? Three important dimensions:
  • Trust: how much should the user trust the curator?
  • Utility: how useful is the algorithm?
  • Privacy: how private is the algorithm?

  6. Central Model vs. (Non-Interactive) Local Model
  In both models, each user holds their own data $x_i$ initially.
  Central model: each user sends their raw data $x_i$ to a single curator, who runs a probabilistic algorithm $A$ on the dataset $X = (x_i)_{i \in [n]}$ and produces a private output $A(X)$. After sending your raw data to the curator, you must trust their integrity for the rest of your life.
  Local model: each user applies a probabilistic randomizer $R$ and sends $R(x_i)$ to the curator, who runs the algorithm on the randomized messages $(R(x_i))_{i \in [n]}$.

  7. Differential Privacy: Local Model
  Each user holds their own data $x_i$ initially and applies a probabilistic randomizer $R$, sending $R(x_i)$ to the curator, who runs the algorithm on the messages $(R(x_i))_{i \in [n]}$. Privacy assumption for the user with input $x_i$: even the curator cannot figure out much about my input.
  A local randomizer $R$ is $\varepsilon$-locally differentially private if for all pairs of possible inputs $x$ and $x'$, and all possible outputs $o$,
  $\Pr[R(x) = o] \le e^\varepsilon \cdot \Pr[R(x') = o]$ and $\Pr[R(x') = o] \le e^\varepsilon \cdot \Pr[R(x) = o]$.
  A local randomizer $R$ is $(\varepsilon, \delta)$-locally differentially private if for all pairs of inputs $x$ and $x'$, and all possible events $E$,
  $\Pr[R(x) \in E] \le e^\varepsilon \cdot \Pr[R(x') \in E] + \delta$ and $\Pr[R(x') \in E] \le e^\varepsilon \cdot \Pr[R(x) \in E] + \delta$.
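As a concrete local randomizer (a textbook example, not the talk's protocol), classical randomized response on a single bit is $\varepsilon$-locally DP: the true bit is kept with probability $e^\varepsilon/(e^\varepsilon + 1)$ and flipped otherwise, so any output is at most $e^\varepsilon$ times more likely under one input than under the other.

```python
import numpy as np

def randomized_response(bit, epsilon, rng=np.random.default_rng()):
    # Keep the true bit w.p. e^eps / (e^eps + 1), flip it otherwise.
    # For any inputs x, x' and any output o,
    # Pr[R(x) = o] / Pr[R(x') = o] <= e^eps, i.e., eps-local DP.
    keep = rng.random() < np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if keep else 1 - bit
```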

  8. Differential Privacy: Central vs. Local
  Privacy: fix the requirement to be $(\varepsilon, 1/\mathrm{poly}(n))$-DP. Utility benchmark, the Aggregation problem: each user gets $x_i \in \{0,1\}$, and the goal is to estimate the sum $\sum_i x_i$.
  • Central model (complete trust in the curator): error $\Theta(1/\varepsilon)$.
  • Local model (only trust myself): error $\Theta(\sqrt{n}/\varepsilon)$.

  9. Differential Privacy: Shuffle Model
  The shuffle model [IKOS06, BEM+17, CSU+18, EFM+19] sits between the two extremes.
  Spectrum of trust: from the local model (only trust myself), through the shuffle model, to the central model (complete trust in the curator).
  Spectrum of utility for Aggregation: from error $\Theta(\sqrt{n}/\varepsilon)$ in the local model, through the shuffle model, to error $\Theta(1/\varepsilon)$ in the central model.

  10. Differential Privacy: Central vs. Local vs. Shuffle
  Privacy: fix the requirement to be $(\varepsilon, 1/\mathrm{poly}(n))$-DP. Utility benchmark, the Aggregation problem: each user gets $x_i \in \{0,1\}$, and the goal is to estimate the sum.
  • Central model (complete trust in the curator): error $\Theta(1/\varepsilon)$.
  • Local model (only trust myself): error $\Theta(\sqrt{n}/\varepsilon)$ [KLNRS 08].
  • Shuffle model (only trust the shuffler): error $O(1/\varepsilon)$ [BBGN 19] [GKMP 20].
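A quick simulation (ours, with standard mechanisms as stand-ins) makes the utility gap visible: the central Laplace estimate is off by $O(1/\varepsilon)$, while the debiased sum of locally randomized bits is off by roughly $\sqrt{n}/\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 100_000, 1.0
x = rng.integers(0, 2, size=n)      # each user holds one bit
true_sum = x.sum()

# Central model: Laplace mechanism on the exact sum, error O(1/eps).
central_est = true_sum + rng.laplace(scale=1.0 / eps)

# Local model: each user randomizes their own bit (randomized
# response), then the curator debiases; error Theta(sqrt(n)/eps).
p = np.exp(eps) / (np.exp(eps) + 1.0)       # Pr[report the true bit]
reports = np.where(rng.random(n) < p, x, 1 - x)
local_est = (reports.sum() - n * (1 - p)) / (2 * p - 1)

print(abs(central_est - true_sum), abs(local_est - true_sum))
```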

  11. Shuffle Model
  Each user holds their own data $x_i$ initially and applies a probabilistic randomizer $R$, sending $R(x_i)$. There is a shuffler $S$, which applies a uniformly random permutation to all the messages $\{R(x_i)\}$. The curator then runs the algorithm on the shuffled messages $(R(x_i))_{i \in [n]}$.

  12. Shuffle Model: Definition
  A protocol consists of a local randomizer $R$ and a shuffler $S$; its message complexity is the number of messages one user sends in the worst case. The transcript on a dataset $X = \{x_i\}$ is $\pi(X) = S(R(x_1), R(x_2), R(x_3), \dots, R(x_n))$.
  A protocol $P$ is $\varepsilon$-differentially private if for all pairs of neighboring datasets $X_1$ and $X_2$, and all possible outputs $o$,
  $\Pr[\pi(X_1) = o] \le e^\varepsilon \cdot \Pr[\pi(X_2) = o]$ and $\Pr[\pi(X_2) = o] \le e^\varepsilon \cdot \Pr[\pi(X_1) = o]$.
  A protocol $P$ is $(\varepsilon, \delta)$-differentially private if for all pairs of neighboring datasets $X_1$ and $X_2$, and all possible events $E$,
  $\Pr[\pi(X_1) \in E] \le e^\varepsilon \cdot \Pr[\pi(X_2) \in E] + \delta$ and $\Pr[\pi(X_2) \in E] \le e^\varepsilon \cdot \Pr[\pi(X_1) \in E] + \delta$.
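Operationally, the transcript is just the multiset of all users' messages in a uniformly random order. A minimal sketch (function and randomizer names are ours):

```python
import numpy as np

def shuffle_transcript(xs, randomizer, rng=np.random.default_rng()):
    # Each user may send several messages; the shuffler applies a
    # uniformly random permutation, which erases who sent what.
    messages = [msg for x in xs for msg in randomizer(x)]
    return list(rng.permutation(messages))

# Toy single-message randomizer: wrap each input in a one-message list.
transcript = shuffle_transcript([3, 5, 7, 9, 11], lambda x: [x])
```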

  13. Today's Plan
  Part I (Introduction):
  • The Shuffle Model of Differential Privacy
  • The CountDistinct Problem and Our Results for It
  • Lower Bounds for Selection and Learning Parity in the Shuffle Model
  Part II (Techniques):
  • Lower Bounds via Moment-Matching
  • Lower Bounds via Dominated Protocols
  • Conclusion and Open Problems

  14. The CountDistinct Problem
  Input: $n$ users, each holding their own data $x_i$. Goal: estimate the number of distinct elements among $x_1, \dots, x_n$, under the constraint of being $(\varepsilon, \delta)$-DP ($\varepsilon$ = constant, $\delta = 1/\mathrm{poly}(n)$).
  Convention: green for the local model, purple for the shuffle model, red for the central model.
  • Central model: constant error is possible.
  • Main Result I (local DP): the problem is maximally hard in the local model (i.e., error $= \Omega(n)$).
  • Main Result II (shuffle DP): the problem is still maximally hard in the single-message (worst-case) shuffle model.
  • Main Result III (shuffle DP): the problem is much easier in the shuffle model if each user sends at most one message in expectation.
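For reference, the quantity being estimated and the error measure, as a trivial non-private baseline (names ours):

```python
def count_distinct(xs):
    # The target quantity; a DP protocol must estimate this.
    return len(set(xs))

def additive_error(estimate, xs):
    return abs(estimate - count_distinct(xs))
```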

  15. CountDistinct in the (Non-Interactive) Local Model
  Input: $n$ users, each holding their own data $x_i$. Goal: estimate the number of distinct elements among $x_1, \dots, x_n$, under the constraint of being $(\varepsilon, \delta)$-DP.
  • Main Result I.1: no $(O(1), o(1/n))$-local DP protocol can solve CountDistinct with error $o(n)$. (Via dominated protocols.)
  • Main Result I.2: for some $D = n/\mathrm{polylog}(n)$, no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can solve CountDistinct with error $o(D)$. (Via moment-matching.)
  • Main Result I.3: there is an $O(\ln n)$-local DP protocol that solves CountDistinct with error $o(n)$, so the $\varepsilon$-dependence in Result I.2 is essentially tight.

  16. CountDistinct in the Shuffle Model
  Input: $n$ users, each holding their own data $x_i$. Goal: estimate the number of distinct elements among $x_1, \dots, x_n$, under the constraint of being $(\varepsilon, \delta)$-DP.
  • Main Result II: for all $\varepsilon = O(1)$, there are $\delta = 2^{-\mathrm{polylog}(n)}$ and $D = n/\mathrm{polylog}(n)$ such that no $(\varepsilon, \delta)$-DP protocol in the single-message shuffle model can solve CountDistinct with error $o(D)$.
  • Main Result III: there is an $(O(1), 1/\mathrm{poly}(n))$-DP protocol in the shuffle model solving CountDistinct with error $\tilde{O}(\sqrt{n})$, such that each user sends at most $1 + o(1)$ messages in expectation.

  17. CountDistinct in the Shuffle Model
  Main Result I.2 (restated): for some $D = n/\mathrm{polylog}(n)$, no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can solve CountDistinct with error $o(D)$.
  Lemma (informal; a new connection between the local and the single-message shuffle model): if a randomizer $R$ is $(O(1), 2^{-\mathrm{polylog}(n)})$-DP in the single-message shuffle model on $n$ users, then $R$ is $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-DP in the local model.
  Combining the lemma with Main Result I.2 yields Main Result II: for all $\varepsilon = O(1)$, there are $\delta = 2^{-\mathrm{polylog}(n)}$ and $D = n/\mathrm{polylog}(n)$ such that no $(\varepsilon, \delta)$-DP protocol in the single-message shuffle model can solve CountDistinct with error $o(D)$.

  18. Other Lower Bounds in the Shuffle Model
  Selection. Input: $n$ users; the $i$-th user gets a preference vector $x_i \in \{0,1\}^D$. Score: for $j \in [D]$, $s_j = \sum_{i \in [n]} x_{i,j}$. Goal: find a $j^\star$ such that $s_{j^\star} \ge \max_{1 \le j \le D} s_j - n/10$.
  Learning Parity. Input: $n$ users; the $i$-th user gets a vector $x_i \in \{0,1\}^d$ and a bit $y_i \in \{0,1\}$. Promise: there exists a hidden $s$ such that $\langle x_i, s \rangle = y_i \pmod 2$ for all $i$. Goal: find the hidden $s$.
  • Main Result IV.1: an $(O(1), o(1/n))$-DP shuffle protocol with at most $m$ messages from each user cannot solve Selection if $n \le D/m$.
  • Main Result IV.2: an $(O(1), o(1/n))$-DP shuffle protocol with at most $m$ messages from each user cannot solve Learning Parity if $n \le 2^{d/(m+1)}$.
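To pin the two tasks down, here are non-private baselines (ours; function names are assumptions): selection is an argmax of column sums, and the parity promise can be solved by Gaussian elimination over GF(2).

```python
import numpy as np

def selection_scores(X):
    # X is an n-by-D 0/1 matrix; user i holds row X[i].
    return X.sum(axis=0)                        # s_j = sum_i x_{i,j}

def solves_selection(X, j):
    s = selection_scores(X)
    return s[j] >= s.max() - X.shape[0] / 10    # the n/10 slack

def solve_parity(X, y):
    # Gaussian elimination over GF(2): finds s with X @ s = y (mod 2),
    # assuming the promise holds (free variables are set to 0).
    A = np.column_stack([X, y]).astype(np.uint8) % 2
    n, d = X.shape
    row = 0
    for col in range(d):
        pivot = next((r for r in range(row, n) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]
        for r in range(n):
            if r != row and A[r, col]:
                A[r] ^= A[row]
        row += 1
    s = np.zeros(d, dtype=np.uint8)
    for r in range(row):
        lead = np.flatnonzero(A[r, :d])[0]      # pivot column of row r
        s[lead] = A[r, d]
    return s
```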

  19. Comparison with [Cheu-Ullman 2020]
  Robust shuffle model (robustness condition): the protocol should remain private even if a small constant fraction of users drop out. A very desirable property in practice.
  A recent independent work by [Cheu-Ullman]: an $(O(1), o(1/n))$-DP robust shuffle protocol, with unboundedly many messages from each user, cannot solve Selection if $n \le \sqrt{D}$, and cannot solve Learning Parity if $n \le 2^{d/2}$.
  Our result does not cover the unbounded-message case, but it applies even to non-robust shuffle protocols, and it gives better lower bounds for Selection when $m < \sqrt{D}$.

  20. Maximal Separation Between Global Sensitivity and Two-Party DP
  There is a function $f: \{0,1\}^n \times \{0,1\}^n \to \mathbb{Z}$ whose global sensitivity is one (that is, changing one input bit of $f$ can change the value of $f$ by at most 1), yet no $(O(1), o(1/n))$-DP protocol for $f$ in the two-party model can achieve error $o(n/\log n)$. This improves the previous $1$ vs. $\tilde{\Theta}(\sqrt{n})$ separation, and answers an open question of [McGregor-Mironov-Pitassi-Reingold-Talwar-Vadhan, FOCS 2010]. Check our paper for more details!
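For intuition about the objects being separated: the inner product, the example behind the earlier $\sqrt{n}$-type separation of [MMPRTV 2010], already has global sensitivity one. A small check (ours):

```python
import numpy as np

def inner_product(x, y):
    # Flipping one bit of x (or of y) changes <x, y> by at most 1,
    # so the global sensitivity of this function is 1.
    return int(np.dot(x, y))

rng = np.random.default_rng(0)
x, y = rng.integers(0, 2, 16), rng.integers(0, 2, 16)
for i in range(16):
    x_flip = x.copy()
    x_flip[i] ^= 1
    assert abs(inner_product(x_flip, y) - inner_product(x, y)) <= 1
```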

  21. Today's Plan
  Part I (Introduction):
  • The Shuffle Model of Differential Privacy
  • The CountDistinct Problem and Our Results for It
  • Lower Bounds for Selection and Learning Parity in the Shuffle Model
  Part II (Techniques):
  • Lower Bounds via Moment-Matching
  • Lower Bounds via Dominated Protocols
  • Conclusion and Open Problems
  (The proofs are very technical; I will only give the construction of the hard instances and some intuition for why the lower bounds hold.)

  22. CountDistinct Lower Bounds via Moment-Matching: Estimate the Unseen
  Input: given access to a list $(x_1, x_2, \dots, x_n)$.
  Problem: estimate the number of distinct elements in the list within an additive error of $cn$, for a small constant $c$, with probability 0.99.
  Query-complexity model: each time, one can adaptively pick one element $x_i$ from the list to query.
  [Valiant-Valiant 13], [Wu-Yang 19]: $\Omega(n/\log n)$ queries are required.

  23. CountDistinct Lower Bounds via Moment-Matching
  Key idea in [WY 19]: moment-matching random variables. Let $K = \Theta(\log n)$ and $\Lambda = \Theta(K^2)$. There are two random variables $U$ and $V$ such that:
  1. (Unit expectation) $\mathbb{E}[U] = \mathbb{E}[V] = 1$.
  2. (Matching moments) For all integers $1 \le k \le K$, $\mathbb{E}[U^k] = \mathbb{E}[V^k]$.
  3. (Bounded domain) Both $U$ and $V$ are supported on $\{0\} \cup [1, \Lambda]$.
  4. (Mostly zero vs. mostly non-zero) $\Pr[U = 0] \ge 0.9$ and $\Pr[V = 0] \le 0.1$.
  Goal (toward Main Result I.2): from $U$ and $V$, construct two distributions $\mathcal{D}_U$ and $\mathcal{D}_V$ over $n$-input datasets such that (1) no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can distinguish them, and (2) the numbers of distinct elements in $\mathcal{D}_U$ and $\mathcal{D}_V$ differ by $D/10$.
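The sketch below (ours) checks the four properties for a hypothetical toy pair that matches only the first moment ($K = 1$); the real construction matches $K = \Theta(\log n)$ moments, which is what drives the lower bound.

```python
import numpy as np

# Hypothetical toy stand-ins for U and V: the supports and
# probabilities are ours, chosen only to satisfy properties
# 1, 3, and 4 with K = 1.
U = {0.0: 0.9, 10.0: 0.1}       # mostly zero,     E[U] = 1
V = {0.0: 0.1, 10 / 9: 0.9}     # mostly non-zero, E[V] = 1

def moment(rv, k):
    return sum(p * v**k for v, p in rv.items())

assert np.isclose(moment(U, 1), 1) and np.isclose(moment(V, 1), 1)
# Higher moments differ for this toy pair; the real U, V agree up to K.
print([(k, moment(U, k), moment(V, k)) for k in (1, 2, 3)])
```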

  24. Hard Distribution: Signal/Noise Decomposition
  Setting: $n$ users; each user gets an input from $[D]$, for some $D = n/\mathrm{polylog}(n)$.
  Parameters: $U$, a distribution over $\{0\} \cup [1, \Lambda]$; $T$, a random subset of $[D]$ with $|T| = D/100$.
  Construction of the signal part $\mathcal{D}_\mathrm{signal}^U$: for each $t \in [D]$, we (1) draw $\lambda_t \leftarrow U$; (2) set $m_t = \lambda_t$; (3) add $m_t$ users with input $t$.
  Construction of the noise part $\mathcal{D}_\mathrm{noise}$: for each $t \in T$, we (1) set $\lambda_t = (n - D)/|T|$; (2) add $\lambda_t$ users with input $t$.
  Construction of the hard distribution $\mathcal{D}_U$: draw two datasets $S_1 \leftarrow \mathcal{D}_\mathrm{signal}^U$ and $S_2 \leftarrow \mathcal{D}_\mathrm{noise}$, and output their union (as a multi-set).
  Since $\mathbb{E}[U] = 1$, $\mathcal{D}_\mathrm{signal}^U$ has $\mathbb{E}[\sum_t \lambda_t] = D$ inputs in expectation; so $\mathcal{D}_U$ (and $\mathcal{D}_V$) has roughly $n$ inputs.
  Goal: no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can distinguish between $\mathcal{D}_U$ and $\mathcal{D}_V$.

  25. Hard Distribution: Signal/Noise Decomposition
  (Same setting, parameters, and construction as the previous slide, with $\lambda_t \leftarrow U$ in the signal part and $\lambda_t = (n - D)/|T|$ in the noise part.)
  The signal part provides the signal. Writing $\#\mathrm{DE}(S)$ for the number of distinct elements in $S$, the mostly-zero vs. mostly-non-zero property ($\Pr[U = 0] \ge 0.9$ and $\Pr[V = 0] \le 0.1$) gives: w.h.p. for $S \leftarrow \mathcal{D}_\mathrm{signal}^U$, $\#\mathrm{DE}(S) \le 0.1 D$, while w.h.p. for $S \leftarrow \mathcal{D}_\mathrm{signal}^V$, $\#\mathrm{DE}(S) \ge 0.5 D$.
  The noise part does not make a difference: for $S \leftarrow \mathcal{D}_\mathrm{noise}$, $\#\mathrm{DE}(S) \le 0.01 D$.
  Hence (1) no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can distinguish them, and (2) the numbers of distinct elements in $\mathcal{D}_U$ and $\mathcal{D}_V$ differ by $D/10$.

  26. Hard Distribution: Poissonization Trick
  Same setting and parameters; replace the deterministic counts with Poisson draws.
  Signal part $\mathcal{D}_\mathrm{signal}^U$: for each $t \in [D]$, we (1) draw $\lambda_t \leftarrow U$; (2) draw $m_t \leftarrow \mathrm{Poi}(\lambda_t)$; (3) add $m_t$ users with input $t$.
  Noise part $\mathcal{D}_\mathrm{noise}$: for each $t \in T$, we (1) draw $m_t \leftarrow \mathrm{Poi}((n - D)/|T|)$; (2) add $m_t$ users with input $t$.
  Hard distribution $\mathcal{D}_U$: draw $S_1 \leftarrow \mathcal{D}_\mathrm{signal}^U$ and $S_2 \leftarrow \mathcal{D}_\mathrm{noise}$, and output their union (as a multi-set).
  Poisson distribution with parameter $\lambda$: supported on the non-negative integers, with $\Pr[\mathrm{Poi}(\lambda) = k] = \lambda^k e^{-\lambda}/k!$ and mean $\mathbb{E}[\mathrm{Poi}(\lambda)] = \lambda$.
  Goal: no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can distinguish between $\mathcal{D}_U$ and $\mathcal{D}_V$.
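A sampler for the Poissonized hard distribution, as we read the construction from the slides; the parameter names mirror the slides and the helper name is ours:

```python
import numpy as np

def sample_hard_dataset(support, probs, n, D, T, rng):
    # support/probs describe U (or V) over {0} ∪ [1, Λ].
    dataset = []
    # Signal part: for each t in [D], Poi(lambda_t) copies of t with
    # lambda_t ~ U; element t appears w.p. 1 - E[exp(-lambda_t)].
    lam = rng.choice(support, size=D, p=probs)
    for t, m in enumerate(rng.poisson(lam), start=1):
        dataset += [t] * m
    # Noise part: the remaining ~n - D users, spread evenly over T;
    # this adds at most |T| = D/100 distinct elements.
    for t in T:
        dataset += [t] * rng.poisson((n - D) / len(T))
    return dataset

rng = np.random.default_rng(0)
n, D = 10_000, 1_000
T = rng.choice(np.arange(1, D + 1), size=D // 100, replace=False)
X = sample_hard_dataset([0.0, 10.0], [0.9, 0.1], n, D, T, rng)
```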

  27. Hard Distribution: Signal/Noise Decomposition
  (Same Poissonized construction: $\lambda_t \leftarrow U$ and $m_t \leftarrow \mathrm{Poi}(\lambda_t)$ in the signal part; $m_t \leftarrow \mathrm{Poi}((n - D)/|T|)$ in the noise part; $\mathcal{D}_U$ outputs the union of the two parts.)
  The noise part provides the noise, and the matching moments provide indistinguishability: for all integers $1 \le k \le K$, $\mathbb{E}[U^k] = \mathbb{E}[V^k]$. Since the $k$-th factorial moment of a mixed Poisson $\mathrm{Poi}(\lambda)$ with $\lambda \leftarrow U$ equals $\mathbb{E}[U^k]$, the per-element count distributions under $\mathcal{D}_U$ and $\mathcal{D}_V$ agree in their first $K$ moments.
  Conclusion: no $(\ln n - \Theta(\ln\ln n), n^{-\Omega(1)})$-local DP protocol can distinguish between $\mathcal{D}_U$ and $\mathcal{D}_V$, while (2) the numbers of distinct elements in $\mathcal{D}_U$ and $\mathcal{D}_V$ differ by $D/10$.

  28. Lower Bounds in the Shuffle Model via Dominated Protocols
  Convention: green for the local model, purple for the shuffle model, blue for dominated protocols.
  Recall local privacy: a local randomizer $R$ is $(\varepsilon, \delta)$-differentially private if for all pairs of inputs $x$ and $x'$, and all possible events $E$, $\Pr[R(x) \in E] \le e^\varepsilon \cdot \Pr[R(x') \in E] + \delta$ (and symmetrically).
  Dominated protocols: we say a randomizer $R: \mathcal{X} \to \mathcal{M}$ is $(\varepsilon, \delta)$-dominated if there exists a distribution $\mathcal{D}$ on $\mathcal{M}$ such that for all $x \in \mathcal{X}$ and all events $E \subseteq \mathcal{M}$, $\Pr[R(x) \in E] \le e^\varepsilon \cdot \Pr_{o \leftarrow \mathcal{D}}[o \in E] + \delta$.
  Local protocols are dominated: set $\mathcal{D} = R(x_0)$ for a fixed $x_0 \in \mathcal{X}$; then $(\varepsilon, \delta)$-local DP directly gives $(\varepsilon, \delta)$-domination.
  Multi-message shuffle protocols are dominated: if $R$ is $(\varepsilon, \delta)$-DP in the $m$-message shuffle model on $n$ users, then $R$ is $(\varepsilon + m(1 + \ln n), \delta)$-dominated.
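For finite message spaces, domination is easy to verify: the hardest event for input $x$ is the set of outputs where $R(x)$ exceeds $e^\varepsilon \mathcal{D}$, so it suffices to sum the positive excesses. A small checker (ours, a sketch for the finite case):

```python
import numpy as np

def is_dominated(R, dom, eps, delta):
    # R: dict mapping each input x to its output distribution
    # (a dict output -> probability); dom: dict output -> probability.
    # For each x, the worst event E* = {o : Pr[R(x)=o] > e^eps dom(o)}
    # maximizes Pr[R(x) in E] - e^eps * Pr_dom[E].
    for dist in R.values():
        excess = sum(max(p - np.exp(eps) * dom.get(o, 0.0), 0.0)
                     for o, p in dist.items())
        if excess > delta:
            return False
    return True

# Example: randomized response with eps = ln 3 is (ln 3, 0)-dominated
# by its own output distribution on input 0.
R = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
assert is_dominated(R, R[0], np.log(3), 0.0)
```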

  29. Lower Bounds for CountDistinct via Dominated Protocols
  Multi-message shuffle protocols are dominated: if $R$ is $(\varepsilon, \delta)$-DP in the $m$-message shuffle model on $n$ users, then $R$ is $(\varepsilon + m(1 + \ln n), \delta)$-dominated.
  Selection (recall). Input: $n$ users; the $i$-th user gets a preference vector $x_i \in \{0,1\}^D$. Score: for $j \in [D]$, $s_j = \sum_{i \in [n]} x_{i,j}$. Goal: find a $j^\star$ such that $s_{j^\star} \ge \max_{1 \le j \le D} s_j - n/10$.
  Selection is hard for dominated protocols: an $(\varepsilon, o(1/n))$-dominated protocol cannot solve Selection if $n \le D \ln D / \varepsilon$.
  Plugging in $\varepsilon + m(1 + \ln n)$ for $\varepsilon$ yields Main Result IV.1: an $(O(1), o(1/n))$-DP shuffle protocol with at most $m$ messages from each user cannot solve Selection if $n \le D/m$.

  30. Conclusion and Open Problems
  • Lower bounds for unboundedly many messages in the shuffle model, without robustness constraints?
  • Extend the single-message shuffle model lower bound for CountDistinct to the regime $\delta = 1/\mathrm{poly}(n)$? (Currently it requires $\delta = 2^{-\mathrm{polylog}(n)}$.)
  Thank you!
