Differential Privacy Lower Bounds & Privacy Preserving Mechanisms

Explore the concept of lower bounds in differential privacy, how accurate responses must be to preserve privacy, the definition of privacy mechanisms, reconstruction attacks, and theorems explaining privacy preservation. Understand the impact of noise addition on data privacy.

  • Privacy
  • Differential Privacy
  • Mechanisms
  • Reconstruction Attacks
  • Privacy Preservation

Presentation Transcript


  1. Lower Bounds for Differential Privacy. C. Dwork and A. Roth, The Algorithmic Foundations of Differential Privacy, 2014. Gefen Morami, TAU Differential Privacy Seminar, December 25th, 2019.

  2. How inaccurate must responses be in order not to completely destroy any reasonable notion of privacy?

  3. Definition 8.1. A mechanism is blatantly non-private if an adversary can construct a candidate database c that agrees with the real database d in all but o(n) entries, i.e., ‖c − d‖₀ ∈ o(n).

  4. Reconstruction attacks. The database is d = (d₁, ..., d_n) with dᵢ ∈ {0,1}, one bit per user. The adversary issues a string of queries, each described by a subset of rows S ⊆ [n], written as a 0/1 indicator vector. Example: S = (0,1,1,0,0,1) and d = (1,0,1,1,0,1) give S · d = 2. Here A(S) = Σ_{i∈S} dᵢ is the true value, r(S) is the response to query S, and the error is E(S, r(S)) = |A(S) − r(S)|. How much noise is needed in order to preserve privacy?
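A minimal Python sketch of evaluating such a subset query, reproducing the slide's example (the function name is illustrative):

```python
# A query is a subset S of rows, given as a 0/1 indicator vector;
# the true answer is A(S) = sum of d_i over the selected rows.

def true_answer(S, d):
    """A(S) = sum_{i in S} d_i, with S as a 0/1 indicator vector."""
    return sum(si * di for si, di in zip(S, d))

S = [0, 1, 1, 0, 0, 1]   # query: rows 2, 3, 6
d = [1, 0, 1, 1, 0, 1]   # secret database bits

print(true_answer(S, d))  # -> 2
```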

  5. Theorem 8.1. Let M be a mechanism with distortion of magnitude bounded by E. Then there exists an adversary that can reconstruct the database to within 4E positions. An easy consequence of the theorem is that a privacy mechanism adding noise with magnitude always bounded by, say, E = n/401 permits an adversary to correctly reconstruct 99% of the entries.

  6. Proof. Let d be the true database. Estimate the number of 1s in all possible sets: query M on every subset S ⊆ [n]. Rule out distant databases: for every candidate database c ∈ {0,1}ⁿ, if there exists a subset S such that |Σ_{i∈S} cᵢ − r(S)| > E, rule out c. (The true database d is never ruled out, so at least one candidate survives.) We will argue that the number of positions in which any surviving c and d differ is at most 4E.

  7. Let I₀ be the set of indices in which cᵢ = 0, I₀ = {i | cᵢ = 0}, and I₁ = {i | cᵢ = 1}. Since c was not ruled out, |Σ_{i∈I₀} cᵢ − r(I₀)| ≤ E, and by assumption |Σ_{i∈I₀} dᵢ − r(I₀)| ≤ E, so |Σ_{i∈I₀} (dᵢ − cᵢ)| ≤ 2E: c and d differ in at most 2E of the positions in I₀. The same argument for I₁ gives at most 2E further disagreements; thus c and d differ in at most 4E positions. Note that we can avoid the above attack by restricting the adversary to fewer than 2ⁿ queries.
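The attack can be simulated end to end for a toy database. This is only a sketch: the real attack issues all 2ⁿ subset queries, so n must be tiny, and the curator here adds arbitrary noise of magnitude at most E:

```python
import itertools
import random

# Brute-force reconstruction attack of Theorem 8.1 on a toy database.
random.seed(0)
n, E = 8, 1
d = [random.randint(0, 1) for _ in range(n)]

def A(S, db):                      # true subset-sum A(S)
    return sum(db[i] for i in S)

# Curator: answers every subset query with noise of magnitude <= E.
all_subsets = [frozenset(comb) for r in range(n + 1)
               for comb in itertools.combinations(range(n), r)]
responses = {S: A(S, d) + random.randint(-E, E) for S in all_subsets}

# Adversary: output any candidate consistent with every response to within E.
for c in itertools.product([0, 1], repeat=n):
    if all(abs(A(S, c) - r) <= E for S, r in responses.items()):
        break                      # some c always survives: d itself does

# Theorem 8.1 guarantees any surviving candidate is within 4E positions of d.
hamming = sum(ci != di for ci, di in zip(c, d))
assert hamming <= 4 * E
print(hamming)
```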

  8. What if we consider more realistic bounds on the number of queries?

  9. Let's investigate the feasibility of small error, on the order of √n, when the number of queries is linear in n.

  10. Why √n as the threshold on noise? Say the database contains n users, each uniformly at random satisfying a given condition. The number of people satisfying the condition is then about n/2, and the sampling error is on the order of √n. We would like the noise to be smaller than the sampling error, √n.
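A one-line check of this arithmetic, assuming each user satisfies the condition independently with probability p, so the count is Binomial(n, p):

```python
import math

# Sampling error for a random condition: with n users each satisfying it
# independently with probability p, the count is Binomial(n, p), whose
# standard deviation is sqrt(n * p * (1 - p)) ~ sqrt(n).

def sampling_error(n, p=0.5):
    return math.sqrt(n * p * (1 - p))

print(sampling_error(10_000))   # -> 50.0, i.e. sqrt(n)/2 for p = 1/2
```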

  11. Query − v ∈ {−1,1}ⁿ. Real DB − d ∈ {−1,1}ⁿ. Candidate − c ∈ {−1,1}ⁿ. The answer to query v is the inner product d · v.

  12. Query − v ∈ {−1,1}ⁿ, real DB − d ∈ {−1,1}ⁿ, candidate − c ∈ {−1,1}ⁿ. Assume c and d differ in at least k rows. Each entry of c − d lies in {−2, 0, 2}, so (c − d) · v = c · v − d · v is a sum of k independent ±2 Bernoulli terms, one per differing row. By anti-concentration, with constant probability over a random query v, c · v and d · v differ by Ω(√k).

  13. Theorem 8.2. For any η > 0 and any function α = α(n), there is a constant b and an attack using bn questions that reconstructs a database that agrees with the real database in all but at most (2α/η)² entries, if the curator answers at least 1/2 + η of the questions within an absolute error of α.

  14. Lemma 8.3. Let Y = Σᵢ₌₁ᵏ Yᵢ, where each Yᵢ is an independent ±2 Bernoulli random variable with mean zero. Then for any y and any σ, Pr[Y ∈ [2y, 2(y + σ)]] ≤ (σ + 1)/√k. Proof in class.

  15. Back to Theorem 8.2. For any η > 0 and any function α = α(n), there is a constant b and an attack using bn questions that reconstructs a database agreeing with the real database in all but at most (2α/η)² entries, if the curator answers at least 1/2 + η of the questions within an absolute error of α. The attack: choose a random matrix B ∈ {−1,1}^{bn×n} and issue its rows v₁, ..., v_{bn} as queries; given the responses (r₁, ..., r_{bn}), output any database c such that |c · vᵢ − rᵢ| ≤ α for at least a 1/2 + η fraction of the indices i.

  16. Proof. |d · vᵢ − rᵢ| ≤ α for a 1/2 + η fraction of i ∈ [bn], and |c · vᵢ − rᵢ| ≤ α for a 1/2 + η fraction of i ∈ [bn], so c and d agree (to within 2α) on at least a 2η fraction: for at least 2ηbn values of i, |(c − d) · vᵢ| ≤ 2α. But we still need to prove the bound on the number of entries: that c and d differ in at most (2α/η)² positions. We show that, for any fixed c differing from d in more than (2α/η)² positions, the probability over the random queries that |(c − d) · vᵢ| ≤ 2α holds for a 2η fraction of them is extremely small — so small that a union bound over all candidates makes it unlikely that any such c survives. Proof in class.
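The attack can be simulated on a toy database. The sketch below takes the easiest regime α = 0 (every query answered exactly), so that the one property the construction guarantees deterministically, consistency of the output with all responses, can be checked; with high probability over the random queries the output also equals d:

```python
import itertools
import random

# Toy simulation of the Theorem 8.2 attack with bn random +/-1 queries,
# in the noise-free regime alpha = 0. The adversary brute-forces over
# all candidate databases, so n must be tiny.
random.seed(1)
n, m = 10, 6 * 10                      # m = bn queries with b = 6
d = [random.choice([-1, 1]) for _ in range(n)]
queries = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m)]
responses = [sum(vi * di for vi, di in zip(v, d)) for v in queries]

# Output any candidate consistent with every response (d always is).
for c in itertools.product([-1, 1], repeat=n):
    if all(sum(vi * ci for vi, ci in zip(v, c)) == r
           for v, r in zip(queries, responses)):
        break

# Guaranteed: the returned candidate matches every response exactly.
assert all(sum(vi * ci for vi, ci in zip(v, c)) == r
           for v, r in zip(queries, responses))
print(sum(ci != di for ci, di in zip(c, d)))  # disagreement with d
```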

  17. Better defense? We would like to find low-distortion mechanisms that are hard to break. Unfortunately: when the noise is always in o(√n), there is an efficient attack using exactly n fixed queries. Moreover, there is even a computationally efficient attack requiring a linear number of queries that succeeds even when a 0.239 fraction of the queries may be answered with wild noise.

  18. Internet scale. For internet-scale data sets, n is extremely large (n ≥ 10⁸), and it is infeasible to answer n queries. This inquiry led to the first algorithmic results in (ε, δ)-differential privacy, achieved by adding binomial noise of order o(√n) when the number of queries is sublinear in n.

  19. Lower bounds for differential privacy

  20. Lower bounds for differential privacy. The results of the previous section yielded lower bounds on the distortion needed to ensure any reasonable notion of privacy. In contrast, the results in this section are specific to differential privacy. Recall the definition: a (randomized) algorithm M : Xⁿ → Y satisfies (ε, δ)-differential privacy if for all datasets x, y ∈ Xⁿ that differ on one entry, and for all subsets S of the outcome space Y, Pr[M(x) ∈ S] ≤ e^ε · Pr[M(y) ∈ S] + δ.
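A minimal sketch of the Laplace mechanism, which achieves this definition with δ = 0 for a sensitivity-1 counting query; the loop checks the defining density-ratio inequality at a few sample outputs (the answers 41 and 42 on neighbouring databases are illustrative):

```python
import math
import random

# Laplace mechanism: adding Lap(1/eps) noise to a sensitivity-1 query
# gives (eps, 0)-differential privacy.

def laplace_pdf(z, mu, b):
    return math.exp(-abs(z - mu) / b) / (2 * b)

eps = 0.5
f_x, f_y = 41, 42        # query answers on neighbouring databases, |f_x - f_y| <= 1

# For every output z, the density ratio is bounded by e^eps.
for z in [-3.0, 0.0, 41.5, 100.0]:
    ratio = laplace_pdf(z, f_x, 1 / eps) / laplace_pdf(z, f_y, 1 / eps)
    assert ratio <= math.exp(eps) + 1e-12

# Releasing one noisy answer via inverse-CDF sampling of Lap(1/eps):
random.seed(0)
u = random.random() - 0.5
noisy = f_x - (1 / eps) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
print(noisy)
```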

  21. Lemma 8.4. Assume the existence of a set S = {x₁, ..., x_{2^s}}, where each xᵢ ∈ ℕ^{|X|}, such that for i ≠ j, ‖xᵢ − xⱼ‖₁ ≤ Δ. Further, let F : ℕ^{|X|} → ℝᵏ be a k-dimensional query. For 1 ≤ i ≤ 2^s, let Bᵢ denote a region in ℝᵏ, the answer space, and assume that the Bᵢ are mutually disjoint. If M is an (ε, 0)-differentially private mechanism for F such that, for all 1 ≤ i ≤ 2^s, Pr[M(xᵢ) ∈ Bᵢ] ≥ 1/2, then Δ ≥ (ln 2)(s − 1)/ε. Proof in class.

  22. [Diagram for Lemma 8.4: the |S| = 2^s databases x₁, x₂, x₃, ..., each mapped by M into its own disjoint region Bᵢ of the answer space, with Pr[M(xᵢ) ∈ Bᵢ] ≥ 1/2.]

  23. Corollary 8.5. Let S = {x₁, ..., x_{2^s}} be as in Lemma 8.4, and assume that for any xᵢ ≠ xⱼ ∈ S, ‖F(xᵢ) − F(xⱼ)‖∞ ≥ η. Let Bᵢ denote the ℓ∞ ball in ℝᵏ of radius η/2 centered at F(xᵢ). Let M be any (ε, 0)-differentially private mechanism for F satisfying, for all 1 ≤ i ≤ 2^s, Pr[M(xᵢ) ∈ Bᵢ] ≥ 1/2. Then Δ ≥ (ln 2)(s − 1)/ε.

  24. [Diagram for Corollary 8.5: the same picture, now with each Bᵢ the ℓ∞ ball of radius η/2 centered at F(xᵢ), and Pr[M(xᵢ) ∈ Bᵢ] ≥ 1/2.]

  25. Claim 8.6. To obtain (ε, 0)-differential privacy for F, the mechanism must add noise with ℓ∞ norm greater than (ln 2)(s − 1)/(2ε) with probability exceeding 1/2.

  26. Theorem 8.7. Let X = {0,1}^d. Let M : Xⁿ → ℝ^d be an (ε, 0)-differentially private mechanism such that for every database x ∈ Xⁿ, with probability at least 1/2, M(x) outputs all of the 1-way marginals of x with error smaller than n/2. That is, for each j ∈ [d], the jth component of M(x) should approximately equal the number of rows of x whose jth bit is 1, up to an error smaller than n/2. Then n ∈ Ω(d/ε).

  27. Proof. For every string w ∈ {0,1}^d, consider the database x_w consisting of n identical rows, all of which equal w. Let B_w ⊆ ℝ^d consist of all tuples of numbers that provide answers to the 1-way marginals on x_w with error less than n/2. That is, B_w = {(z₁, ..., z_d) : ∀j ∈ [d], |z_j − n·w_j| < n/2}.

  28. Proof (cont.). Put differently, B_w is the open ℓ∞ ball of radius n/2 around n·w ∈ {0, n}^d. Notice that the sets B_w are mutually disjoint. If M is an accurate mechanism for answering 1-way marginals, then for every w the probability of landing in B_w when the database is x_w should be at least 1/2: Pr[M(x_w) ∈ B_w] ≥ 1/2. Thus, setting s = d and Δ = 2n (any two databases x_w, x_{w'} differ in all n rows, so ‖x_w − x_{w'}‖₁ ≤ 2n) in Corollary 8.5, we have 2n ≥ (ln 2)(d − 1)/ε, i.e., n ∈ Ω(d/ε).
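Plugging sample numbers into the bound from the proof (d = 100 attributes and ε = 0.1; the function name is illustrative):

```python
import math

# Numeric instance of the packing bound: with s = d and Delta = 2n,
# Corollary 8.5 gives 2n >= ln(2) * (d - 1) / eps, i.e. an (eps, 0)-DP
# mechanism accurate to n/2 on all d 1-way marginals needs
# n >= ln(2) * (d - 1) / (2 * eps) rows.

def min_rows(d, eps):
    return math.log(2) * (d - 1) / (2 * eps)

print(min_rows(100, 0.1))   # ~343.1: with fewer rows, error must reach n/2
```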

  29. Real-world example. Define user uᵢ with d attributes, where the jth attribute of uᵢ indicates whether user uᵢ had a given vaccine (0/1):

      User | Flu | Polio | Hepatitis B | HPV
      u₁   |  1  |   0   |      0      |  1
      u₂   |  0  |   1   |      1      |  1
      u₃   |  1  |   1   |      1      |  0

  30. For the lower bound, consider one database x_w for every attribute pattern w ∈ {0,1}⁴, with every row equal to w (attributes ordered Flu, Polio, Hepatitis B, HPV):

      x_{0,0,0,0}: every row = (0, 0, 0, 0)
      x_{0,0,0,1}: every row = (0, 0, 0, 1)
      x_{0,0,1,0}: every row = (0, 0, 1, 0)
      x_{0,0,1,1}: every row = (0, 0, 1, 1)
      x_{0,1,0,0}: every row = (0, 1, 0, 0)
      ...
      x_{1,1,1,1}: every row = (1, 1, 1, 1)

There are 2⁴ = 16 such databases.

  31. For example, x_w with w = (0, 0, 0, 1) consists of n identical rows:

      Flu | Polio | Hepatitis B | HPV
       0  |   0   |      0      |  1
       0  |   0   |      0      |  1
      ...   (n rows in total)  ...
       0  |   0   |      0      |  1

and its 1-way marginals are Σᵢ (x_w)ᵢ = (0, 0, 0, n).

  32. The error bound to privacy. Note that this bound is tight to within a constant factor: by the simple composition theorem, adding independent Laplace noise with parameter d/ε to each of the d marginals gives (ε, 0)-differential privacy. It also separates (ε, 0)- from (ε, δ)-differential privacy: by advanced composition, Laplace noise with parameter on the order of √(d ln(1/δ))/ε per coordinate suffices for (ε, δ)-differential privacy, e.g. with δ = 2^{−log² n}, so this upper bound is lower than the Ω(d/ε) lower bound for (ε, 0)-differential privacy.
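A quick numeric comparison of the two noise scales (the constants here are indicative only, not the book's exact constants):

```python
import math

# Per-coordinate noise scales for answering d 1-way marginals: for small
# delta, the (eps, delta) scale sqrt(d * ln(1/delta)) / eps is far below
# the (eps, 0) lower bound of order d / eps.

def pure_dp_noise(d, eps):
    return d / eps                                   # (eps, 0): Omega(d/eps)

def approx_dp_noise(d, eps, delta):
    return math.sqrt(d * math.log(1 / delta)) / eps  # (eps, delta) scale

d, eps, delta = 10_000, 0.1, 1e-6
print(pure_dp_noise(d, eps))            # 100000.0
print(approx_dp_noise(d, eps, delta))   # ~3717, much smaller
```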

  33. Counting queries. A basic type of query that we will examine extensively is a counting query, which is specified by a predicate on rows, q : X → {0,1}. It is extended to datasets x ∈ Xⁿ by counting the fraction of people in the dataset satisfying the predicate: q(x) = (1/n) Σᵢ₌₁ⁿ q(xᵢ).

  34. Families of counting queries.
      Point functions (histograms): for each y ∈ X, q_y : X → {0,1} with q_y(x) = 1 iff x = y; Q_pt = {q_y : y ∈ X}.
      Threshold functions (CDFs): for X totally ordered, for each y ∈ X, q_y(x) = 1 iff x ≤ y; Q_thr = {q_y : y ∈ X}.
      Attribute means (1-way marginals): here X = {0,1}^d and, for j = 1, ..., d, q_j : {0,1}^d → {0,1} with q_j(x) = x_j, the jth bit of x; Q_means = {q_j : j ∈ [d]}.
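The three families can be sketched directly from the definitions on a toy universe X = {0,1}², with ≤ taken as the lexicographic order on tuples (Python's built-in tuple comparison):

```python
# Counting queries on a toy data universe X = {0,1}^2.

def counting_query(q, x):
    """q(x) = (1/n) * sum_i q(x_i): fraction of rows satisfying q."""
    return sum(q(xi) for xi in x) / len(x)

point = lambda y: (lambda row: 1 if row == y else 0)        # histograms
threshold = lambda y: (lambda row: 1 if row <= y else 0)    # CDFs (lexicographic)
marginal = lambda j: (lambda row: row[j])                   # 1-way marginals

x = [(0, 1), (1, 1), (0, 1), (1, 0)]   # a dataset of n = 4 rows

print(counting_query(point((0, 1)), x))      # -> 0.5
print(counting_query(threshold((0, 1)), x))  # -> 0.5
print(counting_query(marginal(1), x))        # -> 0.75
```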

  35. Theorem 5.14. Let Q = {q : X → {0,1}} be any class of counting queries that distinguishes all the elements of X. That is, for all w ≠ w' ∈ X, there is a query q ∈ Q such that q(w) ≠ q(w'). Suppose M : Xⁿ → ℝ^Q is an (ε, δ)-differentially private mechanism that with high probability answers every query in Q with error at most α, where α < 1/2. Then n ≥ Ω(min(log |X|, log(1/δ)) / (αε)).

  36. Why the condition α < 1/2? There is a trivial (0, 0)-differentially private algorithm that answers 1/2 for all queries, achieving error at most 1/2 on every counting query.

  37. [Diagram for the proof: the datasets C = {D(w) : w ∈ X} ⊆ Xⁿ mapped into mutually disjoint regions G_w of the answer space, with Pr[M(D(w)) ∈ G_w] ≥ 1/2 and the answers to a distinguishing query q ∈ Q differing by more than 2α between regions.]

  38. Proof. We will now construct a set C of |X| datasets for which the G_w's are disjoint. Specifically, for each w ∈ X, let D(w) ∈ Xⁿ be the dataset whose first m = 2αn + 1 rows are all equal to w, and whose remaining n − m rows are all equal to w₀, for a fixed element w₀ ∈ X. Set C = {D(w) : w ∈ X}.

  39. G_w and G_{w'} are disjoint for w ≠ w': let q be a query such that q(w) ≠ q(w'); then |q(D(w)) − q(D(w'))| = m/n > 2α, so no answer vector can be α-accurate for both datasets. The datasets in C are all at distance at most m from the dataset (w₀, ..., w₀), so by Theorem 5.13 we deduce that m = 2αn + 1 ≥ Ω(log |X| / ε), i.e., n ≥ Ω(log |X| / (αε)).

  40. Theorem 8.8. For any k, d, n ∈ ℕ and ε ∈ (0, 1/40], where n ≥ min(k, d)/ε, there is a query F : ℕ^{|X|} → ℝᵏ with per-coordinate sensitivity at most 1 such that any (ε, 0)-differentially private mechanism adds noise of ℓ∞ norm Ω(min(k, d)/ε) with probability at least 1/2 on some databases of weight at most n. Note that d = |X| need not be large here, in contrast to the requirement in Theorem 8.7.

  41. Thank you for your time :)
