
Private Multiplicative Weights for Online Query Release
Learn about the Private Multiplicative Weights algorithm for releasing online queries while ensuring privacy. Understand the composition theorems, advanced composition concepts, and the practical implementation of the sparse vector technique. Dive into the mechanism's iterative approach to approximating real databases with public data.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Online Query Release via Online Query Release via Private Multiplicative Weights Private Multiplicative Weights Omri Keren Dec 4, 2019
The Settings Input: A private database ? given as a histogram, i.e. a distribution over the data universe ?; A series of linear queries ?? ? (may be chosen adaptively) We will represent them as vectors over 0,1 Each query has a sensitivity of 1 ? ? Output: A stream of query answers {??}
Example Say we have a database of ? = 1000 people, each belonging to (exactly) one category from the set {A, B, C}. Then ? = 3. A B C 0.1 0.3 0.6 ? = 100 300 600 A linear query ? we could ask: What are 0.4 of category A and 0.5 of category C? 0.4 0 0.5 Can be represented as ? =
Reminder: Composition By the simple composition theorem, privacy cost increases proportionally to the number of queries answered. Say we have a set ? of counting queries and we wish to obtain a final privacy of (?,?), then each query should be guaranteed with ? ? |?|, = ? |?|-privacy. |?| ??. 1 So the Laplace noise added per query will be ? ? |?| ? To bound the maximum noise added to any of the queries, we use the union bound: By setting ? = ? |?|we get that the maximal noise is bounded by ? = ? ?? . 1 |?| log |?|
Advanced Composition According to the theorem, if a mechanism ? is composed of queries ?1, ,??that are each 1 ? ?, ? ? + ? - (?,?)-differentially private, then for all ? > 0, ? is ? ? log differentially private. So if we want a target privacy parameters (?,?), each of the queries should be guaranteed with ?0= ? ? log1 ? ? privacy parameter. ? log1 ? The Laplace noise added per query has scale ? . ?? ? log1 ? log ? log ? ?0? The maximal noise is given by ? = ? = ? . ??
Reminder: Sparse Vector Technique Useful when only a small portion of the queries result in an answer larger than a certain threshold ?. Given a stream of ? queries, NumericSparse outputs a stream of answers ?? ?=1 . The mechanism is ?,? -accurate (with respect to ?) if: With probability 1 ?, for all ? such that ?? : ??? ?? ? And for all ? such that ??= : ??(?) ? + ? ?
The Mechanism The general idea: An iterative algorithm which applies the multiplicative weights method to the sparse vector technique. We will hold at all times a public approximation of the real database ?. At each iteration ?: If the true answer to a query differs substantially from the answer on the current approximation ??, we will update ??to move closer towards ?. Otherwise, we will return the answer on ??.
How many updates are needed? Given a class of linear queries ? and a database ? (given as a histogram over ), we denote x1[?] = 1 ?for all ?. Theorem: Consider a maximal length sequence of databases ?2, ,??such that the following conditions hold for each ?,?? ?,?? : ??? ???? > ? ??? ?? < ? ??+1= ??(??,??,??) Then ? 1 +4 log ? ?2
How many updates are needed? Proof: We will use KL divergence between ? and ??as a potential function: ? ? ? log(? ? ? ??(?| ?? = ??[?]) ?=1 Claim 1: For all ?: ? 0 and 1 log|?|.
How many updates are needed? Cont. Proof: By the log-sum inequality, if ?1, ,??,?1, ,??are all non-negative numbers, then ?? log ??? ?? ?? ?? log ??? ? ? (Follows from Jensen s Inequality for the convex function ? ? = ?log?.) Plugging ??= ? ? ,??= ??? ?, we get that KL divergence is always non- negative (recall that ?,??are both probability distributions).
How many updates are needed? Cont. 1 ?for all ? 1= ?=1 ?? ? log( ? ? ? ) ?1? = ?? ? + ?=1 ?? ? log?[?] = log ? ?=1 = log ? ?[?] Since ? ? 0, this quantity is maximized when ? ? = 0 (e.g. when ? 1 = 1 and ? ? = 0 for all ? > 1), resulting in log|?|.
How many updates are needed? Cont. Claim 2: For all ?, it holds that: ??,?? ??,? ?2 ? ?+1 ? Proof: on the blackboard.
By the theorems conditions, for every ?, ??? ???? ??? < ????iff ??< ???? ? and ?? ??? < ? ???? ??? ? ??< ???? ??= ?? ?2 ? 2 ? ?2 4=?2 ? 2 ? ?+1 ??,?? ??,? ??=?? 4 4 ?=? 2
??? ???? ? ?? ???? ??= 1 ?? ?2 ?2 2 ? ?2 4=?2 ? 2 ? 2 1 ???? 1 + ??? 4 ? ? ?+1 ??,?? ??,? = 4 4 ?=? ??=1 ?? 2 Finally, we have ? 1 ? ?+1 ? 1 ?2 log ? 1 ?= 4 ?=1 ? 1 +4log ? ?2
The mechanism We will use NumericSparse in order to ask at each iteration if ? ? ? ?? is large. Each such query will be asked as two separated queries. Recall that NumericSparse returns either or some real value if answer is significantly above the threshold. If the returned value is numeric, we return a noisy answer and trigger an update for ??.
OnlineMultiplicativeWeights OnlineMultiplicativeWeights ?,?,? = 0,?,?, ?? 1) ? 4 log 2 18? log 2 ? +log ? ?? 3) Initialize NumericSparse NumericSparse(?, ?? ,?,?,?,? = 0) with a stream of queries ?? , outputting a stream of answers ??. 4) ? 1 1 ? ? ? 6) For each query ??: a. ?2? 1 ?? ???? b. ?2? c. If ?2? 1= ??? ?2?= then i. ?? ???? d. Else i. If ?2? 1 then 1. ?? ????+ ?2? 1 ii. Else 1. ?? ???? ?2? iii. ??+1 ?? ??,??,?? iv. ? ? + 1 4? 2) ? 1= 5) ?? ???? ??
Privacy Analysis Note that the mechanism is (?,?)-differentially private. Since the online mechanism accesses the database only through NumericSparse, this follows directly from the privacy analysis of NumericSparse.
Accuracy Analysis (? = 0) Theorem: With probability at least 1 ?, for all queries ??, the mechanism returns an answer ?? such that ??? ?? 3? for any ? such that: 32??? ? ?2? 36 log |?| log |?|+log ? ??2? Alternatively: 1 3 log ? ???2 log ? log ? ?? ? = ? log ? ? = ?
Accuracy Analysis (? = 0) Cont. Proof: We saw that given ? queries and a maximum number ? of above-threshold queries, NumericSparse is (?,?)-accurate for any ? such that 4? ? 9? log ?+log ? ? In our case: ? =4 log |?| ? = 2 ? ?2
Accuracy Analysis (? = 0) Cont. Plugging all these and normalizing ? by ?1= ? give us: 36log|?| log ? + log32log|?| ?2? ? = ??2? ? = 2?
Accuracy Analysis (? = 0) Cont. Assume that we are in the high probability case 1 ? . For all ? such that an update is triggered for ??, it holds that: ??? ???? ? ? = ? Therefore, the conditions of the updates theorem are satisfied, and thus, there are at most ? =4 log |?| ?2 update steps. By the accuracy properties of NumericSparse, At most one of ?2? 1, ?2?will have value . For all ? such that no update is triggered and we return ??= ????, we have ??? ??| = |??? ???? ? + ? = 3?. Finally, by the second condition of the updates theorem, for all ? such that an update is triggered we have ??? ?? ?.
OnlineMultiplicativeWeights OnlineMultiplicativeWeights ?,?,?,?,?, ?? 1) ? 4 log 2 ? log2 ? log ?+log4? 2+32 2 ? 2) ? ?? 3) Initialize NumericSparse NumericSparse(?, ?? ,?,?,?,?) with a stream of queries ?? , outputting a stream of answers ??. 4) ? 1 1 ? ? ? 6) For each query ??: a. ?2? 1 ?? ???? b. ?2? c. If ?2? 1= ??? ?2?= then i. ?? ???? d. Else i. If ?2? 1 then 1. ?? ????+ ?2? 1 ii. Else 1. ?? ???? ?2? iii. ??+1 ?? ??,??,?? iv. ? ? + 1 1= 5) ?? ???? ??
Accuracy Analysis (? > 0) Theorem: With probability at least 1 ?, for all queries ??, the mechanism returns an answer ?? such that ??? ?? 3? for any ? such that: log ? log2 32 log ? ?2? 2+32 2 ?log |?|+log ? ???
Accuracy Analysis (? > 0) Cont. Proof: We saw that given ? queries and a maximum number ? of above-threshold queries, NumericSparse is (?,?)-accurate for any ? such that 4? ? ? log2 log ?+log ?16 2+1 ? ? Plugging ? =4 log |?| ,? = 2 ? gives the desired bound. ?2