In the previous post, we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post, we'll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off.
More General Randomized ResponseSuppose we have a binary response to some question as a field in our database. With probability t, we leave the value alone. Otherwise, we replace the answer with the result of a fair coin toss. In the previous post, what we now call t was implicitly equal to 1/2. The value recorded in the database could have come from a coin toss and so the value is not definitive — and yet it does contain some information. The posterior probability that the original answer was 1 ("yes") is higher if a 1 is recorded. We did this calculation for t = 1/2 last time, and here we'll look at the result for general t.
No comments:
Post a Comment