Thursday, September 21, 2017

Quantifying Privacy Loss in a Statistical Database

DZone Database Zone

In the previous post, we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post, we'll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off.

More General Randomized Response

Suppose a field in our database holds a binary response to some question. With probability t, we leave the value alone; otherwise, we replace it with the result of a fair coin toss. In the previous post, what we now call t was implicitly equal to 1/2. Because the recorded value could have come from a coin toss, it is not definitive, and yet it does carry some information: the posterior probability that the original answer was 1 ("yes") is higher if a 1 is recorded. We did this calculation for t = 1/2 last time, and here we'll look at the result for general t.
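As a rough illustration of the procedure just described, here is a minimal Python sketch. The function names `randomize` and `posterior_yes` are hypothetical, and the prior probability `p` that the original answer is 1 is an assumption introduced only for this example. The posterior follows from Bayes' rule: a true 1 is recorded as 1 with probability t + (1 - t)/2 = (1 + t)/2, while a true 0 is recorded as 1 with probability (1 - t)/2.

```python
import random


def randomize(answer: int, t: float, rng: random.Random) -> int:
    """Keep the true answer with probability t; otherwise record a fair coin toss."""
    if rng.random() < t:
        return answer
    return rng.randint(0, 1)  # fair coin: 0 or 1 with equal probability


def posterior_yes(recorded: int, t: float, p: float) -> float:
    """P(original = 1 | recorded value), where p is the assumed prior P(original = 1)."""
    # Probability of seeing the recorded value when the original answer matches it
    match = (1 + t) / 2
    # Probability of seeing the recorded value when the original answer differs
    mismatch = (1 - t) / 2
    like_yes = match if recorded == 1 else mismatch  # P(recorded | original = 1)
    like_no = mismatch if recorded == 1 else match   # P(recorded | original = 0)
    return like_yes * p / (like_yes * p + like_no * (1 - p))


if __name__ == "__main__":
    rng = random.Random(0)
    print(randomize(1, 0.5, rng))
    # With t = 1/2 and an even prior, a recorded 1 gives posterior 0.75
    print(posterior_yes(1, 0.5, 0.5))
```

At t = 0 the recorded value is pure noise and the posterior equals the prior; at t = 1 the recorded value is the original answer and the posterior is 0 or 1. Values of t in between trade privacy against statistical usefulness.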


Fun With SQL: Functions in Postgres

In our previous Fun with SQL post on the Citus Data blog, we covered w...