Tuesday, April 15, 2008

Privacy and Information Theory

Privacy is an evergreen topic in technology discussions, and increasingly finds its way into the mainstream (cf. AOL, NSA, Facebook). My impression is that most people feel that companies and government agencies are amassing their "private" data to some nefarious end.

Let's forget about technology for a moment and subject the notion of privacy to basic examination. If I truly want to keep a secret, I don't tell anyone. If I want to share information with you but no one else, I only disclose the information under the proviso of a social or legal contract of non-disclosure.

But there's a major catch here: you--or I--may disclose the information involuntarily by our actions. The various establishments I frequent know my favorite foods, drinks, and even karaoke songs. More subtly, if I tell you in confidence that I don't like or trust someone, that information is likely to visibly affect your interaction with that person. Moreover, someone who knows that we are friends might even suspect me as the cause of your change in behavior.

What does this have to do with privacy of information? Everything! The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to treat each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.

For example, if you know I work for a software company and live in New York City, you know more about my gender, education, and salary than if you only know that I live in the United States. We can quantify this information gain in bits, as the reduction in the entropy of those attributes once you condition on my employer and city.
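To make "bits of information gain" concrete, here is a minimal sketch for a single attribute, salary band. It assumes a made-up joint distribution over (what you know about my location and employer, my salary band); the probabilities in `JOINT` are invented for illustration, not survey data, and the names `JOINT`, `entropy`, `marginal`, and `conditional_entropy` are hypothetical helpers. The baseline "I only know you live in the United States" corresponds to the unconditioned entropy H(salary); the gain is H(salary) minus H(salary | location), which is also the mutual information between the two.

```python
from collections import defaultdict
from math import log2

# Hypothetical joint distribution P(location, salary band). The numbers are
# invented for illustration. Keys are (observed Y, hidden X).
JOINT = {
    ("nyc_software", "high"): 0.08,
    ("nyc_software", "low"): 0.02,
    ("elsewhere_us", "high"): 0.18,
    ("elsewhere_us", "low"): 0.72,
}


def entropy(dist):
    """Shannon entropy, in bits, of a mapping outcome -> probability."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Collapse a joint distribution keyed by tuples onto one coordinate."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)


def conditional_entropy(joint):
    """H(X | Y) = sum over y of P(y) * H(X | Y = y), with keys (y, x)."""
    total = 0.0
    for y, p_y in marginal(joint, 0).items():
        # Renormalize the slice of the joint where Y == y.
        conditional = {x: p / p_y for (yy, x), p in joint.items() if yy == y}
        total += p_y * entropy(conditional)
    return total


h_salary = entropy(marginal(JOINT, 1))          # uncertainty knowing only "US"
h_salary_given_loc = conditional_entropy(JOINT)  # average uncertainty after
gain = h_salary - h_salary_given_loc             # = mutual information I(X; Y)

if __name__ == "__main__":
    print(f"H(salary)            = {h_salary:.3f} bits")
    print(f"H(salary | location) = {h_salary_given_loc:.3f} bits")
    print(f"information gain     = {gain:.3f} bits")
```

With these invented numbers the disclosure removes roughly a tenth of a bit of uncertainty about salary. The point is not the specific value but that it is neither zero nor "everything": a disclosure leaks a measurable, partial amount about attributes I never stated.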

Information theory provides a unifying framework for thinking about privacy. We can answer questions like "if I disclose that I like bagels and smoked salmon, to what extent do I disclose that I live in New York?" Or: to what extent does an anonymized search log identify me personally?
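Both questions can be sketched on a toy population, assuming a uniform prior over people. The eight records in `POPULATION` are entirely invented, and `matching`, `posterior`, and `bits_revealed` are hypothetical helpers. `posterior` answers the bagels question as a shift in probability, while `bits_revealed` measures a disclosure's self-information, log2(N / k), where k is the size of the anonymity set that remains consistent with it. Treating an anonymized search log as a set of attribute matches is a deliberate simplification.

```python
from math import log2

# Hypothetical eight-person population; every record is invented. With a
# uniform prior, singling out one person takes log2(8) = 3 bits.
POPULATION = [
    {"city": "New York", "job": "software", "breakfast": "bagels"},
    {"city": "New York", "job": "software", "breakfast": "cereal"},
    {"city": "New York", "job": "finance", "breakfast": "bagels"},
    {"city": "New York", "job": "finance", "breakfast": "bagels"},
    {"city": "Boston", "job": "software", "breakfast": "bagels"},
    {"city": "Boston", "job": "software", "breakfast": "cereal"},
    {"city": "Austin", "job": "software", "breakfast": "cereal"},
    {"city": "Austin", "job": "teacher", "breakfast": "cereal"},
]


def matching(people, **observed):
    """People consistent with every disclosed attribute (the anonymity set)."""
    return [p for p in people if all(p.get(k) == v for k, v in observed.items())]


def posterior(people, target, **observed):
    """P(target attributes | observed attributes) under a uniform prior."""
    pool = matching(people, **observed)
    if not pool:
        raise ValueError("no one in the population matches the disclosure")
    return len(matching(pool, **target)) / len(pool)


def bits_revealed(people, **observed):
    """Self-information of a disclosure: log2(N / k), k = anonymity-set size."""
    k = len(matching(people, **observed))
    if k == 0:
        raise ValueError("no one in the population matches the disclosure")
    return log2(len(people) / k)


if __name__ == "__main__":
    print(posterior(POPULATION, {"city": "New York"}))                       # prior
    print(posterior(POPULATION, {"city": "New York"}, breakfast="bagels"))   # after
    print(bits_revealed(POPULATION, breakfast="bagels"))
    print(bits_revealed(POPULATION, city="New York", job="software",
                        breakfast="bagels"))
```

In this toy population, admitting to bagels raises the probability that I live in New York from 0.5 to 0.75 and reveals 1 bit, while the combination of city, job, and breakfast reveals all 3 bits and singles out one person. No single attribute was "private," yet together they identify me.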

If we can take this framework and make it accessible to non-information theorists, perhaps we can improve the quality of the privacy debate.

2 comments:

Anonymous said...

Gerome Miklau and Dan Suciu, "A Formal Analysis of Information Disclosure in Data Exchange", ACM Conference on Management of Data (SIGMOD), 2004.

Daniel Tunkelang said...

Thanks, I'll have to digest that. Of course, the framework is only the first step. Closing the deal requires communicating the essence of this framework to policy makers and to the broader public, so that we can as a society talk rationally about privacy and information disclosure.
