The Cost of Privacy: Destruction of Data-Mining in Anonymized Data Publishing

S. Priya; A. Priya; V. Divya

Abstract

Search engines play a crucial role in navigating the vastness of the web. Today's search engines do not just collect and index web pages; they also collect and mine information about their users. They store the queries, clicks, IP addresses, and other information about their interactions with users in what is called a search log. In this paper, we analyze algorithms for publishing frequent keywords, queries, and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then propose an algorithm, ZEALOUS, and show how to set its parameters to achieve probabilistic privacy. We also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17] that achieves indistinguishability. Our paper concludes with a large experimental study using real applications, in which we compare ZEALOUS with previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields utility comparable to k-anonymity while achieving much stronger privacy guarantees, and that it can be applied more generally to the problem of publishing frequent items or itemsets. A topic of future work is the development of algorithms that release useful information about infrequent keywords, queries, and clicks in a search log while preserving user privacy.
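The abstract names ZEALOUS but does not spell out its steps. As a rough illustration only, the Python sketch below follows the two-threshold scheme as it is commonly described for publishing frequent items: cap each user's contribution, count, drop rare items, add Laplace noise, then drop items whose noisy count is low. The function name `publish_frequent_items` and the parameter names `m`, `tau`, `lam`, and `tau_prime` are assumptions made for this sketch, not the authors' code, and the sketch says nothing about which parameter values give a particular privacy guarantee.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_frequent_items(user_logs, m, tau, lam, tau_prime, seed=None):
    """Hypothetical sketch of a two-threshold frequent-item release.

    user_logs: dict mapping a user id to that user's list of items
               (keywords, queries, or clicks).
    m:         max number of distinct items counted per user (assumed name).
    tau:       threshold applied to the true counts (assumed name).
    lam:       scale of the Laplace noise, must be > 0 (assumed name).
    tau_prime: threshold applied to the noisy counts (assumed name).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    rng = random.Random(seed)

    # Step 1: each user contributes at most m distinct items.
    counts = Counter()
    for items in user_logs.values():
        distinct = list(dict.fromkeys(items))[:m]
        counts.update(distinct)

    published = {}
    for item, count in counts.items():
        # Step 2: drop items whose true count is below tau.
        if count < tau:
            continue
        # Step 3: perturb the surviving counts with Laplace noise.
        noisy = count + laplace_noise(lam, rng)
        # Step 4: publish only items whose noisy count reaches tau_prime.
        if noisy >= tau_prime:
            published[item] = noisy
    return published


if __name__ == "__main__":
    logs = {f"user{i}": ["weather", "news"] for i in range(50)}
    logs["user50"] = ["a very rare query"]
    result = publish_frequent_items(logs, m=2, tau=5, lam=1.0, tau_prime=10, seed=0)
    print(sorted(result))
```

Because every user is counted at most once per item, a single user who repeats one query many times cannot push it over the first threshold alone, which is the intuition behind publishing only frequent items.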

Keywords

Security, Integrity, and Protection, General, Database Management, Information Technology and Systems, Web Search, General, Information Storage and Retrieval, Information Technology and Systems.

References

A. Blum, K. Ligett, and A. Roth, “A Learning Theory Approach to Non-Interactive Database Privacy,” Proc. 40th Ann. ACM Symp. Theory of Computing (STOC).

E. Adar, “User 4xxxxx9: Anonymizing Query Logs,” Proc. World Wide Web (WWW) Workshop Query Log Analysis, 2007.

R. Baeza-Yates, “Web Usage Mining in Search Engines,” Web Mining: Applications and Techniques, Idea Group, 2004.

R. Jones, B. Rey, O. Madani, and W. Greiner, “Generating Query Substitutions,” Proc. 15th Int'l Conf. World Wide Web (WWW), 2006.

J. Han and M. Kamber, Data Mining: Concepts and Techniques, Sept. 2000.

R. Motwani and S. Nabar, “Anonymizing Unstructured Data,” CoRR, abs/0810.5582, 2008.

A. Machanavajjhala, D. Kifer, J.M. Abowd, J. Gehrke, and L. Vilhuber, “Privacy: Theory Meets Practice on the Map,” Proc. Int'l Conf. Data Eng. (ICDE), 2008.

This work is licensed under a Creative Commons Attribution 3.0 License.
