A Study on Privacy Preserving Data Publishing With Differential Privacy
Kar, Tonny Shekha 1990-
In the era of digitization it is important to preserve privacy of various sensitive information available around us, e.g., personal information, different social communication and video streaming sites' and services' own users' private information, salary information and structure of an organization, census and statistical data of a country and so on. These data can be represented in different formats such as Numerical and Categorical data, Graph Data, Tree-Structured data and so on. For preventing these data from being illegally exploited and protect it from privacy threats, it is required to apply an efficient privacy model over sensitive data. There have been a great number of studies on privacy-preserving data publishing over the last decades. Differential Privacy (DP) is one of the state of the art methods for preserving privacy to a database. However, applying DP to high dimensional tabular data (Numerical and Categorical) is challenging in terms of required time, memory, and high frequency computational unit. A well-known solution is to reduce the dimension of the given database, keeping its originality and preserving relations among all of its entities. In this thesis, we propose PrivFuzzy, a simple and flexible differentially private method that can publish differentially private data after reducing their original dimension with the help of Fuzzy logic. Exploiting Fuzzy mapping, PrivFuzzy can (1) reduce database columns and create a new low dimensional correlated database, (2) inject noise to each attribute to ensure differential privacy on newly created low dimensional database, and (3) sample each entry in the database and release synthesized database. Existing literatures show the difficulty of applying differential privacy over a high dimensional dataset, which we overcame by proposing a novel fuzzy based approach (PrivFuzzy). By applying our novel fuzzy mapping technique, PrivFuzzy transforms a high dimensional dataset to an equivalent low dimensional one, without losing any relationship within the dataset. Our experiments with real data and comparison with the existing privacy preserving models, PrivBayes and PrivGene, show that our proposed approach PrivFuzzy outperforms existing solutions in terms of the strength of privacy preservation, simplicity and improving utility. Preserving privacy of Graph structured data, at the time of making some of its part available, is still one of the major problems in preserving data privacy. Most of the present models had tried to solve this issue by coming up with complex solution, as well as mixed up with signal and noise, which make these solutions ineffective in real time use and practice. One of the state of the art solution is to apply differential privacy over the queries on graph data and its statistics. But the challenge to meet here is to reduce the error at the time of publishing the data as mechanism of Differential privacy adds a large amount of noise and introduces erroneous results which reduces the utility of data. In this thesis, we proposed an Expectation Maximization (EM) based novel differentially private model for graph dataset. By applying EM method iteratively in conjunction with Laplace mechanism our proposed private model applies differentially private noise over the result of several subgraph queries on a graph dataset. Besides, to ensure expected utility, by selecting a maximal noise level $\theta$, our proposed system can generate noisy result with expected utility. Comparing with existing models for several subgraph counting queries, we claim that our proposed model can generate much less noise than the existing models to achieve expected utility and can still preserve privacy.
DegreeMaster of Science (M.Sc.)
SupervisorRoy, Chanchal K.
CommitteeGreer, Jim; Stavness, Ian; Mamun, Abdullah
Copyright DateAugust 2017
Privacy Preserving Data Publishing