PRESERVING PRIVACY IN DATA RELEASE

Livraga, G.

doi:10.13130/livraga-giovanni_phd2014-03-18

Data sharing and dissemination play a key role in our information society. Not only do they prove to be advantageous to the involved parties, but they can also be fruitful to the society at large (e.g., new treatments for rare diseases can be discovered based on real clinical trials shared by hospitals and pharmaceutical companies). The advancements in the Information and Communication Technology (ICT) make the process of releasing a data collection simpler than ever. The availability of novel computing paradigms, such as data outsourcing and cloud computing, make scalable, reliable and fast infrastructures a dream come true at reasonable costs. As a natural consequence of this scenario, data owners often rely on external storage servers for releasing their data collections, thus delegating the burden of data storage and management to the service provider. Unfortunately, the price to be paid when releasing a collection of data is in terms of unprecedented privacy risks. Data collections often include sensitive information, not intended for disclosure, that should be properly protected. The problem of protecting privacy in data release has been under the attention of the research and development communities for a long time. However, the richness of released data, the large number of available sources, and the emerging outsourcing/cloud scenarios raise novel problems, not addressed by traditional approaches, which need enhanced solutions. In this thesis, we define a comprehensive approach for protecting sensitive information when large collections of data are publicly or selectively released by their owners. In a nutshell, this requires protecting data explicitly included in the release, as well as protecting information not explicitly released but that could be exposed by the release, and ensuring that access to released data be allowed only to authorized parties according to the data owners’ policies. More specifically, these three aspects translate to three requirements, addressed by this thesis, which can be summarized as follows. The first requirement is the protection of data explicitly included in a release. While intuitive, this requirement is complicated by the fact that privacy-enhancing techniques should not prevent recipients from performing legitimate analysis on the released data but, on the contrary, should ensure sufficient visibility over non sensitive information. We therefore propose a solution, based on a novel formulation of the fragmentation approach, that vertically fragments a data collection so to satisfy requirements for both information protection and visibility, and we complement it with an effective means for enriching the utility of the released data. The second requirement is the protection of data not explicitly included in a release. As a matter of fact, even a collection of non sensitive data might enable recipients to infer (possibly sensitive) information not explicitly disclosed but that somehow depends on the released information (e.g., the release of the treatment with which a patient is being cared can leak information about her disease). To address this requirement, starting from a real case study, we propose a solution for counteracting the inference of sensitive information that can be drawn observing peculiar value distributions in the released data collection. The third requirement is access control enforcement. Available solutions fall short for a variety of reasons. Traditional access control mechanisms are based on a reference monitor and do not fit outsourcing/cloud scenarios, since neither the data owner is willing, nor the cloud storage server is trusted, to enforce the access control policy. Recent solutions for access control enforcement in outsourcing scenarios assume outsourced data to be read-only and cannot easily manage (dynamic) write authorizations. We therefore propose an approach for efficiently supporting grant and revoke of write authorizations, building upon the selective encryption approach, and we also define a subscription-based authorization policy, to fit real-world scenarios where users pay for a service and access the resources made available during their subscriptions. The main contributions of this thesis can therefore be summarized as follows. With respect to the protection of data explicitly included in a release, our original results are: i) a novel modeling of the fragmentation problem; ii) an efficient technique for computing a fragmentation, based on reduced Ordered Binary Decision Diagrams (OBDDs) to formulate the conditions that a fragmentation must satisfy; iii) the computation of a minimal fragmentation not fragmenting data more than necessary, with the definition of both an exact and an heuristic algorithms, which provides faster computational time while well approximating the exact solutions; and iv) the definition of loose associations, a sanitized form of the sensitive associations broken by fragmentation that can be safely released, specifically extended to operate on arbitrary fragmentations. With respect to the protection of data not explicitly included in a release, our original results are: i) the definition of a novel and unresolved inference scenario, raised from a real case study where data items are incrementally released upon request; ii) the definition of several metrics to assess the inference exposure due to a data release, based upon the concepts of mutual information, Kullback-Leibler distance between distributions, Pearson’s cumulative statistic, and Dixon’s coefficient; and iii) the identification of a safe release with respect to the considered inference channel and the definition of the controls to be enforced to guarantee that no sensitive information be leaked releasing non sensitive data items. With respect to access control enforcement, our original results are: i) the management of dynamic write authorizations, by defining a solution based on selective encryption for efficiently and effectively supporting grant and revoke of write authorizations; ii) the definition of an effective technique to guarantee data integrity, so to allow the data owner and the users to verify that modifications to a resource have been produced only by authorized users; and iii) the modeling and enforcement of a subscription-based authorization policy, to support scenarios where both the set of users and the set of resources change frequently over time, and users’ authorizations are based on their subscriptions.

PRESERVING PRIVACY IN DATA RELEASE / G. Livraga ; tutor: P. Samarati, S. De Capitani di Vimercati, S. Foresti ; direttore scuola di dottorato: E. Damiani. DIPARTIMENTO DI INFORMATICA, 2014 Mar 18. 26. ciclo, Anno Accademico 2013. [10.13130/livraga-giovanni_phd2014-03-18].