Secure and Private Data Outsourcing

When accessing data stored on remote servers, users unwittingly reveal personal information. An instance is the case of Video on Demand services, supporting a wide range of content options and enabling subscribers to select, retrieve and locally consume desired content. Information on user browsing and consumption habits can be used by the service providers to infer user interests and improve content and ad placement techniques. Users however have no control over who can access and the type of processing that can be performed on this sensitive data, which on multiple occasions has been sold or even made public (e.g., Netflix, AOL).

The focus of this project is on providing solutions that allow users to efficiently access remote content and services without fear of unwanted side-effects, e.g., leaking personal information which can lead to further behavioral profiling, spam and targeted ad placement. In the following, several contributions are summarized. Computational private information retrieval (cPIR) techniques allow users to access remote data without leaking their interests or their access patterns to the server storing the data. While in principle cPIR can be used to solve the above problem, in the paper below we show that the deployment of non-trivial single server cPIR protocols on real hardware of the recent past would have been orders of magnitude less time-efficient than trivially transferring the entire database. We have validated this conclusion in an experimental setup on modern off-the-shelf hardware and also shown that given the predicted evolution trends for CPU and bandwidth it is likely to hold on non-specialized hardware even in the foreseeable future.

"On the Computational Practicality of Private Information Retrieval".
Radu Sion, Bogdan Carbunar.
In Proceedings of the 14th ISOC Network and Distributed Systems Security Symposium (NDSS), San Diego, February-March 2007 [15%] [pdf]

To address the above challenge, in the following paper we designed a Bloom filter based construction that reduces the amortized complexity of an Oblvious RAM structure to O(\lg n \lg \lg n) server computation overhead, in the presence of O(\sqrt n) client working memory. The proposed solution retains the pyramid-shaped database layout and reshuffling schedule employed by an Oblivious RAM (ORAM) structure but stores the database in a series of hash tables maintained by the server. Using a series of encrypted Bloom filters to query the location of a data item, the user can retrieve the item from the appropriate hash table. The use of encrypted Bloom filters allows the user to query an item directly at each level without revealing the success. In addition to reducing the time complexity, this approach reduces the amount of remote of storage required, from O(n \lg n) to O(n): On 1TB of data, the throughput is increased by nearly an order of magnitude, and the storage required is reduced by over an order of magnitude.

"Building Castles out of Mud: Access Pattern Privacy and Correctness on Untrusted Storage"
Peter Williams, Radu Sion, Bogdan Carbunar.
In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS)[18%], Alexandria, October 2008 [pdf]

Oblivious RAMs allow users to both read and modify the data stored on the server. This property is essential when clients outsource their data (e-mail, documents pictures) to remote, ``cloud'', servers. When the users of the system are corporations, an important obstacle remains however: providing assured lifecycle storage of records. Over 10,000 regulations are believed to govern the management of information in the US alone, including the Gramm-Leach-Bliley Act, the Sarbanes-Oxley Act, or the Health Insurance Portability and Accountability Act (HIPAA). Their main goal is to support Write Once Read Many (WORM) semantics: once written, data cannot be undetectably altered or deleted before the end of its regulation-mandated life span. To balance the two requirements, in the following paper we introduce WORM-ORAM, a first mechanism that combines Oblivious RAM access privacy and data confidentiality with WORM regulatory data retention guarantees. Clients can outsource their database to a server with full confidentiality and data access privacy, and, for data retention, the server ensures client access WORM semantics. WORM-ORAM is built on a set of novel efficient zero knowledge (ZK) proofs: The client is allowed unfettered ORAM access with full privacy to the server-hosted encrypted data set while simultaneously proving to the server in zero-knowledge that no existing records are overwritten and WORM semantics are preserved.

"Regulatory Compliant Oblivious RAM"
Bogdan Carbunar, Radu Sion.
In Proceedings of the 8th International Conference on Applied Cryptography and Network Security (ACNS)[18.5%], Beijing, June 2010 [pdf]

So far we have focused on a model where the client is also the data owner. An interesting research problem arises however when we consider the case of a data owner that needs the flexibility to delegate data access to third party clients. In the following paper we consider such a scenario: multiple clients want to share data on a server, while hiding all access patterns. We propose here a first solution to this problem based on Oblivious RAM (ORAM) techniques. Data owners can delegate rights to external new clients enabling them to privately access portions of the outsourced data served by a curious server. Our solution is as efficient as the underlying ORAM constructs and allows for delegated read or write access while ensuring strong guarantees for the privacy of the outsourced data. The server does not learn anything about client access patterns while clients do not learn anything more than what their delegated rights permit.

"Oblivious Outsourced Storage with Delegation"
Martin Franz, Bogdan Carbunar, Radu Sion, Stefan Katzenbeisser, Miroslava Sotakova, Peter Williams and Andreas Peter.
To appear in the 15th Financial Cryptography and Data Security (FC), St. Lucia, March 2011

The work described above has focused on a straightforward communication model. An interesting question is whether outsourcing issues occur also in other networked scenarios. In the following paper we propose a wireless sensor network outsourcing model: Network deployers open the network to interested clients that would like to query specific locations without leaking their interests. The leaks are not only in terms of the regions of interest but also in terms of client access patterns. We first propose an efficient protocol, SPYC, that ensures full client privacy in settings where the servers providing access to the network are honest-but-curious and whose collaboration does not extend beyond well-defined administrative purposes. Furthermore, we study the same query privacy problem in a setting where servers exhibit malicious behavior or where powerful external attackers have access to sensor network traffic information. In this setting we propose two metrics for quantifying the privacy achieved by a client's query sequence. We then extend SPYC with a suite of practical algorithms, then analyze the privacy and efficiency levels they provide. Our TOSSIM simulations show that the proposed extensions are communication efficient while significantly improving client privacy levels.

"Query Privacy in Wireless Sensor Networks"
Bogdan Carbunar, Yang Yu, Larry Shi, Michael Pearce, Venu Vasudevan.
ACM Transactions on Sensor Networks (TOSN), Volume 6, Issue 2, May 2010 [pdf]

In an outsourced database framework, clients place data management with specialized service providers. In the following paper we investigate privacy for operations more complex than read and write accesses. Specifically, we introduce a mechanism for executing JOIN operations in an outsourced relational database framework with full computational privacy and low overheads -- a first, to the best of our knowledge. We illustrate via a set of relevant instances of JOIN predicates, including: range and equality (e.g., for geographical data), Hamming distance (e.g., for DNA matching) and semantics (i.e., in health-care scenarios -- mapping antibiotics to bacteria). We experimentally evaluate the main overhead components and show they are reasonable. For example, the initial client computation overhead for 100000 data items is around 5 minutes. Moreover, our privacy mechanisms can sustain theoretical throughputs of over 30 million predicate evaluations per second, even for an un-optimized OpenSSL based implementation.

"Joining Privately on Outsourced Data"
Bogdan Carbunar, Radu Sion.
In Proceedings of the 7th VLDB Workshop on Secure Data Management (SDM), Singapore, September 2010 [pdf]