Data mining of encrypted data in a database


I am interested in building a data mining website. The data in the DB is really sensitive.

I would like to find a way to encrypt the data in the DB and to prove to my clients that even I can't read it.

The problem is that I need to be able to run batch reports overnight on the server side, so my software must be able to read the data in the clear.

Do you have any ideas?


There are 4 answers

0
Joshua Fox On

As mentioned by @vy32, homomorphic encryption provides a theoretical way to do this, but it is not practical today.

How about requesting anonymized rather than encrypted data?

For example, you don't need customer names or national IDs to tell customers apart--anonymous IDs would do. Another example: some data values can be hashed, so that you can tell different entities apart but not what they are. Numeric values could be replaced by their rank order, so that for every pair you know which is greater, but not the precise amounts. Fields that don't matter to you, like personal names in most applications, can simply be omitted.

There is an entire body of work devoted to anonymization, and another body of work devoted to de-anonymization of anonymized data sets, but you can get a long way with some simple transformations.
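The simple transformations described above might look roughly like the following. This is only a sketch under assumptions: the record fields (`name`, `national_id`, `amount`) and the helper names are hypothetical, and a keyed hash (HMAC-SHA256, with a key the client keeps) is used in place of a plain hash, because low-entropy values such as national IDs can otherwise be guessed by hashing every candidate.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash: equal inputs give equal outputs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_ranks(values):
    """Replace each number by its rank (0 = smallest); equal values share a rank."""
    order = {v: i for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def anonymize(records, key: bytes):
    """Drop names, pseudonymize IDs, and keep only the ordering of amounts.

    The field names used here are hypothetical placeholders.
    """
    ranks = to_ranks([r["amount"] for r in records])
    return [
        {"customer": pseudonymize(r["national_id"], key), "amount_rank": rank}
        for r, rank in zip(records, ranks)
    ]


if __name__ == "__main__":
    # The key stays with the client; the mining service only sees the output.
    client_key = b"secret-held-by-the-client"
    rows = [
        {"name": "Alice", "national_id": "A-001", "amount": 50.0},
        {"name": "Bob", "national_id": "B-002", "amount": 120.0},
        {"name": "Alice", "national_id": "A-001", "amount": 10.0},
    ]
    for row in anonymize(rows, client_key):
        print(row)
```

The client would run `anonymize` before sending anything, so the service can still group rows by customer and compare amounts without ever seeing names, IDs, or exact values.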

6
BuZz On

You should consider the most basic form of data encryption: RSA. Google it; it's straightforward. There are two keys to the encryption: one is the public key, the other is the private key. Let us know how that works out for you.

0
Paŭlo Ebermann On

There is no way to arrange that you can't decrypt the data while your software can, as long as you have control over your software.

There needs to be a key somewhere so that the software can decrypt the data, and if the software runs on a computer to which you have access, you can get to the key. There is no way around this.

Your clients either have to trust you to not do anything malicious with the data, or they have to do the processing themselves (or with another service).

There might be some ways to use homomorphic encryption (i.e. where you have enc(f1(a,b)) = f2(enc(a), enc(b)) for a pair of functions f1, f2), but this only works for some very limited operations, with encryption schemes specially made to support it, and quite likely not for the kind of computation your "data mining" requires.
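To make the enc(f1(a,b)) = f2(enc(a), enc(b)) property concrete, here is a toy sketch of the Paillier scheme, one example of an encryption scheme specially made for this (it is not something the question or this answer prescribes). Here f1 is addition of plaintexts and f2 is multiplication of ciphertexts modulo n². The primes are tiny and hard-coded purely for illustration, so this is nowhere near secure, and the function names are made up for the example.

```python
import math
import secrets

# Toy parameters: far too small to be secure, chosen only for illustration.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using only the public values N and G."""
    while True:
        r = secrets.randbelow(N - 1) + 1           # random r in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """f2: multiplying ciphertexts corresponds to adding the plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    total = add_encrypted(encrypt(20), encrypt(22))
    print(decrypt(total))
```

Whoever holds only the ciphertexts can compute encrypted sums without learning the inputs, but only the holder of the private values can read the result, and anything beyond sums (and multiplication by known constants) is out of reach for this scheme, which illustrates the "very limited operations" caveat.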

0
vy32 On

You haven't described what you need done in terms of the reports. There are lots of approaches for doing computation on encrypted data. I suggest you start with these two approaches.

  1. Check out the book Translucent Databases, 2nd Edition, by Peter Wayner. To quote Wayner: "The book is still designed to help the world build databases that answer useful questions without keeping any useful information around. The examples show how most databases don't need to be filled with the world's secrets and personal information. If the client uses the right amount of encryption, the databases don't need to be dangerous one-stop shopping for the identity thieves and others who with malice aforethought."

  2. If you have a PhD in cryptography and you have a few billion cycles to burn, you should read up on Homomorphic Encryption.