I am doing research on fully homomorphic encryption. Only fully homomorphic encryption allows computation on encrypted data, and this mechanism is provided by the PySEAL library, a Python fork of the Microsoft SEAL library. I have 3 columns in my data frame. I want to encrypt each value of every column using PySEAL so that I can do computation on those values.
df
| SNP | ID | Effect|
|:---- |:------:| -----:|
| 21515| 1 | 0.5 |
| 21256| 2 | 0.7 |
| 21286| 3 | 1.7 |
Related PySEAL documentation: https://github.com/Lab41/PySEAL/blob/master/SEALPythonExamples/examples.py
Interesting question. I can help you use the library with pandas, but not with choosing secure encryption parameters such as the moduli.
First let's do some imports:
Now we set the encryption parameters. I do not know enough to advise you on how to set these correctly, but getting the values correct is important to achieve proper security. A quote from the documentation:
Next we'll set up the keys, encoders, encryptors and decryptors.
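The setup described above might look roughly like the following. This is a minimal sketch, assuming the class and function names used in the PySEAL examples.py linked in the question. The helper name make_seal_objects is made up for illustration, and the parameter values are simply the ones used in that example file, not a security recommendation. The import is guarded so that the snippet still loads on a machine without PySEAL.

```python
try:
    import seal
    from seal import (Decryptor, EncryptionParameters, Encryptor, Evaluator,
                      IntegerEncoder, KeyGenerator, SEALContext)
except ImportError:  # PySEAL is not installed in this environment
    seal = None


def make_seal_objects():
    """Hypothetical helper: build the objects used to encrypt, compute and decrypt."""
    if seal is None:
        raise RuntimeError("PySEAL (the `seal` module) is not installed")

    # Values copied from PySEAL's examples.py; this is NOT security advice.
    parms = EncryptionParameters()
    parms.set_poly_modulus("1x^2048 + 1")
    parms.set_coeff_modulus(seal.coeff_modulus_128(2048))
    parms.set_plain_modulus(1 << 8)
    context = SEALContext(parms)

    keygen = KeyGenerator(context)
    return {
        "encoder": IntegerEncoder(context.plain_modulus()),
        "encryptor": Encryptor(context, keygen.public_key()),
        "evaluator": Evaluator(context),
        "decryptor": Decryptor(context, keygen.secret_key()),
    }
```

Only the holder of the secret key should ever construct the decryptor; the party doing the computation needs the evaluator and the ciphertexts, nothing more.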
Let's set up some handy functions that we will use with DataFrames to encrypt and decrypt.
Finally we'll define a multiplication operation on integers that we can use with pandas. To keep this answer short we won't demonstrate an operation on floating point numbers but it shouldn't be hard to make one.
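One possible route to a floating-point column such as Effect, sketched here in plain Python with no encryption at all, is fixed-point scaling: multiply each float by a fixed scale, round to an integer, do the integer arithmetic, and divide the scale back out after decryption. This is a swapped-in technique for illustration rather than the library's own approach (PySEAL also ships a FractionalEncoder, which is not shown here). The names SCALE, to_fixed and from_fixed are made up for this sketch.

```python
SCALE = 10  # hypothetical choice: keeps one decimal digit of precision


def to_fixed(x, scale=SCALE):
    """Represent a float as an integer count of 1/scale units."""
    return round(x * scale)


def from_fixed(n, scale=SCALE):
    """Undo to_fixed once the integer arithmetic is finished."""
    return n / scale


effects = [0.5, 0.7, 1.7]
fixed = [to_fixed(e) for e in effects]     # integers an integer encoder could accept
doubled = [n * 2 for n in fixed]           # stands in for the homomorphic multiply
result = [from_fixed(n) for n in doubled]  # back to floats after "decryption"
print(result)
```

Multiplying by a plain integer leaves the scale unchanged. Multiplying two scaled values together would square the scale, so the decode step would have to divide by SCALE twice.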
Note that Evaluator.multiply is an in-place operation, so when we use it with a DataFrame it will mutate the values inside!
Now let's put it all to work:
This prints your example:
Now let's make an encrypted dataframe:
Prints:
Encrypted Values:
Which is just a bunch of objects in a DataFrame.
Now let's do an operation.
You won't notice a difference in values printed at this point because all we did was mutate the objects in the dataframe, so it will just print the same memory references.
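The same effect can be reproduced without the library. The sketch below uses a made-up ToyCipher class (it encrypts nothing) whose double_in_place method mutates the object, in the way the in-place evaluator call is described above. Because the container only holds references, the objects before and after the operation are the very same ones; taking a deep copy first is one way to keep the original values around.

```python
import copy


class ToyCipher:
    """Hypothetical stand-in for a ciphertext object; it encrypts nothing."""

    def __init__(self, value):
        self.value = value

    def double_in_place(self):
        # Mutates this object and returns None, like an in-place evaluator call.
        self.value *= 2


column = [ToyCipher(1), ToyCipher(2), ToyCipher(3)]
ids_before = [id(c) for c in column]
backup = copy.deepcopy(column)  # independent copies taken before mutating

for c in column:
    c.double_in_place()

ids_after = [id(c) for c in column]
same_objects = ids_before == ids_after      # same references, new contents
values_now = [c.value for c in column]
values_backup = [c.value for c in backup]
print(same_objects, values_now, values_backup)
```

Whether the library's own ciphertext objects can be deep-copied this way is an assumption that this sketch does not establish.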
Now let's decrypt to see the results:
This prints:
Which is the result you'd expect from multiplying the integer columns by two.
To use this in practice, you would have to serialise the encrypted DataFrame before sending it to the other party to be worked on; they would then return it to you to be decrypted. The library forces you to use pickle to do this. This is unfortunate from a security point of view, since you should never unpickle untrusted data. Can the server trust the client not to put anything nasty in the pickle serialisation, and can the client trust that the server won't do the same when it returns the answer? In general the answer to both is no, all the more so here because the client already doesn't trust the server; otherwise it would not be using homomorphic encryption! Clearly these Python bindings are more of a technology demonstrator, but I thought it was worth pointing out this limitation.
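One partial mitigation from the Python standard library, sketched below, is to subclass pickle.Unpickler and override find_class so that only an explicit allow-list of globals can be loaded. The allow-list contents and the restricted_loads helper are assumptions for illustration. To load real ciphertexts, the list would need whichever class names the library's pickles reference, which is not checked here, and this does not make untrusted pickles safe in general.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs; empty means no globals at all.
ALLOWED_GLOBALS = set()


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data):
    """Like pickle.loads, but refuses any global that is not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Nasty:
    """A payload that would call print() during an ordinary pickle.loads."""

    def __reduce__(self):
        return (print, ("arbitrary code ran",))


plain = restricted_loads(pickle.dumps([1, 2, 3]))  # plain containers need no globals

try:
    restricted_loads(pickle.dumps(Nasty()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Here the harmless print call stands in for whatever a hostile peer might place in the payload; with a plain pickle.loads it would run during deserialisation.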
There are batch operations in the library, which I have not demonstrated. These may make more sense to use in the context of DataFrames, since they should have better performance for operations over many values.