XGBoost's precision and recall


I'm working on SecureXGBoost by applying Paillier encryption to XGBoost. Before SecureXGBoost, I worked on SecureSVM, where I calculated precision and recall. The goal of this study is a privacy-preserving XGBoost training algorithm.

I use privacy-preserving-xgboost-inference to apply the privacy-preserving idea, but I can't calculate the precision and recall on the encrypted data.

I ran the code from the GitHub repository as shown below. My question is how to calculate the precision and recall on the encrypted data:

import sys
sys.path.append('../third-party')

import pandas as pd
import numpy as np
import xgboost as xgb
from secrets import token_bytes

from sklearn.model_selection import train_test_split

from ppxgboost import BoosterParser as boostparser
from ppxgboost import PPBooster as ppbooster
from ppxgboost.PPBooster import MetaData
from ppxgboost.PPKey import PPBoostKey
from ope.pyope.ope import OPE
from ppxgboost import PaillierAPI as paillier

Then I load the heart disease training data:

df = pd.read_csv('heart.csv')
train = df.loc[:900] # training set
# split into the feature matrix (X) and the target column (y)
X = train.iloc[:, :-1]
y = train.iloc[:, -1]

Then I split the data into training and test sets:

x_train,x_test,y_train,y_test = train_test_split(X,y, test_size = 0.20, random_state = 31)

Now I train an XGBoost model:

# Train an XGBoost model
dtrain = xgb.DMatrix(x_train, label=y_train)
params = {'eta': 0.1}
model = xgb.train(params=params, dtrain=dtrain)

# Predict on the plaintext test data
plaintext_predict = model.predict(xgb.DMatrix(x_test))
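As a plaintext baseline to compare the encrypted results against, precision and recall could be computed from `plaintext_predict` roughly as below. This is only a sketch: it assumes the labels are binary 0/1 and that a score of 0.5 or more counts as the positive class, which is an assumption because the model above is trained with the default objective. The helper `precision_recall` and the demo arrays are hypothetical stand-ins for `y_test` and `plaintext_predict`.

```python
def precision_recall(y_true, scores, threshold=0.5):
    """Return (precision, recall) for binary labels and real-valued scores."""
    # Assumption: a score >= threshold means "predicted positive".
    y_pred = [1 if s >= threshold else 0 for s in scores]

    # Count the confusion-matrix cells that the two metrics need.
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical stand-ins for y_test and plaintext_predict.
y_demo = [1, 0, 1, 1, 0, 0]
scores_demo = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
demo_precision, demo_recall = precision_recall(y_demo, scores_demo)
print(demo_precision, demo_recall)
```

With the real data this would be called as `precision_recall(list(y_test), list(plaintext_predict))`.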

After that, I dump the tree model:

model.dump_model('tree.txt')

Then I follow this road map:

Encryption Preparation for XGBoost Model

  1. Set up some metadata information for the dataset.
  2. Set up the encryption materials
  3. Encrypt the model
  4. Encrypt the query
  5. Perform the prediction

1. Parse the model into the internal tree data structure and output the feature set

min_max = boostparser.training_dataset_parser(x_test)
enc_tree, feature_set, min_max = boostparser.model_to_trees(model, min_max)

2. Set up encryption materials.

prf_key = token_bytes(16)
public_key, private_key = paillier.he_key_gen()
encrypter = OPE(token_bytes(16))
ppBoostKey = PPBoostKey(public_key, prf_key, encrypter)

3. Process the tree into enc_tree

ppbooster.enc_xgboost_model(ppBoostKey, enc_tree, MetaData(min_max))

4. Encrypt the input vector for prediction (using prf_key_hash and ope-encrypter) based on the feature set.

ppbooster.enc_input_vector(prf_key, encrypter, feature_set, x_test, MetaData(min_max))

5. Privacy-preserving evaluation.

import time
start = time.time()
values = ppbooster.predict_binary(enc_tree, x_test)
end = time.time()
print("Elapsed Time: ", end - start)

`values` holds the encrypted predictions. How can I calculate precision and recall on the encrypted data without decrypting it?

I need to perform this mathematical evaluation on the encrypted data, for which the Paillier encryption algorithm seems to be the appropriate choice.
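To make the question concrete, here is a sketch of the kind of computation I have in mind, under strong assumptions. It assumes each prediction is available as a Paillier-encrypted 0/1 bit (which may not match what `predict_binary` actually returns) and that the evaluator knows the true labels in plaintext. Paillier is only additively homomorphic, so the true-positive and false-positive counts can be summed under encryption, but the final ratios are computed only after the key holder decrypts those two counts. The toy Paillier functions below are written from scratch with tiny primes for illustration; they are not the `ppxgboost` API and are not secure.

```python
import math
import random

# Toy Paillier with tiny primes: for illustration only, NOT secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = n + 1


def encrypt(m):
    """Encrypt a small non-negative integer m < N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1, c2):
    """Homomorphic addition: Enc(a) * Enc(b) = Enc(a + b)."""
    return (c1 * c2) % N2


def scale(c, k):
    """Multiply the hidden plaintext by a plaintext integer k."""
    return pow(c, k, N2)


def encrypted_counts(enc_preds, labels):
    """Sum encrypted TP and FP counts using plaintext labels only."""
    enc_tp, enc_fp = encrypt(0), encrypt(0)
    for c, y in zip(enc_preds, labels):
        enc_tp = add(enc_tp, scale(c, y))      # adds pred when y == 1
        enc_fp = add(enc_fp, scale(c, 1 - y))  # adds pred when y == 0
    return enc_tp, enc_fp


# Hypothetical data: plaintext labels and predictions encrypted as bits.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 1, 0, 1, 0]
enc_preds = [encrypt(p) for p in preds]

enc_tp, enc_fp = encrypted_counts(enc_preds, labels)

# Only the key holder performs this step; the division happens in plaintext.
tp = decrypt(enc_tp)
fp = decrypt(enc_fp)
fn = sum(labels) - tp  # positives that were not predicted positive
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)
```

In this sketch the evaluator never sees individual predictions, only ciphertexts, and the key holder sees only the two aggregate counts. Whether something like this can be done with the output of `predict_binary`, and without any decryption at all, is what I am unsure about.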
