Feature extraction for real-time prediction in SageMaker


I want to deploy a real-time fraud detection model using SageMaker.

I used a SageMaker Jupyter notebook instance to:

- load my training data (transactions) from S3
- preprocess the data and engineer features (I use category_encoders to encode the categorical values)
- train the model and configure the endpoint

For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.

Should I calculate all the features again for these real-time transactions in the Lambda function?

For the features where I use category_encoders with fit_transform() to convert a categorical feature to a numerical one, what should I do, given that the result will not be the same as on the training set?

Is there another method that avoids redoing the feature calculation at inference time?

There is 1 answer

codez0mb1e On

Should I calculate all the features again for these real-time transactions in the Lambda function?

Yes. At inference time (that is, when predicting on real-time data), you must pass the model exactly the same list of features that you used for training. If you computed some features during training (e.g. part of the day from a timestamp), you must compute the same features at inference time.
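One way to keep the two code paths in sync is to put the feature logic in a single function that both the training notebook and the Lambda function import. The sketch below is only an illustration: the function names, the "timestamp" field, and the hour boundaries are assumptions, not something taken from the question.

```python
from datetime import datetime


def part_of_day(timestamp: str) -> str:
    """Map an ISO 8601 timestamp to a coarse part-of-day label.

    The hour boundaries are illustrative assumptions.
    """
    hour = datetime.fromisoformat(timestamp).hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def build_features(transaction: dict) -> dict:
    """Return a copy of the transaction with derived features added.

    Call this same function on every training row and on every
    real-time transaction so both see identical feature logic.
    The "timestamp" key is a hypothetical field name.
    """
    features = dict(transaction)
    features["part_of_day"] = part_of_day(transaction["timestamp"])
    return features
```

Because the derived feature comes from one shared function, changing the logic in one place changes it for both training and inference.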

For the features where I use category_encoders with fit_transform() to convert a categorical feature to a numerical one, what should I do, given that the result will not be the same as on the training set?

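You should store every fitted transformation that you used when training the model: numeric scalers, categorical encoders, etc. At inference time, load them and call transform() only; calling fit_transform() again would refit the encoder on the new data and could produce a different mapping.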

In Python, it looks like this:

import joblib  # to save and load fitted transformers
import category_encoders as ce

# 1. At training time
# Fit the encoder on historical data
encoder = ce.OneHotEncoder(cols=[...])
encoder.fit(X, y)
# and save it to disk
joblib.dump(encoder, 'filename.joblib')

# 2. At inference time
# Load the fitted encoder
encoder = joblib.load('filename.joblib')
# and apply the same transformation to new data (transform only, never fit again)
X_new_encoded = encoder.transform(X_new)