I want to deploy a real-time machine learning model for fraud detection using SageMaker.
I used a SageMaker Jupyter notebook instance to:
- load my training data (transactions) from S3
- preprocess the data and engineer features (I use category_encoders to encode the categorical values)
- train the model and configure the endpoint
For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.
Should I calculate all the features again for these real-time transactions in the Lambda function?
For the categorical features, I use category_encoders with fit_transform() to convert them to numerical ones. What should I do at inference time, given that the result will not be the same as on the training set?
Is there another method that avoids redoing the feature calculation in the inference step?
Yes. When you run inference with a trained model (that is, predict on real-time data), you should pass exactly the same list of features that you used to train the model. If you calculate some features during training (e.g. part of the day from the timestamp), you should also calculate them at inference time.
You should store all the transformations fitted during training (numeric scalers, categorical encoders, etc.) and reuse them at inference time by calling transform() rather than fit_transform(). For Python it looks like this:
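Here is a minimal sketch of the idea, not a definitive implementation. It assumes hypothetical transaction fields (amount, merchant_type, timestamp), and it uses a tiny hand-written SimpleOrdinalEncoder as a stand-in for a category_encoders encoder so that the example runs with the standard library alone. With a real fitted encoder you would typically serialize the whole object (for example with pickle or joblib) instead of the JSON mapping used here.

```python
import json
from datetime import datetime, timezone


def part_of_day(ts):
    """Bucket a Unix timestamp (UTC): 0=night, 1=morning, 2=afternoon, 3=evening."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).hour // 6


class SimpleOrdinalEncoder:
    """Hypothetical stand-in for a fitted categorical encoder."""

    def __init__(self, mapping=None):
        self.mapping = mapping or {}

    def fit(self, values):
        # Learn category -> integer codes from the TRAINING data only.
        self.mapping = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Reuse the learned codes; categories unseen in training map to 0.
        return [self.mapping.get(v, 0) for v in values]


def build_features(txn, encoder):
    """Single feature function shared by training and inference."""
    return [
        txn["amount"],
        encoder.transform([txn["merchant_type"]])[0],
        part_of_day(txn["timestamp"]),
    ]


# Training time (notebook): fit once, then persist the fitted state.
train = [
    {"amount": 12.5, "merchant_type": "grocery", "timestamp": 1_700_000_000},
    {"amount": 980.0, "merchant_type": "electronics", "timestamp": 1_700_040_000},
]
encoder = SimpleOrdinalEncoder().fit([t["merchant_type"] for t in train])
saved_state = json.dumps(encoder.mapping)  # in practice, written to a file or S3

# Inference time (e.g. Lambda): load the saved state and only transform, never re-fit.
loaded = SimpleOrdinalEncoder(json.loads(saved_state))
new_txn = {"amount": 45.0, "merchant_type": "grocery", "timestamp": 1_700_000_000}
features = build_features(new_txn, loaded)
```

Because build_features is the only place where features are computed, the training code and the inference code cannot drift apart. Re-fitting on a single real-time transaction would instead assign codes that do not match the ones the model was trained on.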