I want to deploy a real-time machine learning model for fraud detection using SageMaker.
I used a SageMaker Jupyter notebook instance to:
-load my training data (transactions) from S3
-preprocess the data and engineer features (I use category_encoders to encode the categorical values)
-train the model and configure the endpoint
For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.
Should I calculate all the features again for these real-time transactions in the Lambda function?
When I use category_encoders with fit_transform() to convert my categorical features to numerical ones, what should I do, given that the result will not be the same as on the training set?
Is there another method that avoids recalculating the features in the inference step?
Yes. When you run inference with a trained model (that is, predict on real-time data), you must pass exactly the same list of features that you used to train it. If you calculated some features during training (e.g. the part of the day from a timestamp), you must also calculate them at inference time.
You should store all the transformations that you fitted while training the model: numeric scalers, categorical encoders, etc. For Python, it looks like this:
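As a minimal sketch of that idea, using only the standard library: the category-to-code mapping is learned once from the training data, saved as an artifact, and only loaded and applied at inference time. The field names (`amount`, `merchant_type`, `timestamp`), the helper functions, and the part-of-day boundaries are hypothetical and chosen for illustration. With category_encoders the pattern is the same: call fit() (or fit_transform()) on the training data only, serialize the fitted encoder (for example with pickle), and call transform() on it in the Lambda function instead of fit_transform().

```python
import json
import os
import tempfile
from datetime import datetime


def part_of_day(timestamp: str) -> str:
    """Derive a coarse part-of-day feature from an ISO 8601 timestamp."""
    hour = datetime.fromisoformat(timestamp).hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def fit_category_mapping(values):
    """Learn category -> integer code from TRAINING data only."""
    return {cat: code for code, cat in enumerate(sorted(set(values)), start=1)}


def encode(value, mapping, unknown=-1):
    """Apply a fitted mapping; categories unseen in training get `unknown`."""
    return mapping.get(value, unknown)


def build_features(transaction, mappings):
    """Single feature function shared by training and inference code."""
    return [
        float(transaction["amount"]),
        encode(transaction["merchant_type"], mappings["merchant_type"]),
        encode(part_of_day(transaction["timestamp"]), mappings["part_of_day"]),
    ]


# --- Training time (notebook): fit the mappings and save them as an artifact.
training_rows = [
    {"amount": 12.5, "merchant_type": "grocery", "timestamp": "2023-01-05T09:30:00"},
    {"amount": 980.0, "merchant_type": "electronics", "timestamp": "2023-01-05T23:10:00"},
    {"amount": 40.0, "merchant_type": "grocery", "timestamp": "2023-01-06T14:00:00"},
]
mappings = {
    "merchant_type": fit_category_mapping(r["merchant_type"] for r in training_rows),
    "part_of_day": fit_category_mapping(part_of_day(r["timestamp"]) for r in training_rows),
}
artifact_path = os.path.join(tempfile.mkdtemp(), "encoders.json")
with open(artifact_path, "w") as f:
    json.dump(mappings, f)

# --- Inference time (e.g. Lambda): load the saved mappings, never re-fit them.
with open(artifact_path) as f:
    loaded_mappings = json.load(f)

new_transaction = {
    "amount": 55.0,
    "merchant_type": "electronics",
    "timestamp": "2023-02-01T10:15:00",
}
features = build_features(new_transaction, loaded_mappings)
print(features)
```

Because `build_features` and the saved mappings are shared, a given transaction gets the same encoding in the notebook and in the Lambda function. In a real deployment the artifact would typically live somewhere the Lambda function can read it, such as S3 or the deployment package; that choice is not specified in the question.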