I'm having a problem when training a model. I have a collection of HTTP requests, and I want to identify whether each request comes from a bot or not. To train the model, I use a list of these:
public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}
And a prediction class like this:
using Microsoft.ML.Data; // for [ColumnName]

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}
Just for this example, I have created a list of hardcoded data:
var trainingData = new List<Request>
{
    new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/backoffice", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/dashboard", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/admin", UserAgent = "a bot", IsBot = true },
};
To train a model I'm using the following code:
var mlContext = new MLContext(); // Microsoft.ML
IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);

// Featurize both text columns, concatenate them into "Features" and normalize
var dataPrepPipeline = mlContext
    .Transforms
    .Text
    .FeaturizeText("UrlF", "Url")
    .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
    .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);
var prepPipeline = dataPrepPipeline.Fit(mlData);

// Train the classifier on the already-featurized data
var trainer = mlContext
    .BinaryClassification
    .Trainers
    .AveragedPerceptron(labelColumnName: "IsBot", numberOfIterations: 10, featureColumnName: "Features");
var preprocessedData = prepPipeline.Transform(mlData);
ITransformer trainedModel = trainer.Fit(preprocessedData);
Training appears to succeed. But when I try to create a prediction engine:
var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(trainedModel);
I get the following exception:
System.ArgumentOutOfRangeException: 'Features column 'Feature' not found (Parameter 'schema')'
Can you please help me figure out what this means?
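This is most likely caused by transforming the data separately before it is fitted to the trainer. `trainer.Fit(preprocessedData)` returns a transformer that contains only the trained classifier, so it expects its input to already have a `Features` column. The featurization steps live in `prepPipeline`, not in `trainedModel`, so when the prediction engine is built from `trainedModel` with `Request` as the input type (which only has `Url`, `UserAgent` and `IsBot`), there is no `Features` column to read.
The fix is to make sure the model passed to `CreatePredictionEngine` contains the featurization steps as well as the trainer: either append the trainer to the data-preparation pipeline and fit everything in one go, or append `trainedModel` to the fitted `prepPipeline`.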
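Below is a minimal sketch of both options, assuming the Microsoft.ML NuGet package (ML.NET 1.x). The `BotModel` class and its `Featurizer`, `Trainer`, `BuildEngine` and `BuildEngineTwoStep` members are hypothetical names used for illustration; the featurization steps, trainer settings and training rows are the ones from the question.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}

// Hypothetical helper class, not part of ML.NET.
public static class BotModel
{
    // Same hardcoded rows as in the question.
    private static readonly List<Request> TrainingData = new List<Request>
    {
        new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/backoffice", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/dashboard", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/admin", UserAgent = "a bot", IsBot = true },
    };

    // Featurization from the question: raw Url/UserAgent -> normalized "Features".
    private static IEstimator<ITransformer> Featurizer(MLContext ml) =>
        ml.Transforms.Text.FeaturizeText("UrlF", "Url")
            .Append(ml.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
            .Append(ml.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
            .Append(ml.Transforms.NormalizeMinMax("Features", "Features"))
            .AppendCacheCheckpoint(ml);

    // Trainer from the question; it only reads the "Features" and "IsBot" columns.
    private static IEstimator<ITransformer> Trainer(MLContext ml) =>
        ml.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "IsBot", featureColumnName: "Features", numberOfIterations: 10);

    // Option 1: one estimator chain, so the fitted model maps raw Request
    // columns all the way to PredictedLabel/Score.
    public static PredictionEngine<Request, IsBotPrediction> BuildEngine()
    {
        var ml = new MLContext(seed: 0);
        IDataView data = ml.Data.LoadFromEnumerable(TrainingData);

        ITransformer model = Featurizer(ml).Append(Trainer(ml)).Fit(data);
        return ml.Model.CreatePredictionEngine<Request, IsBotPrediction>(model);
    }

    // Option 2: keep the two-step training, then chain the fitted pieces.
    public static PredictionEngine<Request, IsBotPrediction> BuildEngineTwoStep()
    {
        var ml = new MLContext(seed: 0);
        IDataView data = ml.Data.LoadFromEnumerable(TrainingData);

        ITransformer prepPipeline = Featurizer(ml).Fit(data);
        ITransformer trainedModel = Trainer(ml).Fit(prepPipeline.Transform(data));

        // prepPipeline creates "Features"; trainedModel consumes it.
        ITransformer fullModel = prepPipeline.Append(trainedModel);
        return ml.Model.CreatePredictionEngine<Request, IsBotPrediction>(fullModel);
    }
}
```

Either engine can then be used as `BotModel.BuildEngine().Predict(new Request { Url = "/admin", UserAgent = "a bot" })`. Option 1 is usually the simpler one to save and reload, because there is a single fitted model to pass to `mlContext.Model.Save`; option 2 is closer to the original code if the preprocessed data is needed separately.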