System.ArgumentOutOfRangeException: 'Features column 'Feature' not found (Parameter 'schema')'

I'm having a problem when training a model. I have a set of HTTP requests and I want to be able to identify whether a request is coming from a bot or not. To train the model, I have a collection of objects like this:

public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}

And a prediction class like this:

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}

Just for this example, I have created a list of hardcoded data:

var trainingData = new List<Request>
{
    new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/backoffice", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
    new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/dashboard", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
    new Request { Url = "/admin", UserAgent = "a bot", IsBot = true },
};

To train a model I'm using the following code:

IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);

var dataPrepPipeline = mlContext
    .Transforms
    .Text
    .FeaturizeText("UrlF", "Url")
    .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
    .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);
var prepPipeline = dataPrepPipeline.Fit(mlData);

var trainer = mlContext
    .BinaryClassification
    .Trainers
    .AveragedPerceptron(labelColumnName: "IsBot", numberOfIterations: 10, featureColumnName: "Features");

var preprocessedData = prepPipeline.Transform(mlData);

ITransformer trainedModel = trainer.Fit(preprocessedData);

Training appears to succeed, but when I try to create a prediction engine:

var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(trainedModel);

I get the following exception:

System.ArgumentOutOfRangeException: 'Features column 'Feature' not found (Parameter 'schema')'

Can you please help me figure out what this means?

There is 1 answer

Jon (BEST ANSWER)

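This is most likely because the trainer was fitted on data that had already been transformed. As a result, trainedModel contains only the trainer's transformer, which expects a Features column in its input, while the prediction engine feeds it raw Request objects that only have Url, UserAgent and IsBot. The data-preparation transforms need to be part of the model you pass to CreatePredictionEngine.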
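If you would rather keep the two-stage training from the question, one option is to chain the fitted data-preparation transformer and the fitted trainer into a single model before creating the prediction engine. The following is a minimal sketch of that idea, not a tested fix for your project: it assumes the Microsoft.ML NuGet package, reuses the Request and IsBotPrediction classes from the question, and introduces the names prepModel and fullModel purely for illustration. The cache checkpoint is left out to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Tiny hardcoded sample, in the spirit of the question's data.
        var trainingData = new List<Request>
        {
            new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
            new Request { Url = "/dashboard", UserAgent = "a bot", IsBot = false },
        };
        IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);

        // Stage 1: fit the data-preparation transforms on the raw data.
        var dataPrepPipeline = mlContext.Transforms.Text.FeaturizeText("UrlF", "Url")
            .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
            .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"));
        ITransformer prepModel = dataPrepPipeline.Fit(mlData);
        IDataView preprocessedData = prepModel.Transform(mlData);

        // Stage 2: fit the trainer on the already-transformed data.
        ITransformer trainedModel = mlContext.BinaryClassification.Trainers
            .AveragedPerceptron(labelColumnName: "IsBot", featureColumnName: "Features", numberOfIterations: 10)
            .Fit(preprocessedData);

        // Chain both fitted transformers so the combined model accepts raw Request rows.
        ITransformer fullModel = prepModel.Append(trainedModel);

        var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(fullModel);
        IsBotPrediction result = predEngine.Predict(new Request { Url = "/wp-admin", UserAgent = "a bot" });
        Console.WriteLine($"IsBot: {result.Prediction}, score: {result.Score}");
    }
}
```

The key line is the Append call on the fitted transformer: the prediction engine then sees a model whose input schema matches Request, rather than one that starts at the Features column.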

Building a single pipeline that contains both the data preparation and the trainer should work:

// Data preparation: featurize both text columns, combine them, then normalize.
var dataPrepPipeline = mlContext.Transforms.Text.FeaturizeText("UrlF", "Url")
     .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
     .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
     .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
     .AppendCacheCheckpoint(mlContext);

// Optional: fit and save the data-preparation steps on their own.
// Save takes the schema of the model's input, which here is the raw data.
var dataPrepModel = dataPrepPipeline.Fit(mlData);
mlContext.Model.Save(dataPrepModel, mlData.Schema, "./dataprep.zip");

// Append the trainer so the fitted model includes the data-preparation transforms.
var pipeline = dataPrepPipeline.Append(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "IsBot", numberOfIterations: 10, featureColumnName: "Features"));

// Fit on the raw data; the pipeline applies the transforms itself.
var model = pipeline.Fit(mlData);
mlContext.Model.Save(model, mlData.Schema, "./model.zip");

// The model now accepts raw Request objects, so the engine can be created.
var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(model);
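To check that the saved model really is self-contained, you can load it back and predict from a raw Request. This is a rough sketch under a few assumptions: it uses the Microsoft.ML NuGet package, a shortened version of the sample data, and passes mlData.Schema to Save because that parameter describes the model's input schema. The file name matches the one above; the variable names loadedModel and inputSchema are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        var trainingData = new List<Request>
        {
            new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/admin", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
            new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
        };
        IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);

        // Single pipeline: data preparation followed by the trainer.
        var pipeline = mlContext.Transforms.Text.FeaturizeText("UrlF", "Url")
            .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
            .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
            .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron(
                labelColumnName: "IsBot", featureColumnName: "Features", numberOfIterations: 10));

        ITransformer model = pipeline.Fit(mlData);
        mlContext.Model.Save(model, mlData.Schema, "./model.zip");

        // Load the model back; inputSchema is the schema that was passed to Save.
        ITransformer loadedModel = mlContext.Model.Load("./model.zip", out DataViewSchema inputSchema);
        foreach (var column in inputSchema)
        {
            Console.WriteLine($"Input column: {column.Name}");
        }

        // The loaded model starts from raw columns, so Request works as the input type.
        var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(loadedModel);
        IsBotPrediction result = predEngine.Predict(new Request { Url = "/backoffice", UserAgent = "a bot" });
        Console.WriteLine($"IsBot: {result.Prediction}, score: {result.Score}");
    }
}
```

With only a handful of rows the prediction itself is not meaningful; the point is that creating the engine and calling Predict no longer fail on a missing Features column.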