ML.NET for Predicting the Next Word, Including Related Words


I'm working on a project that uses ML.NET to predict the next word in a sentence, and I've used n-gram and bag-of-words techniques to create my transformed data. However, I'm facing two issues with the prediction engine:

1. The prediction engine returns only a binary result (either one or zero) instead of suggesting next words with decimal scores, which makes the output much less informative.
2. My dataset contains words like "work," "worked," and "working." When I search for "work," the prediction engine only matches the exact same word and doesn't consider related words.

Here's a snippet of my code:

using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading.Tasks;

namespace Test_intelesence
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var mlContext = new MLContext();

            // Load and preprocess your training data
            var dataView = mlContext.Data.LoadFromTextFile<TextData>("Data.txt");

            // Define the data preparation pipeline
            var pipeline = mlContext.Transforms.Text.TokenizeIntoWords("Words", "Text")
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Words"))
                .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Words"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"));

            // Fit the pipeline to your data
            var transformer = pipeline.Fit(dataView);
            var transformedData = transformer.Transform(dataView);

            // Create and train an n-gram model
            var trainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron();
            var model = trainer.Fit(transformedData);

            // Define a context to predict the next word
            var context = new TextData
            {
                Text = "work"
            };

            // Make a prediction for the next word
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData, Prediction>(model);
            var prediction = predictionEngine.Predict(context);

            Console.WriteLine($"Predicted Next Word: {prediction.PredictedWord}");
        }
    }


    public class TextData
    {
        [LoadColumn(0)]
        public string Text;
    }

    // Define a prediction class
    public class Prediction
    {
        public string PredictedWord;
    }
}

I would like to know how I can address these issues and improve the accuracy of the prediction engine, especially when dealing with related words like "work," "worked," and "working." Any suggestions, code examples, or guidance would be highly appreciated.
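
To make the behavior I'm after concrete, here is a minimal sketch in plain C# with no ML.NET involved. It counts bigrams, folds "worked" and "working" into "work" with a crude suffix stripper, and returns candidate next words with fractional scores. The `BigramSketch` class, the `Normalize` helper, and the three sample sentences are assumptions made up for illustration; they are not part of my real project, and `Normalize` is not a proper stemmer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BigramSketch
{
    // Hypothetical helper: crude suffix stripping, NOT a real stemmer.
    // Maps "worked" and "working" to "work".
    public static string Normalize(string word)
    {
        var w = word.ToLowerInvariant();
        foreach (var suffix in new[] { "ing", "ed" })
        {
            if (w.Length > suffix.Length + 2 && w.EndsWith(suffix, StringComparison.Ordinal))
                return w.Substring(0, w.Length - suffix.Length);
        }
        return w;
    }

    // Count which word follows each normalized word.
    public static Dictionary<string, Dictionary<string, int>> Train(IEnumerable<string> sentences)
    {
        var counts = new Dictionary<string, Dictionary<string, int>>();
        foreach (var sentence in sentences)
        {
            var tokens = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i + 1 < tokens.Length; i++)
            {
                var key = Normalize(tokens[i]);
                var next = tokens[i + 1].ToLowerInvariant();
                if (!counts.TryGetValue(key, out var followers))
                    counts[key] = followers = new Dictionary<string, int>();
                followers[next] = followers.TryGetValue(next, out var c) ? c + 1 : 1;
            }
        }
        return counts;
    }

    // Return candidate next words with relative frequencies in [0, 1],
    // most frequent first. Unknown words yield an empty list.
    public static List<KeyValuePair<string, double>> Predict(
        Dictionary<string, Dictionary<string, int>> counts, string word)
    {
        if (!counts.TryGetValue(Normalize(word), out var followers))
            return new List<KeyValuePair<string, double>>();
        double total = followers.Values.Sum();
        return followers
            .OrderByDescending(kv => kv.Value)
            .Select(kv => new KeyValuePair<string, double>(kv.Key, kv.Value / total))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data, for illustration only.
        var model = Train(new[]
        {
            "I work hard",
            "she worked hard",
            "they are working late",
        });
        foreach (var kv in Predict(model, "work"))
            Console.WriteLine($"{kv.Key}: {kv.Value:F2}");
    }
}
```

With this sample data, querying "work" yields "hard" with a score of two thirds and "late" with one third, because all three word forms share one key. That is the kind of ranked, fractional output I'd like to get from an ML.NET pipeline, presumably via a multiclass trainer rather than a binary one.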
