Data Mining issue with the apriori algorithm in C#


I am creating my own implementation of the Apriori algorithm in C#. For this project, I'm not allowed to use any existing libraries for the Apriori algorithm.

Below is my testData.json. Note that the items are strings, so an item might be a word like candy rather than a single character like A.

NOTE: I will be using a minimum support of 20 (20%) while testing.

{
    "transactions": [
        [ "B", "C" ],
        [ "B", "C", "D" ],
        [ "A", "D" ],
        [ "A", "B", "C", "D" ],
        [ "C", "D" ],
        [ "C", "D", "E" ],
        [ "A", "B" ]
    ]
}

When I click a button to process the data along with my input values, minSupport and minConfidence (not needed yet), I deserialize my JSON into an object and store it in a public variable called database. Below is the Database class.

public class Database
{
    public List<List<string>> transactions { get; set; }
}

When the button is clicked, I call the method GenerateCandidateItemSet(). This is where I am having a problem:

private Dictionary<string, int> C1 = new Dictionary<string, int>();
private void GenerateCandidateItemSet()
{
    foreach (List<string> transaction in database.transactions)
    {
        foreach (string item in transaction)
        {
            if (C1.ContainsKey(item))
            {
                C1[item]++;
            }
            else
            {
                C1.Add(item, 1);
            }
        }
    }

    // Check our frequency, remove items with low support
    foreach (string key in C1.Keys.ToList())
    {
        // Use 100.0 so the division is floating-point rather than integer division
        double frequency = (C1[key] * 100.0) / database.transactions.Count;
        if (frequency < minSupport)
        {
            C1.Remove(key);
        }
    }

    // Pairing check stuff
    List<string[]> itemPairs = new List<string[]>();
    List<string> items = C1.Keys.ToList();

    foreach (string item in items)
    {
        // FIX THIS LOOP LATER TO CONTAIN ALL PAIRS
        List<string> itemArray = new List<string>();
        if (item != items.Last())
        {
            itemArray.Add(item);
            itemArray.Add(items[items.IndexOf(item) + 1]);
            itemPairs.Add(itemArray.ToArray());
        }
    }
    GenerateItemSetRecursive(itemPairs);
}

Right before the "// Pairing check stuff" section, the value of C1 is:

[screenshot of the contents of C1, not reproduced here]

When the loop completes, I need to get something like:

BC, BD, BA, CD, CA, DA

And if I were to plug in AB, AD, BC, BD, CD, the result would be ABD, BCD and so on.
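
One common way to read that step is the Apriori join-and-prune rule: combine two itemsets that differ by a single item, and keep the combination only if every smaller subset is itself in the input. The following is a minimal sketch under that assumption, not the code from this project. JoinSketch, NextCandidates, and Key are hypothetical names, and the "|" separator assumes no item contains that character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinSketch
{
    // Canonical key: items sorted and joined, so {"B","A"} and {"A","B"} match.
    // Assumption: no item contains the '|' character.
    static string Key(IEnumerable<string> items)
    {
        return string.Join("|", items.OrderBy(x => x, StringComparer.Ordinal));
    }

    // Builds candidate (k+1)-itemsets from k-itemsets of equal size.
    public static List<string[]> NextCandidates(List<string[]> frequent)
    {
        var known = new HashSet<string>(frequent.Select(s => Key(s)));
        var seen = new HashSet<string>();
        var result = new List<string[]>();

        for (int i = 0; i < frequent.Count; i++)
        {
            for (int j = i + 1; j < frequent.Count; j++)
            {
                string[] union = frequent[i].Union(frequent[j])
                    .OrderBy(x => x, StringComparer.Ordinal)
                    .ToArray();

                // The two sets must share all but one item.
                if (union.Length != frequent[i].Length + 1) continue;

                // Prune: every subset with one item dropped must be in the input.
                bool allSubsetsKnown = union.All(
                    drop => known.Contains(Key(union.Where(x => x != drop))));

                // seen.Add returns false for a candidate that was already emitted.
                if (allSubsetsKnown && seen.Add(Key(union))) result.Add(union);
            }
        }
        return result;
    }
}
```

Called with AB, AD, BC, BD, CD, this returns ABD and BCD. These are still only candidates: each one would need its support counted against the transactions before it is kept.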

Basically, I need to find the frequent itemsets for the transactions.

Question: Since I'm only getting BC, CD, DA for my itemPairs instead of BC, BD, BA, CD, CA, DA, I know my logic is wrong. What should my loop look like to get this to work?

Answer by Jeremy (best answer):

As you point out, C1.Keys.ToList() gives you {"B", "C", "D", "A"}.

Your code iterates over that list and pairs each element with the one immediately after it (unless it is the last element).

Step through your code and you'll see the first iteration gives you {"B", "C"}, the next gives you {"C", "D"}, and the one after that gives you {"D", "A"}. The last iteration is for the last element of the list, so item != items.Last() evaluates to false and nothing is added.

An easy way to make what you have work now is to add another loop inside of your broken loop. The intent would be that when you iterate for "B", you add not only {"B", "C"}, but also {"B", "D"} and {"B", "A"}, and similarly your outer iteration for "C" would find both {"C", "D"} and {"C", "A"}.
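
A minimal sketch of that nested loop, assuming the items are distinct (they are, since they come from dictionary keys). PairSketch and AllPairs are hypothetical names used only for illustration. Indexing by position also avoids the items.IndexOf(item) lookup.

```csharp
using System;
using System.Collections.Generic;

static class PairSketch
{
    // Returns every unordered pair of items, in list order.
    public static List<string[]> AllPairs(List<string> items)
    {
        var itemPairs = new List<string[]>();
        for (int i = 0; i < items.Count; i++)
        {
            // Start at i + 1 so each pair appears once
            // and no item is paired with itself.
            for (int j = i + 1; j < items.Count; j++)
            {
                itemPairs.Add(new[] { items[i], items[j] });
            }
        }
        return itemPairs;
    }
}
```

With items equal to {"B", "C", "D", "A"}, PairSketch.AllPairs(items) produces BC, BD, BA, CD, CA, DA, which is the list you were expecting.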

I hope this helps - feel free to ping me on C# chat if you're still having difficulty with this.