I am creating my own implementation of the Apriori algorithm in C#. For this project, I'm not allowed to use other libraries for the Apriori algorithm.
Below is my testData.json. Please note that the items are strings, so an item might not be a single character like A, but a word like candy.
NOTE: I will be using a minimum support of 20 (20%) while testing.
{
  "transactions": [
    [ "B", "C" ],
    [ "B", "C", "D" ],
    [ "A", "D" ],
    [ "A", "B", "C", "D" ],
    [ "C", "D" ],
    [ "C", "D", "E" ],
    [ "A", "B" ]
  ]
}
When I click a button to process the data along with my needed values, minSupport and minConfidence (not needed yet), I deserialize my JSON into an object and save it into a public variable called database.
Below is the Database class.
public class Database
{
    public List<List<string>> transactions { get; set; }
}
When the button is clicked, I call the method GenerateCandidateItemSet(). This is where I am having a problem:
private Dictionary<string, int> C1 = new Dictionary<string, int>();

private void GenerateCandidateItemSet()
{
    foreach (List<string> transaction in database.transactions)
    {
        foreach (string item in transaction)
        {
            if (C1.ContainsKey(item))
            {
                C1[item]++;
            }
            else
            {
                C1.Add(item, 1);
            }
        }
    }

    // Check our frequency, remove items with low support
    foreach (string key in C1.Keys.ToList())
    {
        double frequency = (C1[key] * 100) / (database.transactions.Count);
        if (frequency < minSupport)
        {
            C1.Remove(key);
        }
    }

    // Pairing check stuff
    List<string[]> itemPairs = new List<string[]>();
    List<string> items = C1.Keys.ToList();
    foreach (string item in items)
    {
        // FIX THIS LOOP LATER TO CONTAIN ALL PAIRS
        List<string> itemArray = new List<string>();
        if (item != items.Last())
        {
            itemArray.Add(item);
            itemArray.Add(items[items.IndexOf(item) + 1]);
            itemPairs.Add(itemArray.ToArray());
        }
    }

    GenerateItemSetRecursive(itemPairs);
}
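Right before the section // Pairing check stuff, the value of C1 is: B = 4, C = 5, D = 5, A = 3 (E appears in only 1 of the 7 transactions, about 14%, so it has already been removed).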
When the loop completes, I need to get something like: BC, BD, BA, CD, CA, DA.
And if I were to plug in AB, AD, BC, BD, CD, the result would be ABD, BCD and so on.
Basically, I need to find the frequent itemsets for the transactions.
Question: Since I'm only getting BC, CD, DA for my itemPairs instead of BC, BD, BA, CD, CA, DA, I know my logic is wrong. What would my loop look like to get this to work?
As you point out, C1.Keys.ToList() gives you {"B", "C", "D", "A"}. What your code does is iterate over that list and pair each element with the one immediately after it (as long as it's not the last element).
Step through your code: you'll see the first iteration gives you {"B", "C"}, the next gives you {"C", "D"}, and the one after that gives you {"D", "A"}. The last iteration is for the last element of the list, so item != items.Last() evaluates to false and nothing is added.
An easy way to make what you have work is to add another loop inside your broken loop. The intent is that when you iterate for "B", you add not only {"B", "C"} but also {"B", "D"} and {"B", "A"}, and similarly your outer iteration for "C" finds both {"C", "D"} and {"C", "A"}.
I hope this helps. Feel free to ping me on C# chat if you're still having difficulty with this.
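For illustration, here is a minimal sketch of that nested loop. The class name PairSketch and the helper AllPairs are hypothetical names invented for this example, not part of the original code, and the sketch assumes the list contains no duplicate items (which holds for dictionary keys). It walks the list by index rather than calling items.IndexOf(item), so the inner loop can start right after the outer position.

```csharp
using System;
using System.Collections.Generic;

public static class PairSketch
{
    // Hypothetical helper: returns every unordered pair of items,
    // pairing each element with all elements that come after it.
    public static List<string[]> AllPairs(IReadOnlyList<string> items)
    {
        var pairs = new List<string[]>();

        // Outer loop: first member of the pair (the last item has no partner left).
        for (int i = 0; i < items.Count - 1; i++)
        {
            // Inner loop: every item positioned after the first member.
            for (int j = i + 1; j < items.Count; j++)
            {
                pairs.Add(new[] { items[i], items[j] });
            }
        }

        return pairs;
    }
}
```

Calling PairSketch.AllPairs on the list B, C, D, A should yield BC, BD, BA, CD, CA, DA in that order, and the result could be assigned to itemPairs in place of the single foreach loop.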