Naive Bayes Ticket Classification Python


I currently have a CSV export from our ticketing system with two columns: Short Description and Class.

Both are entered by the agent when logging a ticket, e.g.:

  • Data Backup is not working,Backup
  • Email change in Groups,Notes
  • backup directory not found,Backup
  • Email > Global - Lotus Notes,Notes

I have been asked to write a Naive Bayes program using Python that will read the short description in a CSV file and then decide how it should be classified.

I have 329 tickets that have been classified into 6 different classes.

The following is a count of each:

  • Class1 60
  • Class2 77
  • Class3 65
  • Class4 16
  • Class5 18
  • Class6 93

I was thinking I would have to create 6 different dictionaries (one for each class) containing all the words used in the short description, excluding the usual !"£$%^&*()<>,./?:;@'#~][{}

Then when I run the program, it will tokenize the short description using nltk and compare the tokens against all six dictionaries. Whichever dictionary has the most matches will determine the class.
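For comparison, the per-class dictionary idea above is close to multinomial Naive Bayes, which keeps per-class word counts but scores with log probabilities and a class prior rather than raw match counts. The following is only a minimal sketch under assumptions: it uses add-one (Laplace) smoothing, a regex tokenizer in place of nltk, and the four example rows from above as training data. The names `tokenize`, `train` and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep runs of word characters, which drops punctuation.
    return re.findall(r"\w+", text.lower())

def train(rows):
    """rows: iterable of (short_description, class_label) pairs."""
    word_counts = defaultdict(Counter)  # class -> word -> count
    class_counts = Counter()            # class -> number of tickets
    for text, label in rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text, model):
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior for this class
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [
    ("Data Backup is not working", "Backup"),
    ("Email change in Groups", "Notes"),
    ("backup directory not found", "Backup"),
    ("Email > Global - Lotus Notes", "Notes"),
]
model = train(rows)
print(classify("backup failed", model))      # Backup
print(classify("Lotus Notes email", model))  # Notes
```

The prior term matters when the classes are unbalanced (16 tickets in one class versus 93 in another), which a plain "most matches wins" count would ignore.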

Am I going about this the right way? How many tickets should I be using for my sample?

The following is what I have at the moment. It basically runs through a csv file named after a class and then outputs another file with punctuation removed, all the words in lower case and in separate cells. This data will then be used as a dictionary. I'm not sure if I'm going about this whole thing the right way though.

import csv
from nltk.tokenize import RegexpTokenizer

# Matches runs of word characters, which drops punctuation
tokenizer = RegexpTokenizer(r'\w+')

# For every row in the file, tokenize and convert to lowercase,
# then write the tokenized words to a .csv file.
# newline='' is what the csv module expects for files it reads or writes.
with open('Backup.csv', 'r', newline='') as read_file, \
        open('result.csv', 'w', newline='') as result_file:
    reader = csv.reader(read_file)
    writer = csv.writer(result_file)
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(tokenizer.tokenize(row[0].lower()))

EDIT: I have now started using the following which takes in the data from my two column csv file:

from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob

with open('train.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

print(cl.classify("backup"))        # "Backup"
print(cl.classify("Lotus Notes."))  # "Lotus"
# etc.

I'm fairly sure I just need a larger sample of training and test data. After that, I will feed in a CSV of short descriptions and update it with the class that has been calculated.
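The labelling step could look roughly like the sketch below. It is an assumption about the file layout (one short description per row, in the first column), and `label_rows` and `toy_classify` are hypothetical names. `toy_classify` is a stand-in for the demo only; in practice `cl.classify` would be passed in its place.

```python
import csv
import io

def label_rows(in_file, out_file, classify):
    """Copy each short description to out_file with a predicted class appended."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow([row[0], classify(row[0])])

# Stand-in classifier for the demo only; replace with cl.classify.
def toy_classify(text):
    return "Backup" if "backup" in text.lower() else "Notes"

# In-memory files keep the demo self-contained.
src = io.StringIO("Data Backup is not working\nEmail change in Groups\n")
dst = io.StringIO()
label_rows(src, dst, toy_classify)
print(dst.getvalue())
```

With real files, both would be opened with `newline=''`, as the csv module documentation recommends.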

Functionally it seems to work. Have I made any glaring mistakes?
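One way to check this beyond spot tests is to hold back part of the labelled tickets and measure accuracy on them. The sketch below uses only the standard library. The 80/20 ratio, the per-class split and the names `split_by_class` and `accuracy` are assumptions, and the ticket texts are placeholders that only reproduce the class counts listed above.

```python
import random
from collections import defaultdict

def split_by_class(rows, test_fraction=0.2, seed=0):
    """Split (text, label) rows class by class into train and test lists."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # Hold out one or more tickets per class where possible,
        # but never the whole class.
        n_test = max(1, int(len(items) * test_fraction))
        n_test = min(n_test, len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

def accuracy(classify, test_rows):
    """Fraction of held-out rows whose predicted class matches the label."""
    correct = sum(1 for text, label in test_rows if classify(text) == label)
    return correct / len(test_rows)

# Placeholder rows with the class counts from the question (329 tickets).
counts = {"Class1": 60, "Class2": 77, "Class3": 65,
          "Class4": 16, "Class5": 18, "Class6": 93}
rows = [(f"ticket {i}", label) for label, n in counts.items() for i in range(n)]

train_rows, test_rows = split_by_class(rows)
print(len(train_rows), len(test_rows))  # 265 64

# Baseline: always guess the largest class. A real classifier should beat this.
baseline = accuracy(lambda text: "Class6", test_rows)
print(round(baseline, 3))  # 0.281
```

Comparing the classifier against the always-guess-the-largest-class baseline gives a rough sense of whether it has learned anything, which matters here because two of the six classes have fewer than 20 tickets each.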
