I currently have a CSV export from our ticketing system with two columns.
Short Description and Class.
Both are entered by the agent when logging a ticket, e.g.
- Data Backup is not working,Backup
- Email change in Groups,Notes
- backup directory not found,Backup
- Email > Global - Lotus Notes,Notes
I have been asked to write a Naive Bayes program in Python that reads the short descriptions from a CSV file and decides how each ticket should be classified.
I have 329 tickets that have been classified into 6 different classes.
The following is a count of each:
- Class1 60
- Class2 77
- Class3 65
- Class4 16
- Class5 18
- Class6 93
I was thinking I would have to create 6 different dictionaries (one for each class) containing all the words used in the short descriptions, excluding the usual punctuation: !"£$%^&*()<>,./?:;@'#~][{}
Then, when I run the program, it will tokenize the short description using NLTK and compare it against all the dictionaries; whichever dictionary has the most matches determines the class.
Am I going about this the right way? How many tickets should I be using for my sample?
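Roughly what I have in mind, sketched with made-up training pairs (I'm using `re.findall(r'\w+', ...)` here instead of NLTK's RegexpTokenizer just to keep the example self-contained; it matches the same `\w+` pattern):

```python
import re
from collections import Counter, defaultdict

# Made-up (short description, class) pairs standing in for my CSV rows.
train = [
    ("Data Backup is not working", "Backup"),
    ("backup directory not found", "Backup"),
    ("Email change in Groups", "Notes"),
    ("Email > Global - Lotus Notes", "Notes"),
]

# One word-frequency dictionary per class.
class_words = defaultdict(Counter)
for text, label in train:
    class_words[label].update(re.findall(r'\w+', text.lower()))

def classify(text):
    """Pick the class whose dictionary matches the most tokens."""
    tokens = re.findall(r'\w+', text.lower())
    scores = {label: sum(words[t] for t in tokens)
              for label, words in class_words.items()}
    return max(scores, key=scores.get)

print(classify("backup failed again"))  # "Backup"
```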
The following is what I have at the moment. It runs through a CSV file named after a class and outputs another file with punctuation removed and every word lower-cased in a separate cell. That output will then be used as a dictionary. I'm not sure if I'm going about this whole thing the right way, though.
import csv
from nltk.tokenize import RegexpTokenizer

# tokenizer that drops punctuation
tokenizer = RegexpTokenizer(r'\w+')

# read the CSV
readFile = open('Backup.csv', 'r')
reader = csv.reader(readFile)
resultFile = open('result.csv', 'w', newline='')
wr = csv.writer(resultFile)

# for every row in the file, tokenize the short description,
# convert it to lowercase, and write the words to a .csv file
for row in reader:
    wr.writerow(tokenizer.tokenize(row[0].lower()))

readFile.close()
resultFile.close()
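For comparison, here is my understanding of what the actual Naive Bayes calculation adds on top of this preprocessing: a log prior per class plus Laplace-smoothed word likelihoods, rather than raw match counts. The training pairs below are placeholders for my real data:

```python
import math
import re
from collections import Counter, defaultdict

# Placeholder (short description, class) pairs; the real ones come from my CSV.
train = [
    ("Data Backup is not working", "Backup"),
    ("backup directory not found", "Backup"),
    ("Email change in Groups", "Notes"),
    ("Email > Global - Lotus Notes", "Notes"),
]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()             # per-class ticket counts
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(re.findall(r'\w+', text.lower()))

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Score each class with log prior + smoothed log likelihoods."""
    tokens = re.findall(r'\w+', text.lower())
    best_label, best_score = None, float('-inf')
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(class_counts[label] / len(train))
        for t in tokens:
            # Laplace smoothing so unseen words don't zero out a class.
            score += math.log((counts[t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("backup not working"))  # "Backup"
```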
EDIT: I have now started using the following, which takes in the data from my two-column CSV file:
from textblob.classifiers import NaiveBayesClassifier

with open('train.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

print(cl.classify("backup"))       # "Backup"
print(cl.classify("Lotus Notes"))  # "Notes"
etc.
Pretty sure I just need a larger sample of training and test data; then I will feed in a CSV of short descriptions and update it with the class that has been calculated.
From a functionality point of view it seems to work, unless I've made any glaring mistakes?
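On the sample-size question, a common rule of thumb is to hold out roughly 20% of the labelled data for testing and train on the rest. For my 329 tickets that would look something like this (the ticket list here is a placeholder):

```python
import random

# Placeholder for my 329 labelled (description, class) pairs.
tickets = [("desc %d" % i, "Class%d" % (i % 6)) for i in range(329)]

random.seed(0)          # reproducible shuffle
random.shuffle(tickets)

split = int(len(tickets) * 0.8)   # 80% train, 20% test
train, test = tickets[:split], tickets[split:]
print(len(train), len(test))      # 263 66
```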