How to use keras RNN for text classification in a dataset?

Question

6.8k views · Asked by Eka on 25 December 2016 at 15:08

I have coded ANN classifiers using Keras, and now I am teaching myself to code RNNs in Keras for text and time series prediction. After searching the web for a while, I found this tutorial by Jason Brownlee, which is decent for a novice learning RNNs. The original article uses the IMDb dataset for text classification with an LSTM, but because that dataset is large, I changed it to a small SMS spam detection dataset.

# LSTM with dropout for sequence classification, adapted from the IMDB example to an SMS spam dataset
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
import pandas as pd
from sklearn.cross_validation import train_test_split

# fix random seed for reproducibility
numpy.random.seed(7)

url = 'https://raw.githubusercontent.com/justmarkham/pydata-dc-2016-tutorial/master/sms.tsv'
sms = pd.read_table(url, header=None, names=['label', 'message'])

# convert label to a numerical variable
sms['label_num'] = sms.label.map({'ham':0, 'spam':1})
X = sms.message
y = sms.label_num
print(X.shape)
print(y.shape)

# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
top_words = 5000

# truncate and pad input sequences
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

# create the model
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length, dropout=0.2))
model.add(LSTM(100, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, nb_epoch=3, batch_size=64)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

I have successfully split the dataset into training and test sets, but how should I model my RNN for this dataset?

Original Q&A

There are 2 answers

**Brock** · Answer 1 · 2017-02-25T08:38:20+00:00

If you are still stuck on this, check out this example by Jason Brownlee. Looks like you are most of the way there. You need to add an LSTM layer and a Dense layer to get a model that should work.

**gogs09** · Answer 2 · 2017-05-01T13:37:15+00:00

You need to represent raw text data as numeric vectors before training a neural network model. For this, you can use CountVectorizer or TfidfVectorizer provided by scikit-learn. After converting from raw text to a numeric vector representation, you can train an RNN/LSTM/CNN for the text classification problem.
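
To make the conversion step concrete, here is a minimal, dependency-free sketch of turning raw messages into numeric input. It is not the vectorizers named above: an Embedding layer like the one in the question expects padded sequences of integer word indices rather than count or TF-IDF vectors, so the sketch builds those instead. The helper names `build_vocab` and `texts_to_padded`, the whitespace tokenization, and the reserved ids 0 (padding) and 1 (unknown word) are all assumptions made for illustration. In practice Keras's own `Tokenizer` together with `pad_sequences` would normally do this job.

```python
from collections import Counter


def build_vocab(texts, top_words=5000):
    """Map the most frequent words to integer ids (hypothetical helper).

    Ids 0 and 1 are reserved for padding and unknown words, so every id
    stays below top_words and fits an Embedding(top_words, ...) layer.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(top_words - 2)
    return {word: idx + 2 for idx, (word, _) in enumerate(most_common)}


def texts_to_padded(texts, vocab, maxlen):
    """Convert texts to fixed-length id lists, left-padded with 0."""
    rows = []
    for text in texts:
        ids = [vocab.get(word, 1) for word in text.lower().split()]
        ids = ids[-maxlen:]  # if too long, keep only the last maxlen tokens
        rows.append([0] * (maxlen - len(ids)) + ids)
    return rows


# Toy messages standing in for the sms.message column.
messages = ["free prize call now", "are you coming home now"]
vocab = build_vocab(messages)
padded = texts_to_padded(messages, vocab, maxlen=6)
print(padded)
```

The vocabulary should be built from the training messages only and then reused for the test messages, so that words seen only at test time map to the unknown id. The resulting lists can be converted to a NumPy array and passed to `model.fit` in place of the raw strings.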
