I was looking through the TensorFlow contrib API and wanted to use the RNNClassifier available in TensorFlow 1.13. Unlike non-sequence estimators, this one requires sequence feature columns only. However, I was not able to make it work on a toy dataset: I keep getting an error when using sequence_numeric_column.

Here is the structure of my toy dataset:

idSeq,kind,label,size
0,0,dwarf,117.6
0,0,dwarf,134.4
0,0,dwarf,119.0
0,1,human,168.0
0,1,human,145.25
0,2,elve,153.9
0,2,elve,218.49999999999997
0,2,elve,210.9
1,0,dwarf,166.6
1,0,dwarf,168.0
1,0,dwarf,131.6
1,1,human,150.5
1,1,human,208.25
1,1,human,210.0
1,2,elve,199.5
1,2,elve,161.5
1,2,elve,197.6

where idSeq indicates which rows belong to which sequence. I want to predict the "kind" column from the "size" column.
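To make the intended grouping concrete, here is a minimal sketch that collects the "size" values into one list per sequence, using only the standard library. It assumes that a sequence is identified by the pair (idSeq, kind), which is one reading of the table rather than something stated above; the helper name group_sequences is hypothetical.

```python
import csv
import io
from collections import OrderedDict

# First rows of the toy dataset, inlined so the sketch is self-contained.
CSV_TEXT = """idSeq,kind,label,size
0,0,dwarf,117.6
0,0,dwarf,134.4
0,0,dwarf,119.0
0,1,human,168.0
0,1,human,145.25
"""


def group_sequences(csv_text):
    """Group 'size' values by (idSeq, kind), preserving row order.

    Assumption: one sequence corresponds to one (idSeq, kind) pair.
    """
    sequences = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["idSeq"]), int(row["kind"]))
        sequences.setdefault(key, []).append(float(row["size"]))
    return sequences


sequences = group_sequences(CSV_TEXT)
for (id_seq, kind), sizes in sequences.items():
    print(id_seq, kind, sizes)
```

Under this assumption each sequence carries a single label (its "kind") and a variable number of "size" steps.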

Below is the code I use to train the RNN on my dataset.

import os

import numpy as np
import pandas as pd
import tensorflow as tf


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.INFO)

dataframe = pd.read_csv("data_rnn.csv")
dataframe_test = pd.read_csv("data_rnn_test.csv")


train_x = dataframe
train_y = dataframe.loc[:,(["kind"])]


size_feature_col = tf.contrib.feature_column.sequence_numeric_column('size')


estimator = tf.contrib.estimator.RNNClassifier(
    sequence_feature_columns=[size_feature_col],
    num_units=[32, 16],
    cell_type='lstm',
    model_dir=None,
    n_classes=3,
    optimizer='Adagrad'
)



def make_dataset(
    batch_size, 
    x, 
    y=None, 
    shuffle=False, 
    shuffle_buffer_size=1000,
    shuffle_seed=1):
    """
    An input function for training, evaluation or prediction.

    Parameters
    ----------------------
    batch_size: integer
        the size of the batch to use for the training of the neural network
    x: pandas dataframe 
        dataframe that contains the features of the samples to study
    y: pandas dataframe or array (Default: None)
        dataframe or array that contains the values to predict of the samples
        to study. If none, we want a dataset for evaluation or prediction.
    shuffle: boolean (Default: False)
        if True, we shuffle the elements of the dataset
    shuffle_buffer_size: integer (Default: 1000)
        if we shuffle the elements of the dataset, it is the size of the buffer
        used for it.
    shuffle_seed : integer
        the random seed for the shuffling

    Returns
    ---------------------
    dataset.make_one_shot_iterator().get_next(): Tensor
        a nested structure of tf.Tensors containing the next element of the 
        dataset to study
    """

    def input_fn():
        if y is not None:
            dataset = tf.data.Dataset.from_tensor_slices((dict(x), y))
        else:
            dataset = tf.data.Dataset.from_tensor_slices(dict(x))
        if shuffle:
            dataset = dataset.shuffle(
                buffer_size=shuffle_buffer_size,
                seed=shuffle_seed).batch(batch_size).repeat()
        else:
            dataset = dataset.batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()

    return input_fn



batch_size = 50
random_seed = 1


input_fn_train = make_dataset(
            batch_size=batch_size, 
            x=train_x, 
            y=train_y, 
            shuffle=True, 
            shuffle_buffer_size=len(train_x),
            shuffle_seed=random_seed)

estimator.train(input_fn=input_fn_train, steps=5000)

But I only get the following error:

INFO:tensorflow:Calling model_fn.
Traceback (most recent call last):
  File "main.py", line 125, in <module>
    estimator.train(input_fn=input_fn_train, steps=5000)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 512, in _model_fn
    config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 332, in _rnn_model_fn
    logits, sequence_length_mask = logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 226, in rnn_logit_fn
    features=features, feature_columns=sequence_feature_columns)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/contrib/feature_column/python/feature_column/sequence_feature_column.py", line 120, in sequence_input_layer
    trainable=trainable)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/contrib/feature_column/python/feature_column/sequence_feature_column.py", line 496, in _get_sequence_dense_tensor
    sp_tensor, default_value=self.default_value)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/python/ops/sparse_ops.py", line 1432, in sparse_tensor_to_dense
    sp_input = _convert_to_sparse_tensor(sp_input)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/python/ops/sparse_ops.py", line 68, in _convert_to_sparse_tensor
    raise TypeError("Input must be a SparseTensor.")
TypeError: Input must be a SparseTensor.

I don't understand what I did wrong, because the documentation says that we have to give sequence columns to the RNNEstimator. It does not say anything about providing a sparse tensor.
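Reading the last frames of the traceback, sequence_numeric_column passes its input to sparse_tensor_to_dense, so it appears to expect the "size" feature as a SparseTensor of shape [batch, time] rather than the dense per-row values that from_tensor_slices yields here. As a sketch only, the following plain-Python helper (to_sparse_components is a hypothetical name, not a TensorFlow function) builds the indices, values, and dense_shape that describe a batch of variable-length sequences. In TensorFlow these three lists have the layout accepted by tf.SparseTensor(indices, values, dense_shape); whether that alone is enough for RNNClassifier is untested.

```python
def to_sparse_components(sequences):
    """Return (indices, values, dense_shape) for variable-length sequences.

    Hypothetical helper: each value is addressed by [row, step], and the
    dense shape is [number of sequences, length of the longest sequence].
    """
    indices, values = [], []
    for row, seq in enumerate(sequences):
        for step, value in enumerate(seq):
            indices.append([row, step])
            values.append(value)
    max_len = max((len(seq) for seq in sequences), default=0)
    dense_shape = [len(sequences), max_len]
    return indices, values, dense_shape


# Two sequences of different lengths taken from the toy dataset.
indices, values, dense_shape = to_sparse_components(
    [[117.6, 134.4, 119.0], [168.0, 145.25]]
)
print(indices)
print(values)
print(dense_shape)
```

The shorter sequence simply has no entry at step 2, which is how a sparse representation encodes the missing time step without explicit padding.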

Thanks in advance for your help and advice.
