I've implemented a custom Keras DataGenerator that takes a list of nested tuples of the form (files, test) and generates positive and negative examples from it.
Data Example:
[((0, 1, 2), 0),
((3, 4, 5), 0),
((12,), 1),
((0, 1, 4, 7), 1)]
Batch Example:
{'files': (0, 1, 2), 'test': 0}, label=1
where label is 1 for positive examples and 0 for negative examples.
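To make the input format concrete, here is a small sketch of the data and of the membership check that separates positive from negative pairs. The sample values are the ones listed above. Building `pairs_set` as a plain `set(pairs)` is an assumption for illustration, not necessarily how the generator class does it.

```python
# Sample data in the (files, test) format shown above.
pairs = [
    ((0, 1, 2), 0),
    ((3, 4, 5), 0),
    ((12,), 1),
    ((0, 1, 4, 7), 1),
]

# Assumption: pairs_set is a plain set of the same tuples.
# Tuples of ints are hashable, so nested tuples work as set members.
pairs_set = set(pairs)

# A (files, test) pair that occurs in the data is a positive example ...
is_positive = ((0, 1, 2), 0) in pairs_set
# ... and a pair that does not occur can be used as a negative example.
is_negative = ((0, 1, 2), 1) not in pairs_set
```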
I have the following function to generate the data:
def data_generation(self, pairs):
    """Generate batches of samples for training"""
    batch = np.zeros((self.batch_size, 3))  # I KNOW THE PROBLEM CAN BE HERE
    # Adjust label based on task
    if self.classification:
        neg_label = 0
    else:
        neg_label = -1
    # This creates a generator
    while True:
        for idx, (file_id, test_id) in enumerate(random.sample(pairs, self.n_positive)):
            batch[idx, :] = (file_id, test_id, 1)
        # Increment idx by 1
        idx += 1
        # Add negative examples until reach batch size
        while idx < self.batch_size:
            # random selection
            random_test = random.randrange(self.nr_tests)
            # Check to make sure this is not a positive example
            if (file_id, random_test) not in self.pairs_set:
                # Add to batch and increment index
                batch[idx, :] = (file_id, random_test, neg_label)
                idx += 1
        np.random.shuffle(batch)
        yield {'file': batch[:, 0], 'test': batch[:, 1]}, batch[:, 2]
Traceback:
  File "/Users/DataGenerator.py", line 83, in data_generation
    batch[idx, :] = (file_id, test_id, 1)
ValueError: setting an array element with a sequence.
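The error can be reproduced outside the generator with a few lines. This is a minimal sketch: the array shape and the sample values are chosen for illustration only.

```python
import numpy as np

# A float array holds exactly one scalar per cell.
batch = np.zeros((4, 3))

# file_id is a tuple of ids, not a single scalar.
file_id, test_id = (0, 1, 2), 0

try:
    # Same assignment as in data_generation: the first element is a sequence,
    # so NumPy cannot convert the row to three floats.
    batch[0, :] = (file_id, test_id, 1)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)
```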
I know the batch array is the problem: each row has the form (tuple, int, int), and the tuple has variable length, so it cannot be stored in a fixed-width numeric NumPy array. Should I mask or pad the tuple? How do I make this work?
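For reference, a minimal sketch of what padding with a mask could look like, using one array per input instead of a single (batch_size, 3) array. The helper `pad_files`, the pad value `-1`, and the name `max_files` are hypothetical and not part of the original generator.

```python
import numpy as np

PAD = -1  # hypothetical pad value; must not collide with a real file id


def pad_files(files, max_files, pad=PAD):
    """Right-pad a tuple of file ids to max_files entries; return (padded, mask)."""
    if len(files) > max_files:
        raise ValueError("tuple is longer than max_files")
    padded = np.full(max_files, pad, dtype=np.int64)
    padded[:len(files)] = files
    # True where a real file id is stored, False where padding was added.
    mask = np.arange(max_files) < len(files)
    return padded, mask


pairs = [((0, 1, 2), 0), ((3, 4, 5), 0), ((12,), 1), ((0, 1, 4, 7), 1)]
max_files = max(len(files) for files, _ in pairs)

# One fixed-shape array per input: ids, mask, and the scalar test id.
padded_and_masks = [pad_files(files, max_files) for files, _ in pairs]
files_batch = np.stack([padded for padded, _ in padded_and_masks])
mask_batch = np.stack([mask for _, mask in padded_and_masks])
test_batch = np.array([test for _, test in pairs], dtype=np.int64)
```

With arrays like these, the generator could yield `files_batch`, `mask_batch`, and `test_batch` as separate inputs. If the padded ids were fed to a Keras `Embedding` layer with `mask_zero=True`, the pad value would have to be 0 and the real ids shifted by one, since 0 is a valid file id in this data.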