How to generate batches of data from list of nested tuples?


I've implemented a custom Keras DataGenerator that takes a list of nested tuples of the form (files, test) and generates positive and negative examples from it.

Data Example:

 [((0, 1, 2), 0), 
  ((3, 4, 5), 0), 
  ((12,), 1), 
  ((0, 1, 4, 7), 1)] 

Batch Example:

{'files': (0, 1, 2), 'test': 0}, label=1

where label is 1 for positive and 0 for negative examples in data.

I have the following function to generate the data:

 def data_generation(self, pairs):
    """Generate batches of samples for training"""
    batch = np.zeros((self.batch_size, 3)) # I KNOW THE PROBLEM CAN BE HERE

    # Adjust label based on task
    if self.classification:
        neg_label = 0
    else:
        neg_label = -1

    # This creates a generator
    while True:
        for idx, (file_id, test_id) in enumerate(random.sample(pairs, self.n_positive)):
            batch[idx, :] = (file_id, test_id, 1)

        # Increment idx by 1
        idx += 1

        # Add negative examples until reach batch size
        while idx < self.batch_size:

            # random selection
            random_test = random.randrange(self.nr_tests)

            # Check to make sure this is not a positive example
            if (file_id, random_test) not in self.pairs_set:
                # Add to batch and increment index
                batch[idx, :] = (file_id, random_test, neg_label)
                idx += 1

        np.random.shuffle(batch)
        yield {'file': batch[:, 0], 'test': batch[:, 1]}, batch[:, 2]

Traceback:

    File "/Users/DataGenerator.py", line 83, in data_generation
        batch[idx, :] = (file_id, test_id, 1)
    ValueError: setting an array element with a sequence.
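
The error can be reproduced in isolation, outside the generator. This is a minimal sketch that assumes only NumPy and a made-up row shaped like the data example above: assigning a row whose first element is itself a tuple into a fixed-width float array fails, because that row is not three scalars.

```python
import numpy as np

# Same layout as in data_generation: one float row of width 3 per sample.
batch = np.zeros((4, 3))

try:
    # The first element is a tuple of file ids, not a single number,
    # so NumPy cannot convert the row into three floats.
    batch[0, :] = ((0, 1, 2), 0, 1)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```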

I know batch is the problem: each row has the form (tuple, int, int), and the tuple has variable length, so it cannot be stored in a fixed-width numeric array of shape (batch_size, 3). Should I mask or pad the tuple? How do I make this work?
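
One possible direction is to pad. The sketch below is not a confirmed fix, and it makes several assumptions: the file ids are kept in their own 2-D integer array instead of being squeezed into a single column, every tuple is right-padded to a fixed max_len, and real file ids are shifted by +1 so that 0 can serve as the padding value. The names pad_files, build_batch, PAD_ID and max_len are hypothetical and do not appear in the original generator.

```python
import numpy as np

PAD_ID = 0  # assumption: 0 is reserved for padding, real file ids are shifted by +1


def pad_files(files, max_len, pad_id=PAD_ID):
    """Right-pad a tuple of file ids to max_len (truncating longer tuples)."""
    shifted = [f + 1 for f in files[:max_len]]
    return shifted + [pad_id] * (max_len - len(shifted))


def build_batch(samples, max_len):
    """Build model inputs from a list of ((file ids...), test_id, label) rows."""
    files = np.array([pad_files(f, max_len) for f, _, _ in samples], dtype=np.int64)
    tests = np.array([t for _, t, _ in samples], dtype=np.int64)
    labels = np.array([lab for _, _, lab in samples], dtype=np.float32)
    return {'file': files, 'test': tests}, labels


# Hypothetical rows in the same spirit as the data example above.
samples = [((0, 1, 2), 0, 1), ((12,), 1, 1), ((0, 1, 4, 7), 0, 0)]
inputs, labels = build_batch(samples, max_len=4)
print(inputs['file'])
```

With this layout, 'file' has shape (batch_size, max_len) rather than (batch_size,), so the model's input for it would need to accept a sequence. If that input feeds a Keras Embedding layer, creating the layer with mask_zero=True is the usual way to have the padded positions ignored downstream.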
