StyleGAN2-ADA TFRecords - ValueError: axes don't match array; images work on one run and fail on the next


I'm working on training a GAN through Google Colab with a dataset of photos I scraped from WikiArt and resized to 1024x1024, but I keep getting this error when creating the TFRecords:

Traceback (most recent call last):
  File "dataset_tool.py", line 1249, in <module>
    execute_cmdline(sys.argv)
  File "dataset_tool.py", line 1244, in execute_cmdline
    func(**vars(args))
  File "dataset_tool.py", line 714, in create_from_images
    img = img.transpose([2, 0, 1]) # HWC => CHW
ValueError: axes don't match array
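
For context, this particular ValueError has a simple trigger: np.transpose with three axes only works on a 3-D array. A minimal sketch in plain NumPy (no assumptions about dataset_tool.py internals):

```python
import numpy as np

# An RGB image decodes to a 3-D (H, W, C) array; transposing to CHW works.
rgb = np.zeros((1024, 1024, 3), dtype=np.uint8)
chw = rgb.transpose([2, 0, 1])  # shape (3, 1024, 1024)

# A 2-D (H, W) array has only two axes, so the same three-axis
# transpose raises exactly the error shown in the traceback.
gray = np.zeros((1024, 1024), dtype=np.uint8)
try:
    gray.transpose([2, 0, 1])
except ValueError as e:
    print(e)  # axes don't match array
```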

I set it to print out which file it would fail on and began taking those out of the dataset, but what it fails on seems completely random: it will iterate over a file perfectly fine on one run, then fail on that same file the next run, after some other troublesome photo has been removed from the dataset.

I'm not sure the process of repeatedly removing photos that break it would ever end, or whether it would leave me with a meaningful dataset. Is there something else I should try to fix it?


There are 2 answers

richvar (best answer)

Figured out the solution for this: it turns out some of the images I scraped were grayscale. To find them, I used ImageMagick (which I had also used to resize the photos to 1024x1024) to check the colorspace. I pointed the terminal at the image folder and ran:

magick identify *.jpg

From here, I Ctrl+F'ed the output to see which ones were marked "Gray" instead of "sRGB". After taking those out of the dataset, it worked like a charm.
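
If you would rather stay in Python than scan ImageMagick output by hand, the same check can be done with PIL, which dataset_tool.py already depends on. This is a sketch; `find_non_rgb` is a hypothetical helper name, not part of dataset_tool.py:

```python
import glob
import os
import PIL.Image

def find_non_rgb(image_dir):
    """Return (path, mode) for images whose decoded mode is not RGB
    (e.g. 'L' for grayscale), which would break the HWC => CHW transpose."""
    bad = []
    for path in sorted(glob.glob(os.path.join(image_dir, '*.jpg'))):
        with PIL.Image.open(path) as im:
            if im.mode != 'RGB':
                bad.append((path, im.mode))
    return bad
```

Running `find_non_rgb('my_dataset')` lists exactly the files to pull out (or to convert) before building the TFRecords.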

BillD

I ran into this not long ago. After spending more time than I care to admit hunting and plucking data out of the source set, I found your question. I even did the ImageMagick search to purge grayscale images from the dataset, but in my case the issue persisted.

I went so far as to export my dataset to one of our Macs in order to use Preview to mass-edit the color and resolution and export new, uniform JPEGs. Still didn't fix it. Here's the solution I came up with (update dataset_tool.py in your runtime workspace):

def create_from_images(tfrecord_dir, image_dir, shuffle):
    print('Loading images from "%s"' % image_dir)
    image_filenames = sorted(glob.glob(os.path.join(image_dir, '*')))
    if len(image_filenames) == 0:
        error('No input images found')

    img = np.asarray(PIL.Image.open(image_filenames[0]))
    resolution = img.shape[0]
    channels = img.shape[2] if img.ndim == 3 else 1
    if img.shape[1] != resolution:
        error('Input images must have the same width and height')
    if resolution != 2 ** int(np.floor(np.log2(resolution))):
        error('Input image resolution must be a power-of-two')
    if channels not in [1, 3]:
        error('Input images must be stored as RGB or grayscale')

    with TFRecordExporter(tfrecord_dir, len(image_filenames)) as tfr:
        order = tfr.choose_shuffled_order() if shuffle else np.arange(len(image_filenames))
        for idx in range(order.size):
            # convert("RGB") guarantees a (H, W, 3) array for every image,
            # so a single transpose path is enough; the old channels == 1
            # branch would mis-shape a converted image and is dropped.
            pil_img = PIL.Image.open(image_filenames[order[idx]]).convert("RGB")
            img = np.asarray(pil_img)
            img = img.transpose([2, 0, 1]) # HWC => CHW
            tfr.add_image(img)

Basically, use PIL to convert every image to RGB no matter what. It slows the prep process down a little, but it is convenient when your training data comes from varied sources.
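
To see why this works, here is a quick sketch of what convert("RGB") does to the array shape. A grayscale image decodes to 2-D, which is what breaks the transpose; after conversion it is always (H, W, 3):

```python
import numpy as np
import PIL.Image

# A grayscale ('L' mode) image decodes to a 2-D array...
gray = PIL.Image.new('L', (64, 64))
print(np.asarray(gray).shape)  # (64, 64)

# ...but after convert("RGB") it is always (H, W, 3),
# so the HWC => CHW transpose in dataset_tool.py succeeds.
rgb = gray.convert('RGB')
img = np.asarray(rgb).transpose([2, 0, 1])
print(img.shape)  # (3, 64, 64)
```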