Failed to unpack MNIST data set using python3


I'm trying to convert the MNIST data to PNG format, following the file format described at http://yann.lecun.com/exdb/mnist/

Below is the format of TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  60000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns

This is my code. I use struct to unpack the data set and try to print the first four 32-bit integers in the file.

from PIL import Image
import struct

def read_image(filename):
  f = open(filename, 'rb')

  index = 0
  buf = f.read()

  magic, images, rows, columns = struct.unpack_from('>IIII' , buf , index)
  index += struct.calcsize('>IIII')

  print(magic, images, rows, columns)
  f.close()
  # for i in range(images):
  # #for i in xrange(2000):
  #   image = Image.new('L', (columns, rows))

  #   for x in range(rows):
  #     for y in range(columns):
  #       image.putpixel((y, x), int(struct.unpack_from('>B', buf, index)[0]))
  #       index += struct.calcsize('>B')

  #   print('save ' + str(i) + 'image')
  #   image.save('test/' + str(i) + '.png')
if __name__ == '__main__':
  read_image('train-images-idx3-ubyte.gz')

But the output is totally wrong:

529205256 2055376946 226418 1634299437


There is 1 answer

Vincent_Bryan

I realized that I forgot to extract "train-images-idx3-ubyte.gz". After extracting it, I got a file named "train-images.idx3-ubyte". I replaced "train-images-idx3-ubyte.gz" in the code with this new file name, and it finally worked.
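The numbers in the question are consistent with this: 529205256 is 0x1F8B0808, which is the gzip magic bytes 0x1F 0x8B followed by the compression method and flag bytes, and 1634299437 is 0x61696E2D, the ASCII characters "ain-" from the file name stored in the gzip header. So struct was unpacking the gzip header rather than the IDX header. As an alternative to extracting by hand, here is a minimal sketch that reads the header from either the compressed or the extracted file using the standard gzip module. The function name read_idx3_header and the GZIP_MAGIC constant are my own, not part of any MNIST tooling.

```python
import gzip
import struct

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip file


def read_idx3_header(filename):
    """Return (magic, images, rows, columns) from an IDX3 image file.

    Hypothetical helper: accepts both the original .gz download and
    the extracted file.
    """
    # Peek at the first two bytes to see whether the file is still compressed.
    with open(filename, 'rb') as f:
        compressed = f.read(2) == GZIP_MAGIC

    # gzip.open decompresses transparently; plain open reads raw bytes.
    opener = gzip.open if compressed else open
    with opener(filename, 'rb') as f:
        header = f.read(struct.calcsize('>IIII'))

    # Four big-endian unsigned 32-bit integers, as in the format table.
    magic, images, rows, columns = struct.unpack('>IIII', header)
    if magic != 2051:  # 0x00000803, the magic number for image files
        raise ValueError('not an IDX3 image file: magic = %d' % magic)
    return magic, images, rows, columns
```

Calling read_idx3_header('train-images-idx3-ubyte.gz') should then return the values from the format table (2051, 60000, 28, 28) without extracting the archive first. The pixel data follows the header in the same stream, so the loop that is commented out in the question could read from the same gzip.open file object.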