Incorrect predictions on extracted images from text


I trained a model in PyTorch on the EMNIST data set - and got about 85% accuracy on the test set. Now, I have an image of handwritten text from which I have extracted individual letters, but I'm getting very poor accuracy on the images that I have extracted.

The class-index-to-character mapping I'm using -

letters_EMNIST = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8', 9: '9',
         10: 'A', 11: 'B', 12: 'C', 13: 'D', 14: 'E', 15: 'F', 16: 'G', 17: 'H', 18: 'I', 19: 'J',
         20: 'K', 21: 'L', 22: 'M', 23: 'N', 24: 'O', 25: 'P', 26: 'Q', 27: 'R', 28: 'S', 29: 'T',
         30: 'U', 31: 'V', 32: 'W', 33: 'X', 34: 'Y', 35: 'Z', 36: 'a', 37: 'b', 38: 'd', 39: 'e',
         40: 'f', 41: 'g', 42: 'h', 43: 'n', 44: 'q', 45: 'r', 46: 't'}

For reference, this is an example of an image from the test data -

[image: example EMNIST test-set letter]

And this is an example of an image I extracted -

[image: extracted handwritten letter]

How can I debug this?

1 Answer

Szymon Maszke (best answer):

First of all, check your extraction technique and verify that it works correctly. The rest of the answer assumes this step is done.

The distribution of EMNIST and the distribution of your extracted data are probably quite different, so it may be hard to obtain good results.
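For example, a quick way to narrow that gap is to make sure your extracted crops go through exactly the same preprocessing as the EMNIST images your network was trained on. A minimal sketch (it assumes 28x28 grayscale, white-on-black input and plain ToTensor() scaling; prepare_extracted_letter is a hypothetical helper, adjust it to whatever your training pipeline actually used):

from PIL import Image, ImageOps
from torchvision import transforms

def prepare_extracted_letter(path):
    # Hypothetical helper: adapt to your own extraction pipeline.
    img = Image.open(path).convert("L")      # grayscale, like EMNIST
    img = ImageOps.invert(img)               # EMNIST letters are white-on-black
    img = img.resize((28, 28))               # EMNIST resolution
    x = transforms.ToTensor()(img)           # shape (1, 28, 28), values in [0, 1]
    # Depending on which EMNIST loader you trained with, images may also be
    # stored transposed; if predictions look rotated or mirrored, try x.transpose(1, 2).
    return x.unsqueeze(0)                    # add batch dimension -> (1, 1, 28, 28)

If a crop preprocessed this way still looks visibly different from a real EMNIST sample when you plot the two side by side, that difference is the first thing to fix.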

There are some steps you can take to improve the score, though.

Additional data

If you have some way to extract more images of letters and digits and label them appropriately, you should use them during neural network training.

The more of those you get, the better your results will probably be (provided the data is of reasonably high quality, e.g. not many false positives).
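A minimal sketch of how that could look in PyTorch. MyLettersDataset is a hypothetical torch.utils.data.Dataset wrapping your own extracted, hand-labeled crops, preprocessed to the same 28x28 format and using the same class indices as the EMNIST split you trained on (the "balanced" split below is an assumption based on your 47-class mapping):

from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

emnist_train = datasets.EMNIST(root="data", split="balanced", train=True,
                               download=True, transform=transforms.ToTensor())
# MyLettersDataset is hypothetical: it should return (image_tensor, label)
# pairs with labels matching the EMNIST class indices above.
combined = ConcatDataset([emnist_train, MyLettersDataset("my_letters/")])
loader = DataLoader(combined, batch_size=128, shuffle=True)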

Data augmentation

You could do this without too much work, I think.

Remember, though, that data augmentation has to preserve labels, so avoid things like flipping (it would be fine for the digit 8, but a flipped u could become an n).

Augmentations which should be fine (a rough sketch is given after this list):

  • Small rotations (up to 20 degrees or so)
  • Small Gaussian noise or similar
  • CutOut with small patches (black rectangles of roughly 3x3 pixels zeroed out on the image)
  • Gentle spatial transformations (rescale, shift, linear transformations)
  • MixUp: mix two images with different labels linearly (e.g. an image of A multiplied by 0.6 plus an image of the digit 2 multiplied by 0.4) and train the network to predict 0.6 A and 0.4 2. Remember that labels don't have to be strictly binary. This should help keep your network from being overconfident in its predictions (a batch-level sketch follows below).

You can find all of those in the albumentations third-party library.
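If you prefer to stay within torchvision, here is a rough sketch of the label-preserving augmentations from the list above (the parameter values are illustrative, not tuned):

import torch
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=20),                     # small rotations
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05),
                            scale=(0.9, 1.1)),                 # gentle shift / rescale
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0.0, 1.0)),  # mild Gaussian noise
    transforms.RandomErasing(p=0.5, scale=(0.01, 0.03), value=0.0),                 # CutOut-style small patches
])

MixUp is usually applied per batch inside the training loop rather than as a per-image transform. A hedged sketch, assuming 47 classes (matching the mapping above) and a loss computed against soft targets:

import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes=47, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_soft = F.one_hot(y, num_classes).float()
    y_mixed = lam * y_soft + (1.0 - lam) * y_soft[perm]
    return x_mixed, y_mixed

# in the training loop (sketch):
# x_mixed, y_mixed = mixup_batch(images, labels)
# loss = -(y_mixed * F.log_softmax(model(x_mixed), dim=1)).sum(dim=1).mean()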

Model augmentation

For your model you could employ things like dropout (be careful with its interaction with batch norm though), Shake-Shake, Stochastic Depth, etc.
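As an illustration of the dropout point only (this is not your architecture, just a sketch of a small EMNIST-style CNN), dropout would typically sit after the fully connected layer, away from the batch norm layers:

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),               # dropout kept after the FC layer, not next to batch norm
    nn.Linear(256, 47),              # 47 classes, matching the mapping above
)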

Final

You can use all of those; just remember to test how each one performs. I tried to list them with the most promising approach on top.

In short, one possibility is to make the model more robust to input variation via augmentation.