Here is my code:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'F:\Installations\tesseract'
print(pytesseract.image_to_string('images/meme1.png', lang='eng'))
And here is the image:
And the output is as follows:
GP.
ed <a
= va
ay Roce Thee .
‘ , Pe ship
RCAC Tm alesy-3
Pein Reg a
years —
? >
ee bs
I can see the word "years" in the output, so Tesseract does recognize some of the text. Why doesn't it recognize all of it?
OCR is still a very hard problem in cluttered scenes, and you probably won't get better results without preprocessing the image. In this specific case it makes sense to threshold the image first, so that only the white regions (i.e. the text) remain. You can look into OpenCV for this: https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
Additionally, your image contains only two lines of text in arbitrary positions, so it may also be worth experimenting with page segmentation modes: https://github.com/tesseract-ocr/tesseract/issues/434