Best scanner settings for scanning documents (TIFF and PDF)


What are the best scanner settings for scanning documents (black-and-white text) so that OCR conversion gives the best results, and what are the standard settings and specifications for the PDF and TIFF formats?

4

There are 4 answers

0
nguyenq On

For OCR purposes, I would scan a document at 300 DPI, in B/W or grayscale, and save it as an uncompressed TIFF or PNG.

4
Ilya Evdokimov On

For OCR, the best scanning settings are:

  • 300 dpi resolution for regular text, 400 dpi resolution for particularly small fonts (fine print)
  • Black & white for text, greyscale for small fonts, color for pictures
  • TIFF format. Use Group 4 compression for black & white (very small file size). If color is needed, use uncompressed TIFF (very large file size).

Some OCR engines have their own preferred settings, which may help slightly, but the differences are usually minor.
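
To put rough numbers on the "very large file size" remark, here is a small sketch that estimates the raw (uncompressed) size of a scanned page. The US Letter page size and the assumption that each pixel row is padded to a whole byte are mine, not from the answer; compressed sizes (Group 4, for instance) depend on page content and are not estimated here.

```python
import math

def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate pixel dimensions and raw size of a scanned page.

    Assumes each pixel row is padded to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return width_px, height_px, bytes_per_row * height_px

# US Letter (8.5 x 11 in) is an assumed page size, chosen for illustration.
for label, dpi, bpp in [
    ("B/W, 300 dpi", 300, 1),
    ("greyscale, 300 dpi", 300, 8),
    ("greyscale, 400 dpi", 400, 8),
    ("color, 300 dpi", 300, 24),
]:
    w, h, size = uncompressed_size_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {w} x {h} px, about {size / 1e6:.1f} MB raw")
```

Under these assumptions a 300 dpi page is roughly 1 MB raw in 1-bit B/W, about 8 MB in 8-bit greyscale, and about 25 MB in 24-bit color, which is why uncompressed color scans add up quickly.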

0
steve hannah On

While 300 DPI is optimal for "perfect" inputs, if you are working with imperfect inputs (e.g. from a typewriter or dot-matrix printer), the high resolution can actually throw Tesseract off. In cases like this, it is better to use a lower resolution, which hides some of the imperfections. For example, with dot-matrix printouts I get significantly better results at 150 DPI than at 300 DPI.
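
A toy illustration of why a lower resolution can hide such imperfections: halving the resolution by averaging 2x2 pixel blocks merges separate dots into a uniform stroke. This is only a rough software analogue of rescanning at 150 DPI (the answer above describes scanning at a lower resolution, not downsampling), and the pixel data is made up.

```python
def downsample_2x(pixels):
    """Halve resolution by averaging each 2x2 block of grey values (0-255).

    A trailing odd row or column is dropped.
    """
    out = []
    for y in range(0, len(pixels) - 1, 2):
        row = []
        for x in range(0, len(pixels[y]) - 1, 2):
            total = (pixels[y][x] + pixels[y][x + 1]
                     + pixels[y + 1][x] + pixels[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out

# A "stroke" made of separate dots (0 = ink, 255 = paper), loosely like
# dot-matrix output. Hypothetical data, for illustration only.
dotted = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
print(downsample_2x(dotted))  # [[127, 127], [127, 127]]
```

At the lower resolution the individual dots are no longer visible; the stroke becomes an even grey area that a later thresholding step can treat as one connected shape.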

0
David On

If you want a general answer, 300 DPI is good. OCR results are usually best for B/W images, and if your image quality is low, you might improve it by applying image processing.
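
The answer does not say which image processing to apply. One common choice, shown here purely as an example, is to binarize a greyscale scan with Otsu's method, which picks the threshold that best separates dark ink from light paper. A minimal pure-Python sketch, assuming the image is a list of rows of 0-255 grey values:

```python
def otsu_threshold(pixels):
    """Return the grey level maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

def binarize(pixels):
    """Map pixels at or below the Otsu threshold to 0 (ink), others to 255."""
    t = otsu_threshold(pixels)
    return [[0 if value <= t else 255 for value in row] for row in pixels]

# Made-up greyscale patch: dark text pixels on a light, uneven background.
patch = [
    [30, 40, 200, 210],
    [35, 220, 215, 45],
]
print(binarize(patch))  # [[0, 0, 255, 255], [0, 255, 255, 0]]
```

In practice you would likely use an image library for this rather than nested lists; the sketch only shows the idea.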

Also, if you are saving the scanned image and then feeding it to the OCR engine, do NOT use lossy compression like JPEG. Note that a lossless JPEG variant exists, but it is not commonly supported.
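
If scans arrive from elsewhere and you want to catch JPEG input before it reaches the OCR engine, one option is to check the file's leading bytes instead of trusting the extension. A sketch with hypothetical function names; it is deliberately conservative, since it flags every JPEG file (including the rare lossless variant) and cannot tell whether a TIFF container holds lossy-compressed data internally.

```python
def sniff_image_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # TIFF starts with a byte-order mark plus the number 42:
    # "II" = little-endian, "MM" = big-endian.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"

def looks_lossless(header: bytes) -> bool:
    """True for PNG and TIFF signatures; False for JPEG and unknown formats.

    Caveat: a TIFF file may still contain JPEG-compressed image data,
    which this check does not inspect.
    """
    return sniff_image_format(header) in ("png", "tiff")

print(sniff_image_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # jpeg
print(looks_lossless(b"II*\x00\x08\x00\x00\x00"))           # True
```

To use it on a real file, read the first eight bytes in binary mode, e.g. `open(path, "rb").read(8)`, and pass them in.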