I am taking screenshots on Android with two different methods:
- By running /system/bin/screencap -p $path (roughly as sketched below).
- By using the MediaProjection API.
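The first method is a one-shot shell invocation, something like this minimal sketch (the su wrapper, helper name, and output path are assumptions for illustration; screencap needs shell or root privileges when launched from an app):

import java.io.File;

// Minimal sketch (assumption): invoking the screencap binary from an app.
// Running it needs shell privileges, hence the "su" wrapper on a rooted
// device; the output path is illustrative.
static File captureWithScreencap(File dir) throws Exception {
  File out=new File(dir, "screen.png");
  Process p=Runtime.getRuntime().exec(
      new String[] {"su", "-c", "/system/bin/screencap -p " + out.getAbsolutePath()});

  p.waitFor(); // blocks until the PNG has been written

  return(out);
}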
Even though it is the exact same screen, I get different results when performing OCR on the two captures (using Tesseract).
With /system/bin/screencap I get the expected results.
With the MediaProjection API, Tesseract fails to recognize some or all of the text correctly, so I need to preprocess the image with a binarization algorithm first.
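By binarization I mean something like the global-threshold pass sketched below (the fixed threshold of 128 and the per-pixel loop are assumptions for illustration; Otsu's method or adaptive thresholding would be more robust):

import android.graphics.Bitmap;
import android.graphics.Color;

// Minimal sketch (assumption): naive global-threshold binarization.
// Pixels darker than a fixed luma threshold become black, the rest white.
static Bitmap binarize(Bitmap src) {
  Bitmap out=src.copy(Bitmap.Config.ARGB_8888, true);

  for (int y=0; y < out.getHeight(); y++) {
    for (int x=0; x < out.getWidth(); x++) {
      int c=out.getPixel(x, y);
      // ITU-R BT.601 luma approximation
      int luma=(int)(0.299*Color.red(c) + 0.587*Color.green(c) + 0.114*Color.blue(c));

      out.setPixel(x, y, luma < 128 ? Color.BLACK : Color.WHITE);
    }
  }

  return(out);
}

(A single getPixels() pass over an int[] would be much faster than per-pixel getPixel() calls; the loop just keeps the idea readable.)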
Why is that? I have checked the screencap source code and it seems that it uses PNG compression, the ARGB_8888 config, and 100% quality, as you can see here: https://android.googlesource.com/platform/frameworks/base/+/master/cmds/screencap/screencap.cpp
This is how I am creating the bitmap with the MediaProjection API:
import android.graphics.Bitmap;
import android.graphics.PixelFormat;
import android.graphics.Point;
import android.media.Image;
import android.media.ImageReader;
import android.view.Display;
import android.view.Surface;

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class ImageTransmogrifier implements ImageReader.OnImageAvailableListener {
  private final int width;
  private final int height;
  private final ImageReader imageReader;
  private final ScreenshotService svc;
  private Bitmap latestBitmap=null;

  ImageTransmogrifier(ScreenshotService svc) {
    this.svc=svc;

    Display display=svc.getWindowManager().getDefaultDisplay();
    Point size=new Point();

    display.getRealSize(size);

    int width=size.x;
    int height=size.y;

    // Halve both dimensions until the capture fits in 2^20 (= 2<<19) pixels
    while (width*height > (2<<19)) {
      width=width>>1;
      height=height>>1;
    }

    this.width=width;
    this.height=height;

    imageReader=ImageReader.newInstance(width, height,
        PixelFormat.RGBA_8888, 2);
    imageReader.setOnImageAvailableListener(this, svc.getHandler());
  }

  @Override
  public void onImageAvailable(ImageReader reader) {
    final Image image=imageReader.acquireLatestImage();

    if (image!=null) {
      Image.Plane[] planes=image.getPlanes();
      ByteBuffer buffer=planes[0].getBuffer();
      int pixelStride=planes[0].getPixelStride();
      int rowStride=planes[0].getRowStride();
      // Each row may be padded beyond width*pixelStride bytes
      int rowPadding=rowStride - pixelStride * width;
      int bitmapWidth=width + rowPadding / pixelStride;

      if (latestBitmap == null ||
          latestBitmap.getWidth() != bitmapWidth ||
          latestBitmap.getHeight() != height) {
        if (latestBitmap != null) {
          latestBitmap.recycle();
        }

        latestBitmap=Bitmap.createBitmap(bitmapWidth,
            height, Bitmap.Config.ARGB_8888);
      }

      latestBitmap.copyPixelsFromBuffer(buffer);
      image.close();

      // Crop away the row padding before compressing to PNG
      ByteArrayOutputStream baos=new ByteArrayOutputStream();
      Bitmap cropped=Bitmap.createBitmap(latestBitmap, 0, 0,
          width, height);

      cropped.compress(Bitmap.CompressFormat.PNG, 100, baos);

      byte[] newPng=baos.toByteArray();

      svc.processImage(newPng);
    }
  }

  Surface getSurface() {
    return(imageReader.getSurface());
  }

  int getWidth() {
    return(width);
  }

  int getHeight() {
    return(height);
  }

  void close() {
    imageReader.close();
  }
}
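For completeness, the ImageReader's Surface is wired into the projection roughly like this (a minimal sketch; the startCapture() helper, the display name, and the flag choice are assumptions about my setup, and "projection" is the MediaProjection obtained from the user's screen-capture consent):

import android.hardware.display.DisplayManager;
import android.hardware.display.VirtualDisplay;
import android.media.projection.MediaProjection;
import android.os.Handler;

// Hypothetical wiring inside ScreenshotService (assumption): every frame
// rendered to the virtual display lands in the ImageReader's Surface,
// which triggers onImageAvailable() above.
private VirtualDisplay startCapture(MediaProjection projection, Handler handler) {
  ImageTransmogrifier it=new ImageTransmogrifier(this);

  return(projection.createVirtualDisplay("screenshot",
      it.getWidth(), it.getHeight(),
      getResources().getDisplayMetrics().densityDpi,
      DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
      it.getSurface(), null, handler));
}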
I've been told that, basically, the constant recording uses more of the processor, leaving less for the OCR: with fewer cycles available in a given amount of time, accuracy decreases. Supposedly that is also why I don't need to pre-process the screencap image, since a one-time capture is nowhere near as demanding as continuously streaming frames.
Is there any foundation to this? If so, should I use something other than MediaProjection, or simply pre-process the images?