Get correct image orientation by Google Cloud Vision api (TEXT_DETECTION)

16.9k views Asked by At

I tried Google Cloud Vision api (TEXT_DETECTION) on 90 degrees rotated image. It still can return recognized text correctly. (see image below)

That means the engine can recognize text even the image is 90, 180, 270 degrees rotated.

However the response result doesn't include information of correct image orientation. (document: EntityAnnotation)

Is there anyway to not only get recognized text but also get the orientation?
Could Google support it similar to (FaceAnnotation: getRollAngle)

enter image description here

8

There are 8 answers

2
Jordan On BEST ANSWER

As described in the Public Issue Tracker, our engineering team is now aware of this feature request, and there is currently no ETA for its implementation.

Note, orientation information may already be available in your image's metadata. An example of how to extract the metadata can be seen in this Third-party library.

A broad workaround would be to check the returned "boundingPoly" "vertices" for the returned "textAnnotations". By calculating the width and height of each detected word's rectangle, you can figure out if an image is not right-side-up if the rectangle 'height' > 'width' (aka the image is sideways).

1
H. Misfortune On

Usually we need to know the actual rotation angle of the text in the photo. The coordinate information provided in the API is complete enough. You only need to calculate the angle between xy1 and xy0 to get the rotation angle.

// reset
self.transform = CGAffineTransformIdentity;

CGFloat x_0 = viewData.bounds[0].x;
CGFloat y_0 = viewData.bounds[0].y;

CGFloat x_1 = viewData.bounds[1].x;
CGFloat y_1 = viewData.bounds[1].y;

CGFloat x_3 = viewData.bounds[3].x;
CGFloat y_3 = viewData.bounds[3].y;

// distance
CGFloat width = sqrt(pow(x_0 - x_1, 2) + pow(y_0 - y_1, 2));
CGFloat height = sqrt(pow(x_0 - x_3, 2) + pow(y_0 - y_3, 2));
self.size = CGSizeMake(width, height);

// angle
CGFloat angle = atan2((y_1 - y_0), (x_1 - x_0));
// rotation
self.transform = CGAffineTransformRotate(CGAffineTransformIdentity, angle);
0
Jack Fan On

I post my workaround which really works for images 90, 180, 270 degrees rotated. Please see the code below.

GetExifOrientation(annotateImageResponse.getTextAnnotations().get(1));
/**
 *
 * @param ea  The input EntityAnnotation must be NOT from the first EntityAnnotation of
 *            annotateImageResponse.getTextAnnotations(), because it is not affected by
 *            image orientation.
 * @return Exif orientation (1 or 3 or 6 or 8)
 */
public static int GetExifOrientation(EntityAnnotation ea) {
    List<Vertex> vertexList = ea.getBoundingPoly().getVertices();
    // Calculate the center
    float centerX = 0, centerY = 0;
    for (int i = 0; i < 4; i++) {
        centerX += vertexList.get(i).getX();
        centerY += vertexList.get(i).getY();
    }
    centerX /= 4;
    centerY /= 4;

    int x0 = vertexList.get(0).getX();
    int y0 = vertexList.get(0).getY();

    if (x0 < centerX) {
        if (y0 < centerY) {
            //       0 -------- 1
            //       |          |
            //       3 -------- 2
            return EXIF_ORIENTATION_NORMAL; // 1
        } else {
            //       1 -------- 2
            //       |          |
            //       0 -------- 3
            return EXIF_ORIENTATION_270_DEGREE; // 6
        }
    } else {
        if (y0 < centerY) {
            //       3 -------- 0
            //       |          |
            //       2 -------- 1
            return EXIF_ORIENTATION_90_DEGREE; // 8
        } else {
            //       2 -------- 3
            //       |          |
            //       1 -------- 0
            return EXIF_ORIENTATION_180_DEGREE; // 3
        }
    }
}

More info
I found I have to add language hint to make annotateImageResponse.getTextAnnotations().get(1) always follow the rule.

Sample code to add language hint

ImageContext imageContext = new ImageContext();
String [] languages = { "zh-TW" };
imageContext.setLanguageHints(Arrays.asList(languages));
annotateImageRequest.setImageContext(imageContext);
0
MuchHelping On

v1 REST endpoint already has orientationDegrees in their response:

https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#Page

Unfortunately, google-cloud-vision 3.2.0 hasn't got this one yet https://github.com/googleapis/python-vision/issues/156

0
arun On

There is another OCR product by Google called document AI, which I believe is better suited for OCR on documents. It returns the orientations.

However, when I checked the JSON, it appears that the overall orientation might be incorrect, but the block orientations are correct. Here is the response I got for a page that was rotated 90 deg. clockwise (so the top of the page is to the right):

enter image description here

I would think one can take a majority vote of the block orientations to get the page orientation.

2
faaez On

You can leverage the fact that we know the sequence of characters in a word to infer the orientation of a word as follows (obviously slightly different logic for non-LTR languages):

for page in annotation:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                    continue
                first_char = word.symbols[0]
                last_char = word.symbols[-1]
                first_char_center = (np.mean([v.x for v in first_char.bounding_box.vertices]),np.mean([v.y for v in first_char.bounding_box.vertices]))
                last_char_center = (np.mean([v.x for v in last_char.bounding_box.vertices]),np.mean([v.y for v in last_char.bounding_box.vertices]))

                #upright or upside down
                if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y): 
                    if first_char_center[0] <= last_char_center[0]: #upright
                        print 0
                    else: #updside down
                        print 180
                else: #sideways
                    if first_char_center[1] <= last_char_center[1]:
                        print 90
                    else:
                        print 270

Then you can use the orientation of individual words to infer the orientation of the document overall.

0
Alex Hernandez On

Jack Fan answer worked for me. This is my VanillaJS version.

/**
 *
 * @param gOCR  The Google Vision response
 * @return orientation (0, 90, 180 or 270)
 */
function getOrientation(gOCR) {
    var vertexList = gOCR.responses[0].textAnnotations[1].boundingPoly.vertices;

    const ORIENTATION_NORMAL = 0;
    const ORIENTATION_270_DEGREE = 270;
    const ORIENTATION_90_DEGREE = 90;
    const ORIENTATION_180_DEGREE = 180;

    var centerX = 0, centerY = 0;
    for (var i = 0; i < 4; i++) {
        centerX += vertexList[i].x;
        centerY += vertexList[i].y;
    }
    centerX /= 4;
    centerY /= 4;

    var x0 = vertexList[0].x;
    var y0 = vertexList[0].y;

    if (x0 < centerX) {
        if (y0 < centerY) {

            return ORIENTATION_NORMAL;
        } else {
            return ORIENTATION_270_DEGREE;
        }
    } else {
        if (y0 < centerY) {
            return ORIENTATION_90_DEGREE;
        } else {
            return ORIENTATION_180_DEGREE;
        }
    }
}
0
Yuliia Ashomok On

Sometimes it is not possible to get orientation from metadata. For example if user made a photo using camera of mobile device with wrong orientation. My solution is based on Jack Fan answer and for google-api-services-vision (avalible via Maven).

my TextUnit class

  public class TextUnit {
        private String text;

        //    X of lowest left point
        private float llx;

        //    Y of lowest left point
        private float lly;

        //    X of upper right point
        private float urx;

        //    Y of upper right point
        private float ury;
    }

base method:

 List<TextUnit> extractData(BatchAnnotateImagesResponse response) throws AnnotateImageResponseException {
            List<TextUnit> data = new ArrayList<>();

            for (AnnotateImageResponse res : response.getResponses()) {
                if (null != res.getError()) {
                    String errorMessage = res.getError().getMessage();
                    logger.log(Level.WARNING, "AnnotateImageResponse ERROR: " + errorMessage);
                    throw new AnnotateImageResponseException("AnnotateImageResponse ERROR: " + errorMessage);
                } else {
                    List<EntityAnnotation> texts = response.getResponses().get(0).getTextAnnotations();
                    if (texts.size() > 0) {

                        //get orientation
                        EntityAnnotation first_word = texts.get(1);
                        int orientation;
                        try {
                            orientation = getExifOrientation(first_word);
                        } catch (NullPointerException e) {
                            try {
                                orientation = getExifOrientation(texts.get(2));
                            } catch (NullPointerException e1) {
                                orientation = EXIF_ORIENTATION_NORMAL;
                            }
                        }
                        logger.log(Level.INFO, "orientation: " + orientation);

                        // Calculate the center
                        float centerX = 0, centerY = 0;
                        for (Vertex vertex : first_word.getBoundingPoly().getVertices()) {
                            if (vertex.getX() != null) {
                                centerX += vertex.getX();
                            }
                            if (vertex.getY() != null) {
                                centerY += vertex.getY();
                            }
                        }
                        centerX /= 4;
                        centerY /= 4;


                        for (int i = 1; i < texts.size(); i++) {//exclude first text - it contains all text of the page

                            String blockText = texts.get(i).getDescription();
                            BoundingPoly poly = texts.get(i).getBoundingPoly();

                            try {
                                float llx = 0;
                                float lly = 0;
                                float urx = 0;
                                float ury = 0;
                                if (orientation == EXIF_ORIENTATION_NORMAL) {
                                    poly = invertSymmetricallyBy0X(centerY, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                } else if (orientation == EXIF_ORIENTATION_90_DEGREE) {
                                    //invert by x
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-90));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                } else if (orientation == EXIF_ORIENTATION_180_DEGREE) {
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-180));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                }else if (orientation == EXIF_ORIENTATION_270_DEGREE){
                                    //invert by x
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-270));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                }


                                data.add(new TextUnit(blockText, llx, lly, urx, ury));
                            } catch (NullPointerException e) {
                                //ignore - some polys has not X or Y coordinate if text located closed to bounds.
                            }
                        }
                    }
                }
            }
            return data;
        }

helper methods:

private float getLlx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }

            Collections.sort(xs);
            float llx = (xs.get(0) + xs.get(1)) / 2;
            return llx;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getLly(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }

            Collections.sort(ys);
            float lly = (ys.get(0) + ys.get(1)) / 2;
            return lly;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUrx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }

            Collections.sort(xs);
            float urx = (xs.get(xs.size()-1) + xs.get(xs.size()-2)) / 2;
            return urx;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUry(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }

            Collections.sort(ys);
            float ury = (ys.get(ys.size()-1) +ys.get(ys.size()-2)) / 2;
            return ury;
        } catch (Exception e) {
            return 0;
        }
    }

    /**
     * rotate rectangular clockwise
     *
     * @param poly
     * @param theta the angle of rotation in radians
     * @return
     */
    public BoundingPoly rotate(float centerX, float centerY, BoundingPoly poly, double theta) {

        List<Vertex> vertexList = poly.getVertices();

        //rotate all vertices in poly
        for (Vertex vertex : vertexList) {
            float tempX = vertex.getX() - centerX;
            float tempY = vertex.getY() - centerY;

            // now apply rotation
            float rotatedX = (float) (centerX - tempX * cos(theta) + tempY * sin(theta));
            float rotatedY = (float) (centerX - tempX * sin(theta) - tempY * cos(theta));

            vertex.setX((int) rotatedX);
            vertex.setY((int) rotatedY);
        }
        return poly;
    }

    /**
     * since Google Vision Api returns boundingPoly-s when Coordinates starts from top left corner,
     * but Itext uses coordinate system with bottom left start position -
     * we need invert the result for continue to work with itext.
     *
     * @return text units inverted symmetrically by 0X coordinates.
     */
    private BoundingPoly invertSymmetricallyBy0X(float centerY, BoundingPoly poly) {

        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getY() != null) {
                v.setY((int) (centerY + (centerY - v.getY())));
            }
        }
        return poly;
    }

    /**
     *
     * @param centerX
     * @param poly
     * @return  text units inverted symmetrically by 0Y coordinates.
     */
    private BoundingPoly invertSymmetricallyBy0Y(float centerX, BoundingPoly poly) {
        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getX() != null) {
                v.setX((int) (centerX + (centerX - v.getX())));
            }
        }
        return poly;
    }