I am using CGPDFScanner to scan a PDF. Should I use the Td operator to find the positions of text? Can I have an example of how to use this operator to get the text positions? Currently I use the Tj and TJ operators to find the text. Now I would like to know the position of each word on a single page of the PDF. How can I do that?
Thanks
To get the coordinates of the text you need to keep track of the text transformation matrix. See section 5.3.1, "Text Positioning Operators", of the PDF 1.4 Reference. (I'm not sure whether later versions of the reference number things the same way.) While the Td operator changes the translation in the text matrix (by an offset relative to the start of the current line), there are other operators that affect the text matrix and other text state as well, so you need to keep track of the text matrix as the file is processed. The Tm operator directly sets the text matrix. The TD operator moves to the next line, offset by its x and y parameters, and also sets the leading to -y. T* just moves to the next line, using the current leading.
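As a rough illustration of that bookkeeping, here is a minimal sketch in plain C. The type and function names (Matrix, TextState, text_move and so on) are made up for this example and are not part of Core Graphics. With CGPDFScanner you would presumably register one callback per operator with CGPDFOperatorTableSetCallback, pop the operands with CGPDFScannerPopNumber (last operand first), and call the matching helper. The sketch only covers BT, Tm, Td, TD, TL and T*; it does not advance the matrix after Tj/TJ (that needs glyph widths from the font) and it does not apply the CTM, so the result is a line start in text space, not a per-word position in page space.

```c
/* PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f */
typedef struct { double a, b, c, d, e, f; } Matrix;

/* Hypothetical container for the text state this sketch tracks. */
typedef struct {
    Matrix tm;       /* text matrix */
    Matrix tlm;      /* text line matrix (start of the current line) */
    double leading;  /* set by TL, or implicitly by TD */
} TextState;

static const Matrix kIdentity = {1, 0, 0, 1, 0, 0};

/* BT: both matrices start out as the identity. */
static void text_begin(TextState *ts) {
    ts->tm = kIdentity;
    ts->tlm = kIdentity;
}

/* a b c d e f Tm: replace the text matrix and the text line matrix. */
static void text_set_matrix(TextState *ts, Matrix m) {
    ts->tm = m;
    ts->tlm = m;
}

/* tx ty Td: Tlm = [1 0 0 1 tx ty] x Tlm, then Tm = Tlm. */
static void text_move(TextState *ts, double tx, double ty) {
    ts->tlm.e += tx * ts->tlm.a + ty * ts->tlm.c;
    ts->tlm.f += tx * ts->tlm.b + ty * ts->tlm.d;
    ts->tm = ts->tlm;
}

/* leading TL: set the leading used by T*. */
static void text_set_leading(TextState *ts, double leading) {
    ts->leading = leading;
}

/* tx ty TD: same as "-ty TL" followed by "tx ty Td". */
static void text_move_set_leading(TextState *ts, double tx, double ty) {
    text_set_leading(ts, -ty);
    text_move(ts, tx, ty);
}

/* T*: same as "0 -leading Td". */
static void text_next_line(TextState *ts) {
    text_move(ts, 0.0, -ts->leading);
}
```

When a Tj or TJ callback fires, tm.e and tm.f give the origin of the string about to be shown, in text space. To get word positions you would still have to add the widths of the glyphs already drawn and multiply by the current transformation matrix.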