Spaces are not detected while scanning PDF - iOS (CGPDFScanner)


I am working on PDF scanning, where I want to extract text from a PDF. I am using the PDF Multithreading.pdf for searching. I am able to extract the text but am not able to extract the spaces in it. I am only getting callbacks for the Tj operator and not for TJ. What can be the problem?

Thanks


There is 1 answer

mkl On BEST ANSWER

"I am able to extract the text but am not able to extract the spaces in it. I am only getting callbacks for the Tj operator and not for TJ."

The reasons are that in your sample document

  1. no spaces are used in the text drawing operations but instead the text drawing position is changed using Tm operations; and
  2. only Tj text drawing operations are used, no TJ ones.

E.g. the text drawing operations for the title on the title page ("Threading Programming Guide") are:

BT
/F0 50 Tf
1 0 0 1 60 669.225 Tm
(\0006)Tj                                    %  T
1 0 0 1 83.527 669.225 Tm
(\000J\000T)Tj                               %  hr
1 0 0 1 125.631 669.225 Tm
(\000G\000C\000F\000K\000P\000I)Tj           %  eading
1 0 0 1 273.395 669.225 Tm
(\0002)Tj                                    %  P
1 0 0 1 298.272 669.225 Tm
(\000T)Tj                                    %  r
1 0 0 1 313.599 669.225 Tm
(\000Q)Tj                                    %  o
1 0 0 1 340.076 669.225 Tm
(\000I\000T)Tj                               %  gr
1 0 0 1 382.43 669.225 Tm
(\000C\000O\000O\000K\000P\000I)Tj           %  amming
0 Tc
1 0 0 1 60 609.225 Tm
(\000\))Tj                                   %  G
1 0 0 1 91.7 609.225 Tm
(\000W\000K\000F\000G)Tj                     %  uide
ET  

As you can see, there is no white space in the Tj text drawing operations; the gaps between words are produced only by shifting the drawing position using Tm.
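Since the stream contains no space characters, an extractor has to infer them from the positions. The following is a minimal sketch of that idea in plain C, not a CGPDFScanner implementation. It assumes you have already collected, for each Tj call, the x origin from the preceding Tm (the fifth of its six operands, which could be read with CGPDFScannerPopNumber), the advance width of the drawn glyphs, and the decoded text. The names TextRun, join_runs and demo_title_line are hypothetical, and so are the width values and the 5-point threshold; real widths would have to come from the font's metrics.

```c
#include <stddef.h>
#include <string.h>

/* One Tj call (hypothetical structure): where the run starts, how wide
   the drawn glyphs are, and the already-decoded characters. */
typedef struct {
    double x;          /* horizontal origin set by the preceding Tm */
    double width;      /* advance width of the run (assumed to be known) */
    const char *text;  /* decoded text of the run */
} TextRun;

/* Joins the runs of one line into out. A single space is inserted wherever
   the gap between the end of one run and the start of the next exceeds
   min_gap. Returns 0 on success, -1 if out is too small. */
int join_runs(const TextRun *runs, size_t count, double min_gap,
              char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            double gap = runs[i].x - (runs[i - 1].x + runs[i - 1].width);
            if (gap > min_gap) {
                if (used + 1 >= out_size) return -1;
                out[used++] = ' ';
                out[used] = '\0';
            }
        }
        size_t len = strlen(runs[i].text);
        if (used + len >= out_size) return -1;
        memcpy(out + used, runs[i].text, len + 1);  /* copies the NUL too */
        used += len;
    }
    return 0;
}

/* First title line of the listing above. The x values are the Tm operands
   from the listing; the widths are made-up values, chosen so that runs
   belonging to the same word abut exactly. */
int demo_title_line(char *out, size_t out_size)
{
    static const TextRun runs[] = {
        {  60.000,  23.527, "T"      },
        {  83.527,  42.104, "hr"     },
        { 125.631, 133.864, "eading" },
        { 273.395,  24.877, "P"      },
        { 298.272,  15.327, "r"      },
        { 313.599,  26.477, "o"      },
        { 340.076,  42.354, "gr"     },
        { 382.430, 150.000, "amming" },
    };
    return join_runs(runs, sizeof runs / sizeof runs[0], 5.0, out, out_size);
}
```

With these assumed widths, the only gap larger than the threshold is the one between "eading" and "P", so that is where the space is inserted. In practice the threshold is usually chosen relative to the font size or to the width of the font's space glyph rather than as a fixed number.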