Extracting text from a PDF file using Smalot/pdfparser returns an empty result


I want to extract text from a PDF file using smalot/pdfparser, but I get an empty result for some files. The PDF is not password-protected and opens normally in Chrome. Other PDF files I've tried work fine.

This is my code:

$parser = new \Smalot\PdfParser\Parser(); // Create the pdfparser instance
$pdf = $parser->parseFile($file);         // Parse the PDF file

// Print the document metadata
$metaData = $pdf->getDetails();
print_r($metaData);

// Print the text of each page
$pages = $pdf->getPages();
foreach ($pages as $page) {
    $text = $page->getText();
    echo "<div>" . $text . "</div>";
}
echo $file;

The result is just:

Array
(
    [Producer] => cairo 1.17.4 (https://cairographics.org
    [Pages] => 1
)
<div></div>D:\web\D\public\pdf_po/123.pdf

Can anyone explain my problem? This is my PDF file: www.mediafire.com/file/azb7yddqo2ry55j/123.pdf/file

1 answer

Answered by K J

PDFtoText

PDFtoText should give you the best results, since the text in a PDF has no table divisions.

So when that text layout is carried over into a .txt file for Word or any other text processor, you can simply draw the tables or divisions around the text. The alternative is to import it into Excel or any other spreadsheet program.
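As a rough illustration of the spreadsheet route, layout-preserved text can be split into rows and cells on runs of spaces. This is only a sketch: the function name `layoutTextToRows` and the rule that two or more spaces mark a column gap are assumptions for illustration, not part of any library, and real documents may need a different rule.

```php
<?php
// Hypothetical helper (not part of any library): split layout-preserved text
// into rows of cells, treating runs of two or more spaces as column gaps.
function layoutTextToRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // Single spaces stay inside a cell; wider gaps separate cells.
        $rows[] = preg_split('/ {2,}/', $line);
    }
    return $rows;
}
```

Each row is a plain array of strings, so it can be written out as CSV or pasted into a spreadsheet afterwards.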

Then it's easier to cut and paste real tabular data, or use it in any other way. The main trick is to extract the text in exactly the grid-like way it is stored in the PDF, and the closest you can get to that is re-printing it to a text file via PDFtoText (there are many versions, so find one that suits your needs).
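If you want to stay in PHP, one option is to call the command-line tool from your script. The sketch below assumes the Poppler or Xpdf `pdftotext` binary is installed and on the PATH. The function name `pdfToLayoutText` is hypothetical. The `-layout` option asks pdftotext to keep the physical layout, and the trailing `-` sends the text to standard output.

```php
<?php
// Sketch: call the pdftotext command-line tool (Poppler or Xpdf) from PHP.
// Assumes `pdftotext` is installed and on the PATH; the function name is hypothetical.
function pdfToLayoutText(string $pdfPath): string
{
    if (!is_file($pdfPath)) {
        throw new InvalidArgumentException("File not found: $pdfPath");
    }

    // escapeshellarg() protects against spaces and shell metacharacters in the path.
    $command = 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';

    $outputLines = [];
    $exitCode = 0;
    exec($command, $outputLines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext failed with exit code $exitCode");
    }

    // exec() returns the output line by line; join it back into one string.
    return implode("\n", $outputLines);
}
```

In the question's script this could be used as `echo "<pre>" . htmlspecialchars(pdfToLayoutText($file)) . "</pre>";`, where the `<pre>` tag keeps the column alignment visible in the browser.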