I want to extract text from a PDF file using smalot/pdfparser, but I get an empty result for some files. The PDF has no password and opens normally in Chrome. Other PDF files I have tried work fine.
This is my code:
$parser = new \Smalot\PdfParser\Parser(); // Parse pdf file using Parser library
$pdf = $parser->parseFile($file);
$metaData = $pdf->getDetails();
print_r($metaData);
$pages = $pdf->getPages();
foreach ($pages as $page) {
    $text = $page->getText();
    echo "<div>".$text."</div>";
}
echo $file;
The result is just:
Array
(
[Producer] => cairo 1.17.4 (https://cairographics.org)
[Pages] => 1
)
<div></div>D:\web\D\public\pdf_po/123.pdf
Can anyone explain my problem? This is my PDF file: www.mediafire.com/file/azb7yddqo2ry55j/123.pdf/file
PDFtoText should give you the best results, since the text in a PDF carries no table divisions.
Once that text layout is carried over into Word, a .txt file, or whatever suits any other text processor, you can simply draw the tables or divisions around the text. The alternative is to import it into Excel or any other spreadsheet program.
Then it is easier to cut and paste real tabular data, or to use it any other way. The main trick is to extract the text in exactly the grid-like way it is stored in the PDF, and the closest you can get to that is re-printing it to a text file via PDFtoText (there are many versions, so find one that suits your needs).
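As a rough sketch of how that could be wired into the PHP code from the question, the snippet below shells out to the `pdftotext` command-line tool with its `-layout` option, which keeps the physical arrangement of the text. It assumes a Poppler or Xpdf build of `pdftotext` is installed and on the PATH; the function names `buildPdfToTextCommand` and `extractTextWithLayout` are made up for this illustration and are not part of smalot/pdfparser.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell command for pdftotext (assumed to be installed separately).
 * -layout keeps the grid-like arrangement of the text on the page;
 * the trailing "-" sends the text to stdout instead of writing a .txt file.
 */
function buildPdfToTextCommand(string $pdfPath, string $binary = 'pdftotext'): string
{
    return escapeshellarg($binary) . ' -layout ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Run pdftotext on one file and return the extracted text.
 * Throws if the file is missing or the tool exits with a non-zero status.
 */
function extractTextWithLayout(string $pdfPath, string $binary = 'pdftotext'): string
{
    if (!is_file($pdfPath)) {
        throw new InvalidArgumentException("File not found: $pdfPath");
    }

    $lines = [];
    $exitCode = 0;
    exec(buildPdfToTextCommand($pdfPath, $binary), $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext exited with status $exitCode");
    }

    // exec() returns the output line by line; join it back into one string.
    return implode("\n", $lines);
}
```

With the question's variable, `echo extractTextWithLayout($file);` would print the text. If the binary is not on the PATH, its full path can be passed as the second argument.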