Problem using phpQuery


I am trying to use phpQuery to scrape a web page (free-lance.ru).

The equivalent code in Simple HTML DOM works:

include('simple_html_dom.php');

$shd = str_get_html($html);

$projects = array();
$i = 0;
foreach ($shd->find('.project-preview') as $work){
    $projects[$i]['name'] = $work->find('h3', 0)->children(1)->plaintext;
    $i++;
}

But I need it in phpQuery.

I tried something like:

include('phpQuery.php');

$pq = phpQuery::newDocument($html);

foreach ($pq->find('.project-preview') as $work){
        echo 'wow';
}

But it doesn't work: sizeof($pq->find('.project-preview')) is 0.

I would be very thankful for any help.


There are 2 answers

2
gnud On

Your code looks fine. This basically equivalent code ran just fine for me:

$q = phpQuery::newDocument('
<html>
<body>
<div class="findme">Lorem ipsum</div><div class="ignoreme">dolor sit amet</div>
</body>
</html>
'
);

foreach ($q->find('.findme') as $tag) {
    echo 'Found: '.$tag->tagName."(".$tag->getAttribute('class').")\n";
}

Result:

Found: div(findme)

So, the question becomes:

  • Are you getting any errors? (and is error_reporting turned on? What about display_errors?)
  • What does your HTML look like?
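To rule out hidden errors, something like the following at the top of the script should make PHP report everything. This is a minimal sketch for a development machine, not something to leave on in production:

```php
<?php
// Report every class of error, warning and notice.
error_reporting(E_ALL);

// Print errors to the output instead of hiding them (development only).
ini_set('display_errors', '1');
```

If the phpQuery include or the HTML download fails, the warning should then show up instead of the loop silently finding nothing.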

Update:

From your comment below, it turns out you're trying to open an HTML file with newDocument(). That just won't work, because newDocument() expects the markup itself, not a file name. You have to use newDocumentFile(), or read the file contents yourself and then pass what you read to newDocument().
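A rough sketch of both options, assuming phpQuery.php is on the include path as in the question. The temporary file and its one-line markup are made up here just so the example is self-contained:

```php
<?php
// Assumes phpQuery.php is on the include path, as in the question.
require_once 'phpQuery.php';

// Hypothetical sample file, created only for this sketch.
$path = tempnam(sys_get_temp_dir(), 'pq');
file_put_contents(
    $path,
    '<html><body><div class="project-preview">Job</div></body></html>'
);

// Option 1: let phpQuery read the file itself.
$fromFile = phpQuery::newDocumentFile($path);

// Option 2: read the file yourself, then pass the markup string.
$html = file_get_contents($path);
if ($html === false) {
    throw new RuntimeException("Could not read $path");
}
$fromString = phpQuery::newDocument($html);

// size() returns the number of matched elements in each document.
echo $fromFile->find('.project-preview')->size(), "\n";
echo $fromString->find('.project-preview')->size(), "\n";

// Remove the sample file again.
unlink($path);
```

Either way, the selector is then run against real markup rather than against a file name treated as HTML.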

2
KoalaBear On

I had the same question, so I'm answering for the next visitors.

Simple HTML DOM has some memory leak problems. You have to be very careful when you 'clone' an object by its selector. Avoid it!

With phpQuery, as far as I know, a single call clears everything:

phpQuery::unloadDocuments();

I tested phpQuery, and it looks like it has no memory leaks. Memory usage was also very low: only 4 kB for a 90 kB file. So it looks like it parses on the fly and does not keep the whole document in memory. At least that is what I found; I could be wrong.

I also tried creating 20-30 documents and unloading after each one: no memory increase. Nice!

Here's my answer:
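For anyone who wants to repeat that check, a loop along these lines should do. It is only a sketch: the sample markup is invented, and memory_get_usage() is a rough measure, so treat the printed number as an indication rather than a benchmark:

```php
<?php
// Assumes phpQuery.php is on the include path, as in the question.
require_once 'phpQuery.php';

// Hypothetical sample markup; use your real page here.
$html = '<html><body>'
      . str_repeat('<div class="project-preview"><h3>Job</h3></div>', 200)
      . '</body></html>';

$before = memory_get_usage();

for ($i = 0; $i < 30; $i++) {
    // Parse the document and run a selector, as a real scraper would.
    $pq = phpQuery::newDocument($html);
    $pq->find('.project-preview')->size();

    // Drop every document phpQuery is still holding on to.
    phpQuery::unloadDocuments();
}

// Release the last wrapper object before measuring again.
unset($pq);

$after = memory_get_usage();
printf("Memory difference after 30 documents: %d bytes\n", $after - $before);
```

If the difference keeps growing as you raise the loop count, something is still holding a reference to the documents.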

include('phpQuery.php');

$pq = phpQuery::newDocument($html);

$projects = array();
$i = 0;

foreach ($pq['.project-preview'] as $work) {
    // iteration returns PLAIN dom nodes, NOT phpQuery objects
    $pqwork = pq($work);

    $projects[$i]['name'] = $pqwork['div']->eq(1)->text();
    // Unfortunately pq($work)['div']->eq(1)->text(); does not work here
    // (indexing a function's return value directly needs PHP 5.4 or later)

    $i++;
}

phpQuery::unloadDocuments();

It would be nice if there were some more examples of the basic things! Good project, bad documentation. Or at least I couldn't find the documentation that explains the text() function, for example.

Benchmark estimates:

  • phpQuery is ~3.5 times faster at loading documents.

  • Simple HTML DOM looks ~30% faster at selecting :(