How to get attributes, values and text from tags into an array using phpQuery


I'm trying to extract attribute values, text and hyperlinks from a large HTML file with phpQuery and collect them into an array. I have tried some code, but I'm confused about how to write the foreach loop so that the data from every class="hl" element ends up in the array.

<?php 
$str ='
<main>
<div class="artfeed ">
<div class="split split_0">
 <div class="split_in">

  <div class="hl" data-id="1036294107">
    <span class="f" country="US"><!-- --></span>
    <div class="hl__inner"><a class="hll" href="http://example.com/001/" target="_blank" rel="nofollow">Some of text here</a>
     <span class="end"></span> 
     <span class="meta">
      <span class="src" data-pub="DATAPUB">
      <span class="src-part">
      exampleOne.com
      <svg class="svg-inline--fa fa-cog fa-w-16" aria-hidden="true" focusable="false" data-prefix="fas" data-icon="cog" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" data-fa-i2svg="">
      </span>
      </span>
      <span class="time" data-time="1592802284">12:04</span>
      </span>
     <a class="hl__menu-toggle c-context-menu__btn js-article-menu__toggle" href="#"></a>
    </div>
  </div>

<div class="hl" data-id="1036294107">
    <span class="f" country="US"><!-- --></span>
    <div class="hl__inner"><a class="hll" href="http://example.com/001/" target="_blank" rel="nofollow">Some of text here</a>
     <span class="end"></span> 
     <span class="meta">
      <span class="src" data-pub="DATAPUB">
      <span class="src-part">
      exampleOne.com
      <svg class="svg-inline--fa fa-cog fa-w-16" aria-hidden="true" focusable="false" data-prefix="fas" data-icon="cog" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" data-fa-i2svg="">
      </span>
      </span>
      <span class="time" data-time="1592802284">12:04</span>
      </span>
     <a class="hl__menu-toggle c-context-menu__btn js-article-menu__toggle" href="#"></a>
    </div>
  </div>

<div class="hl" data-id="1036294107">
    <span class="f" country="US"><!-- --></span>
    <div class="hl__inner"><a class="hll" href="http://example.com/001/" target="_blank" rel="nofollow">Some of text here</a>
     <span class="end"></span> 
     <span class="meta">
      <span class="src" data-pub="DATAPUB">
      <span class="src-part">
      exampleOne.com
      <svg class="svg-inline--fa fa-cog fa-w-16" aria-hidden="true" focusable="false" data-prefix="fas" data-icon="cog" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" data-fa-i2svg="">
      </span>
      </span>
      <span class="time" data-time="1592802284">12:04</span>
      </span>
     <a class="hl__menu-toggle c-context-menu__btn js-article-menu__toggle" href="#"></a>
    </div>
  </div>

 </div>
</div>
</div>
</main>
';
?>

I need a result like this:

/*
Array()
Country  : US
href     : http://example.com/001/
Text     : Some of text here
src-part : exampleOne.com
time     : 12:04

Country  : US
href     : http://example.com/001/
Text     : Some of text here
src-part : exampleOne.com
time     : 12:04

Country  : US
href     : http://example.com/001/
Text     : Some of text here
src-part : exampleOne.com
time     : 12:04
*/

Here is the code I have so far:

<?php
require("phpQuery.php");
$doc = phpQuery::newDocument($str);
$doc =  $doc['body']->find('main')->find('.artfeed')->find('.hl');
$links = array();
foreach($doc['div'] as $item)
{
 $node = pq($item);
  $sibling = $node->next();
  if ( $sibling->is('a:first') ) {
      $links[] = array(
      $node->attr('country'),
      $sibling->attr('href'),
      $sibling->text(),
    ); 
  } 
}

// Display result:
print_r($links);
?>

There is 1 answer

Answer by Spudly

If you print_r($doc) after the following line, are you seeing what you'd expect to see as far as the document structure?

$doc =  $doc['body']->find('main')->find('.artfeed')->find('.hl');

I've used Simple HTML Dom before but not phpQuery so I'm not sure if there is an error in the above line or somewhere else.

Based on examples I saw though, you should be able to use CSS syntax to find elements. Change your doc to something like this:

$doc =  $doc['body']->find('main')->find('.artfeed');

Then just use pq() and find() with CSS syntax to find your elements directly without the loop.

$content = pq($doc);
$links[] = array(
    $content->find('div.hl > span.f')->attr('country'),
    $content->find('div.hl > div.hl__inner > a.hll')->attr('href'),
    $content->find('div.hl > div.hl__inner > a.hll')->text(),
);

EDIT: For multiple hl divs, I think something like this might work:

$doc =  $doc['body']->find('main')->find('.artfeed');
foreach (pq($doc)->find('.hl') as $hl) {
    // Iterating a phpQuery result yields plain DOM nodes, so wrap each one in pq() before calling find()
    $hl = pq($hl);
    $links[] = array(
        $hl->find('span.f')->attr('country'),
        $hl->find('div.hl__inner > a.hll')->attr('href'),
        $hl->find('div.hl__inner > a.hll')->text(),
    );
}
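
As a cross-check that does not depend on phpQuery, here is a minimal sketch of the same per-item loop using PHP's bundled DOMDocument and DOMXPath, collecting all five fields the question asks for. This swaps in a different technique (XPath instead of CSS selectors); it is not the phpQuery API. The helpers hasClass() and field() are hypothetical names made up for this example, the markup is a trimmed copy of the question's structure, and the second item's href and time are invented so the two rows can be told apart.

```php
<?php
// Trimmed sample markup; the second item's values are invented for illustration.
$str = '
<main>
 <div class="artfeed">
  <div class="hl" data-id="1036294107">
   <span class="f" country="US"></span>
   <div class="hl__inner">
    <a class="hll" href="http://example.com/001/">Some of text here</a>
    <span class="meta">
     <span class="src"><span class="src-part"> exampleOne.com </span></span>
     <span class="time" data-time="1592802284">12:04</span>
    </span>
   </div>
  </div>
  <div class="hl" data-id="1036294107">
   <span class="f" country="US"></span>
   <div class="hl__inner">
    <a class="hll" href="http://example.com/002/">Some of text here</a>
    <span class="meta">
     <span class="src"><span class="src-part"> exampleOne.com </span></span>
     <span class="time" data-time="1592802284">12:05</span>
    </span>
   </div>
  </div>
 </div>
</main>';

// Hypothetical helper: XPath predicate that matches one whole class token,
// so "hl" does not also match "hl__inner".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Hypothetical helper: trimmed string value of the first node matched
// by $expr, evaluated relative to $ctx (empty string if nothing matches).
function field(DOMXPath $xp, string $expr, DOMNode $ctx): string
{
    return trim($xp->evaluate("string($expr)", $ctx));
}

libxml_use_internal_errors(true); // keep parser warnings about tags like <main> quiet
$dom = new DOMDocument();
$dom->loadHTML($str);
$xp = new DOMXPath($dom);

$rows = [];
// One row per div.hl; every lookup is scoped to the current item via $hl.
foreach ($xp->query('//div[' . hasClass('hl') . ']') as $hl) {
    $rows[] = [
        'country'  => field($xp, './/span[' . hasClass('f') . ']/@country', $hl),
        'href'     => field($xp, './/a[' . hasClass('hll') . ']/@href', $hl),
        'text'     => field($xp, './/a[' . hasClass('hll') . ']', $hl),
        'src-part' => field($xp, './/span[' . hasClass('src-part') . ']', $hl),
        'time'     => field($xp, './/span[' . hasClass('time') . ']', $hl),
    ];
}

print_r($rows);
```

The point carried over from the phpQuery loop is the scoping: each lookup runs against the current item rather than the whole document, which is what keeps the values of different hl blocks from being mixed together.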