Is it possible to parse JSON with Goutte?


I'm crawling websites, and parsing HTML with Goutte has worked without problems so far. Now I need to retrieve JSON from a site, and because of cookie management I can't do this with file_get_contents(); it doesn't work.

I could do it with plain cURL, but in this case I want to use only Goutte and no other library.

So is there a way to fetch and parse plain text via Goutte, or do I really have to fall back on the good old methods?

/* Sample Code */
use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'foo'); // 'foo' stands in for the JSON URL
$crawler = $crawler->filter('bar'); // of course not working

Thank you.


There are 4 answers

3
mithataydogmus On BEST ANSWER

After digging deep into the Goutte libraries I found a way, and I wanted to share it, because Goutte is a really powerful library but its documentation is hard to follow.

Parsing JSON via (Goutte > Guzzle)

Just request the page you need and store the JSON in an array.

$client = new Client(); // Goutte Client
$request = $client->getClient()->createRequest('GET', 'http://***.json');
/* getClient() returns the underlying Guzzle client */

$response = $request->send(); // Send the created request to the server
$data = $response->json(); // Returns a PHP array

Parsing JSON with Cookies via (Goutte + Guzzle) - For authentication

Send a request to one of the site's pages (the main page is a good choice) to get the cookies, then use these cookies for authentication.

$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");
/* Send the request directly and get the whole response. It includes the cookies
from the server, which are automatically stored in the Goutte Client object */

$request = $client->getClient()->createRequest('GET', 'http://foo.bar/baz.json');
/* getClient() returns the underlying Guzzle client */

$cookies = $client->getRequest()->getCookies();
foreach ($cookies as $key => $value) {
   $request->addCookie($key, $value);
}

/* The loop above copies the cookies from the Goutte Client into the Guzzle request */

$response = $request->send(); // Send created request to server
$data = $response->json(); // Returns PHP Array

I hope it helps, because I spent almost 3 days trying to understand Goutte and its components.

1
gorodezkiy On

I was also able to get the JSON with:

$client->getResponse()->getContent()->getContents()
0
Big Zak On

I figured this out after several hours of searching. Simply do this:

$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");

$jsonData = $crawler->text();
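
Whichever way the raw body is fetched, it arrives as a plain string and still has to be decoded. A minimal sketch of that step using PHP's built-in json_decode(); the helper name decodeJsonBody and the sample string are assumptions for illustration and are not part of Goutte or Guzzle:

```php
<?php
// Hypothetical helper (not part of Goutte): decode a raw JSON body into a PHP array.
function decodeJsonBody(string $body): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException instead of returning null.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        // Valid JSON can also be a bare scalar such as 42 or "text".
        throw new UnexpectedValueException('Expected a JSON object or array.');
    }

    return $data;
}

// Placeholder string standing in for the body fetched through Goutte.
$body = '{"name": "foo", "tags": ["a", "b"]}';
$data = decodeJsonBody($body);
echo $data['tags'][1], PHP_EOL; // prints: b
```

Passing true as the second argument yields associative arrays rather than stdClass objects, which matches the "PHP array" result described in the accepted answer.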
0
Fabian On

mithataydogmus' solution didn't work for me. I created a new class "BetterClient":

use Goutte\Client as GoutteClient;

class BetterClient extends GoutteClient
{
    private $guzzleResponse;

    public function getGuzzleResponse() {
        return $this->guzzleResponse;
    }

    protected function createResponse($response)
    {
        $this->guzzleResponse = $response;
        return parent::createResponse($response);
    }
}

Usage:

$client = new BetterClient();
$crawler = $client->request('GET', $url); // request() returns a Crawler, not a request object
$data = $client->getGuzzleResponse()->json();