HTML page source differs from the rendered output


I am using a cron job (a PHP script) to extract the lottery results on the day of the draw from:

http://www.millipiyango.gov.tr/sonuclar/_cs_sayisal.php

The script reads the file and splits it into lines to find the numbers, as below:

<?php
// file() downloads the raw HTML returned by the server, one array entry per line
$rfile = "http://www.millipiyango.gov.tr/sonuclar/_cs_sayisal.php";
$lines = file($rfile);

foreach ($lines as $line_num => $line) {
    echo "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
}

The surprise is that the page as shown in the browser does not match the source that PHP reads!

I tried selecting everything with the mouse and using "View Selection Source" in Firefox, and that worked.

But I need this done by a cron job.

How can I read the content that is actually displayed? It seems the data is being hidden using jQuery.


There are 2 answers

Rik_S (best answer)

The actual data is loaded from http://www.millipiyango.gov.tr/sonuclar/cekilisler/sayisal/20141115.json, whose file name seems to follow the format [year][month][day].json.
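Since the cron job runs on the draw date, the script could build that day's URL instead of hard-coding it. This is a minimal sketch, assuming the [year][month][day].json pattern (inferred from the single example URL) holds for every draw date. The function name `buildDrawUrl` is hypothetical.

```php
<?php
// Hypothetical helper: builds the JSON URL for a given draw date,
// ASSUMING the sayisal/YYYYMMDD.json naming seen in the example URL.
function buildDrawUrl(DateTimeInterface $date): string
{
    $base = 'http://www.millipiyango.gov.tr/sonuclar/cekilisler/sayisal/';
    // 'Ymd' formats 15 November 2014 as "20141115"
    return $base . $date->format('Ymd') . '.json';
}

// In a cron job, "today" is evaluated in the server's configured timezone:
// echo buildDrawUrl(new DateTimeImmutable('today'));
```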

You can use json_decode in PHP to turn it into an array of values, which you can then use however you like.

To see all the available data, you could do the following:

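<?php
// Download the JSON file that the page's JavaScript loads
$content = file_get_contents("http://www.millipiyango.gov.tr/sonuclar/cekilisler/sayisal/20141115.json");
// Passing true returns associative arrays instead of stdClass objects
$json = json_decode($content, true);
echo "<pre>";
var_dump($json);
echo "</pre>";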
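Unattended, a cron job is better off checking for failures than dumping whatever comes back, for example when no file exists for the requested date (an assumption about how the site behaves on non-draw days) or the response is not valid JSON. This is a minimal sketch. The function name `fetchDraw` is hypothetical, and the layout of the decoded array is not shown because it depends on the site's JSON.

```php
<?php
// Hypothetical helper: downloads a JSON document and decodes it into an
// associative array. Returns null if the download or the decoding fails.
function fetchDraw(string $url): ?array
{
    // @ suppresses the warning file_get_contents emits on failure;
    // the false return value is checked explicitly instead.
    $content = @file_get_contents($url);
    if ($content === false) {
        return null;
    }
    // true => JSON objects become associative arrays
    $data = json_decode($content, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        return null;
    }
    return $data;
}

// Example use in the cron script:
// $draw = fetchDraw('http://www.millipiyango.gov.tr/sonuclar/cekilisler/sayisal/20141115.json');
// if ($draw === null) { exit(1); }
// print_r($draw);
```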
Dan Goodspeed

What's happening here is that the page is built by JavaScript. When you say you're looking at the source, you're actually looking at the DOM tree. If you view the real source (Cmd+U / Ctrl+U), you'll see what I mean. To get at the data, you have two options:

1) Reverse engineer the JavaScript to find out where it gets the data it uses to populate the page.

or

2) Use something like PhantomJS to build the page's DOM for you, and then scrape that instead.
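For option 1, one possible starting point is to take the raw page source (exactly what the question's `file()` call sees), plus any scripts it links to, and search it for references to JSON files. This is a rough sketch, assuming the data URL or part of it appears literally in a quoted string. If the URL is assembled dynamically in JavaScript, the browser's network panel will usually reveal it faster. `findJsonRefs` is a hypothetical name.

```php
<?php
// Hypothetical helper: lists the distinct quoted strings ending in ".json"
// found in a chunk of HTML or JavaScript source.
function findJsonRefs(string $source): array
{
    // Capture the body of a single- or double-quoted string that ends in .json
    preg_match_all('/["\']([^"\'\s]*\.json)["\']/', $source, $matches);
    return array_values(array_unique($matches[1]));
}

// Example: scan the page source the question's script already downloads.
// $source = file_get_contents('http://www.millipiyango.gov.tr/sonuclar/_cs_sayisal.php');
// print_r(findJsonRefs($source));
```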