RSS feed fetched with cURL via PHP gives a different result than cURL on the console (PHP shows a list of URLs, the console shows the expected XML)


I'm trying to get an old page running again. Since the feed URL https://feeds.bbci.co.uk/news/politics/rss.xml changed from http to https, cURL via PHP no longer seems to deliver the XML. After some research I extended the curl_setopt_array call like this:

function curl_get($url) {
    $client = curl_init();
    curl_setopt_array($client, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_SSL_VERIFYHOST => false,
        CURLOPT_HEADER => true,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_HTTPHEADER => ['Accept: application/xml'],
        CURLOPT_ENCODING => "",
        CURLOPT_CUSTOMREQUEST => 'GET',
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    ));

    $response = curl_exec($client);

    curl_close($client);
    return $response;
}

The response from curl_get is a long list of URLs, as shown below. Before echoing the response, I split it into a header part and an XML part right after calling curl_get($url):


list($headers, $xml) = explode("\r\n\r\n", $response, 2);
echo "Response Headers: \n" . $headers . "\n";
echo "\nXML Content: \n" . $xml;

Response Headers:
HTTP/1.1 200 OK Server: Belfrage belfrage-cache-status: HIT bsig: 48f765c294080d8b70a116081be96156 Content-Type: text/xml; charset=utf-8 Content-Encoding: gzip brequestid: fe2daa111e8d4f6ea0584f542e09bcf1 x-content-type-options: nosniff bid: cedric req-svc-chain: BELFRAGE Content-Length: 10737 Cache-Control: public, max-age=73 Date: Wed, 28 Feb 2024 05:35:15 GMT Connection: keep-alive Vary: Accept-Encoding Timing-Allow-Origin: https://www.bbc.co.uk, https://www.bbc.com

XML Content:
https://www.bbc.co.uk/news/ 
https://news.bbcimg.co.uk/nol/shared/img/bbc_news_120x60.gif
https://www.bbc.co.uk/news/ RSS for Node Fri, 23 Feb 2024 08:12:42 GMT15
https://www.bbc.co.uk/news/uk-politics-68377243?at_medium=RSS&at_campaign=KARANGA
https://www.bbc.co.uk/news/uk-politics-68377243 Fri, 23 Feb 2024
07:12:03 GMT[...]
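
A side note on the splitting step: with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both enabled, a redirected request can return more than one header block, so exploding on the first blank line may not isolate the body. Below is a minimal sketch of an alternative, assuming the header size is read with curl_getinfo($client, CURLINFO_HEADER_SIZE) before curl_close. The helper name split_response and the simulated response are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: split a raw cURL response (fetched with
// CURLOPT_HEADER => true) into headers and body by header size.
function split_response(string $response, int $headerSize): array
{
    $headers = rtrim(substr($response, 0, $headerSize));
    $body    = substr($response, $headerSize);
    return [$headers, $body];
}

// Simulated response standing in for curl_exec() output after one redirect.
$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n"
     . "<rss><channel><link>https://example.com/</link></channel></rss>";

// In real code this would be curl_getinfo($client, CURLINFO_HEADER_SIZE);
// here it is computed from the sample so the sketch runs without a network.
$headerSize = strpos($raw, "<rss>");

[$headers, $xml] = split_response($raw, $headerSize);
```

In curl_get itself, the header size would have to be read before curl_close($client) and returned alongside the response.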

As you can see, the XML part of the response is not XML; it is a long list of URLs. If I curl the same URL in the console, though, I get proper XML as expected (I can also see in the page source that it actually is XML).
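
One check that might help here, assuming the PHP output above was viewed in a browser rather than on the command line: a browser treats the echoed response as HTML, hides unknown tags such as <rss> or <link>, and collapses line breaks, which would leave only text content such as URLs and dates visible. The sketch below uses a made-up sample string in place of the real feed, and show_raw is a hypothetical helper name.

```php
<?php
// Hypothetical helper: escape markup so a browser displays the tags literally.
function show_raw(string $xml): string
{
    return '<pre>' . htmlspecialchars($xml, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Made-up sample standing in for the body returned by curl_get().
$sample = "<rss><channel><link>https://www.bbc.co.uk/news/</link></channel></rss>";

// Roughly what remains visible when a browser renders the XML as HTML.
$visible = strip_tags($sample);

// Escaped version that keeps the tags visible in a browser.
$escaped = show_raw($sample);

echo $visible, "\n";
echo $escaped, "\n";
```

If the escaped output shows the tags, the response was XML all along. Sending header('Content-Type: text/xml; charset=utf-8') before echoing only the body, or using the browser's view-source, would be other ways to confirm this.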

After CBroe's input, I extended the config array for the call with the option CURLOPT_HTTPHEADER => ['Accept: application/xml'], but it didn't change anything in the output.

Any ideas?
