I'm trying to get an old page running again. Since the feed URL https://feeds.bbci.co.uk/news/politics/rss.xml changed from http to https, curl via PHP no longer seems to deliver the XML. After some research I extended the curl_setopt_array call like this:
    function curl_get($url) {
        $client = curl_init();
        curl_setopt_array($client, array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,   // return the response instead of printing it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_SSL_VERIFYPEER => false,
            CURLOPT_SSL_VERIFYHOST => false,
            CURLOPT_HEADER => true,           // include the response headers in the returned string
            CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
            CURLOPT_HTTPHEADER => ['Accept: application/xml'],
            CURLOPT_ENCODING => "",           // accept and decode any supported encoding (e.g. gzip)
            CURLOPT_CUSTOMREQUEST => 'GET',
            CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        ));
        $response = curl_exec($client);
        curl_close($client);
        return $response;
    }
The response I get from curl_get is a long list of URLs, as shown below. Before echoing the response, I split it into the header part and the XML part right after calling curl_get($url):
    list($headers, $xml) = explode("\r\n\r\n", $response, 2);
    echo "Response Headers: \n" . $headers . "\n";
    echo "\nXML Content: \n" . $xml;
Response Headers:
HTTP/1.1 200 OK Server: Belfrage belfrage-cache-status: HIT bsig: 48f765c294080d8b70a116081be96156 Content-Type: text/xml; charset=utf-8 Content-Encoding: gzip brequestid: fe2daa111e8d4f6ea0584f542e09bcf1 x-content-type-options: nosniff bid: cedric req-svc-chain: BELFRAGE Content-Length: 10737 Cache-Control: public, max-age=73 Date: Wed, 28 Feb 2024 05:35:15 GMT Connection: keep-alive Vary: Accept-Encoding Timing-Allow-Origin: https://www.bbc.co.uk, https://www.bbc.com
XML Content:
https://www.bbc.co.uk/news/
https://news.bbcimg.co.uk/nol/shared/img/bbc_news_120x60.gif
https://www.bbc.co.uk/news/ RSS for Node Fri, 23 Feb 2024 08:12:42 GMT15
https://www.bbc.co.uk/news/uk-politics-68377243?at_medium=RSS&at_campaign=KARANGA
https://www.bbc.co.uk/news/uk-politics-68377243 Fri, 23 Feb 2024 07:12:03 GMT[...]
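A side note on the header/body split used above: with CURLOPT_FOLLOWLOCATION enabled, a redirect adds a second header block to the response, so splitting on the first blank line can leave headers in the $xml part. A minimal sketch of an alternative, assuming the header size reported by curl_getinfo with CURLINFO_HEADER_SIZE is used instead; split_response and curl_get_parts are hypothetical names that are not part of my original code:

```php
<?php
// Hypothetical helper: split a raw cURL response (fetched with
// CURLOPT_HEADER => true) into headers and body by header size.
function split_response(string $response, int $headerSize): array
{
    $headers = rtrim(substr($response, 0, $headerSize)); // all header blocks, incl. redirects
    $body    = substr($response, $headerSize);           // everything after the headers
    return [$headers, $body];
}

// Hypothetical variant of curl_get that returns [headers, body].
function curl_get_parts(string $url): array
{
    $client = curl_init();
    curl_setopt_array($client, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HEADER         => true,
        CURLOPT_ENCODING       => '',
    ]);
    $response = curl_exec($client);
    if ($response === false) {
        $error = curl_error($client);
        curl_close($client);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    // Must be read before curl_close(); covers every header block received.
    $headerSize = curl_getinfo($client, CURLINFO_HEADER_SIZE);
    curl_close($client);
    return split_response($response, $headerSize);
}
```

Usage would look like list($headers, $xml) = curl_get_parts($url); in place of the explode call.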
As you can see, the XML part of the response is not XML but a long list of URLs. If I curl the same URL in the console, though, I get proper XML as expected (I can also see in the source of the page that it actually is XML).
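One way to check whether the $xml string really lacks its tags, rather than judging by how it looks when echoed, is to try parsing it. A rough diagnostic sketch, assuming the SimpleXML extension is available; xml_root_name is a hypothetical helper name:

```php
<?php
// Hypothetical diagnostic helper: returns the root element name if $body
// parses as XML, or null if it does not.
function xml_root_name(string $body): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous error handling mode
    return $doc === false ? null : $doc->getName();
}
```

If xml_root_name($xml) returns a name such as rss while the echoed output still shows only URLs, the markup is present in the string; in that case, and assuming the output is being rendered as HTML in a browser, echoing htmlspecialchars($xml) should display the raw tags.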
After CBroe's input, I extended the config array for the call with the option CURLOPT_HTTPHEADER => ['Accept: application/xml'], but it didn't change anything in the output.
Any ideas?