I'm trying to get the host from a URL using parse_url, but for some inputs I get empty results. Here is my function:
function clean_url($urls) {
    $good_url = array();
    for ($i = 0; $i < count($urls); $i++) {
        $url = parse_url($urls[$i]);
        //$temp_string=str_replace("http://", "", $urls[$i]);
        //$temp_string=str_replace("https://", "", $urls[$i]);
        //$temp_string=substr($temp_string, 0,stripos($temp_string,"/"));
        array_push($good_url, $url['host']);
    }
    return $good_url;
}
Input array:
Array (
[0] => https://en.wikipedia.org/wiki/Data
[1] => data.gov.ua/
[2] => e-data.gov.ua/
[3] => e-data.gov.ua/transaction
[4] => https://api.jquery.com/data/
[5] => https://api.jquery.com/jquery.data/
[6] => searchdatamanagement.techtarget.com/definition/data
[7] => www.businessdictionary.com/definition/data.html
[8] => https://data.world/
[9] => https://en.oxforddictionaries.com/definition/data
)
Result array, with empty entries:
Array (
[0] => en.wikipedia.org
[1] =>
[2] =>
[3] =>
[4] => api.jquery.com
[5] => api.jquery.com
[6] =>
[7] =>
[8] => data.world
[9] => en.oxforddictionaries.com
)
Some of the $urls being parsed have no scheme, which causes parse_url to treat the host as part of the path. For example, parsing the URL data.gov.ua/ returns data.gov.ua/ as the path and no host at all. Adding a scheme such as https, so that the URL becomes https://data.gov.ua/, allows parse_url to recognise data.gov.ua as the host.
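One way to apply this is to prepend a default scheme before parsing whenever the URL does not already have one. The following is only a minimal sketch: the function name extract_hosts and the $defaultScheme parameter are hypothetical, and it assumes that any entry without a "scheme://" prefix is a bare host followed by an optional path.

```php
<?php
// Hypothetical helper (not part of the original code): returns the host of
// each URL, prepending a default scheme when none is present so that
// parse_url() can identify the host.
function extract_hosts(array $urls, string $defaultScheme = 'https'): array
{
    $hosts = [];
    foreach ($urls as $url) {
        $url = trim($url);

        // Assumption: a URL that does not start with "scheme://" is a bare
        // host/path such as "data.gov.ua/". Leading slashes are dropped so
        // protocol-relative URLs ("//example.com/") are handled as well.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
            $url = $defaultScheme . '://' . ltrim($url, '/');
        }

        // With PHP_URL_HOST, parse_url() returns the host as a string,
        // null if there is no host, or false for a seriously malformed URL.
        $host = parse_url($url, PHP_URL_HOST);
        $hosts[] = is_string($host) ? $host : null;
    }
    return $hosts;
}
```

Called with the input array from the question, this sketch should return a host for every entry, for example data.gov.ua for data.gov.ua/. Entries that still cannot be parsed come back as null rather than triggering an undefined index notice.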