Asked  8 Months ago    Answers:  5   Viewed   32 times

I'm using file_get_contents() to access a URL.

file_get_contents('http://somenotrealurl.com/notrealpage');

If the URL is not real, it returns the error message below. How can I make it fail gracefully, so that I know the page doesn't exist and can act accordingly without displaying this error message?

file_get_contents('http://somenotrealurl.com/notrealpage') 
[function.file-get-contents]: 
failed to open stream: HTTP request failed! HTTP/1.0 404 Not Found 
in myphppage.php on line 3

For example, in Zend you can say: if ($request->isSuccessful())

$client = new Zend_Http_Client();
$client->setUri('http://someurl.com/somepage');

$request = $client->request();

if ($request->isSuccessful()) {
 //do stuff with the result
}

 Answers

71

You need to check the HTTP response code:

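function get_http_response_code($url) {
    // get_headers() returns false when the request fails outright (e.g. a DNS error).
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    // The status line looks like "HTTP/1.1 404 Not Found"; the code starts at offset 9.
    return substr($headers[0], 9, 3);
}

if (get_http_response_code('http://somenotrealurl.com/notrealpage') != "200") {
    echo "error";
} else {
    $content = file_get_contents('http://somenotrealurl.com/notrealpage');
}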
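
A possible alternative makes one request instead of two. The sketch below assumes the standard HTTP stream wrapper: the ignore_errors context option asks it to return the body even for 4xx/5xx responses, and the wrapper fills in $http_response_header in the calling scope. The helper names parse_status_code and fetch_or_null and the 10-second timeout are illustrative choices, not part of the original answer.

```php
<?php
// Return the status code of the last "HTTP/x.y NNN ..." line, or null if none.
// Redirects produce several status lines; the last one is the final response.
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#i', $header, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: the body on a 2xx response, null on any failure.
function fetch_or_null(string $url): ?string
{
    // ignore_errors: do not raise "failed to open stream" on 4xx/5xx.
    $context = stream_context_create([
        'http' => ['ignore_errors' => true, 'timeout' => 10],
    ]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null; // DNS failure, refused connection, timeout, ...
    }
    // Set in this scope by the HTTP wrapper after the request.
    $status = parse_status_code($http_response_header ?? []);
    return ($status !== null && $status >= 200 && $status < 300) ? $body : null;
}
```

Calling fetch_or_null('http://somenotrealurl.com/notrealpage') and comparing the result with null then replaces the separate get_headers() request.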
Wednesday, March 31, 2021
 
gMale
answered 8 Months ago
56

Sometimes a website blocks crawlers (requests coming from remote servers) from reaching its pages.

To work around this, you can spoof a browser's headers: pretend to be Mozilla Firefox instead of a PHP web scraper.

This function uses the cURL library to do just that:

function get_data($url) {
    // Present a browser-like user agent, since some sites reject unknown clients.
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);    // set the Referer header on redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);

    // Compare with false explicitly: an empty body is still a valid response.
    if ($html === false) {
        echo "<br />cURL error number: " . curl_errno($ch);
        echo "<br />cURL error: " . curl_error($ch);
        exit;
    }
    return $html;
}

One would then call it as below:

$response = get_data($requesturl);

cURL offers many more options for fetching remote content and checking errors than file_get_contents() does. If you want to customize it further, check out the abridged list of cURL options.
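
To act on a missing page rather than print an error and exit, one option is to report the outcome through the return value. The following is a minimal sketch, assuming the cURL extension is loaded; the function name fetch_with_curl, the shape of the returned array, and the timeout values are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical wrapper: reports failure through the return value, never exits.
function fetch_with_curl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 10,   // seconds allowed for the whole transfer
    ]);

    $body   = curl_exec($ch);                               // false on transport errors
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response arrived
    $error  = curl_error($ch);                              // '' when nothing went wrong

    // Success means a transfer completed and the final status is 2xx.
    $ok = $body !== false && $status >= 200 && $status < 300;

    return [
        'ok'     => $ok,
        'status' => $status,
        'body'   => $ok ? $body : null,
        'error'  => $error,
    ];
}
```

A caller can then branch on $result['ok'] and inspect $result['status'] (for example 404) or $result['error'] to decide what to do, without any output reaching the page.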

Saturday, May 29, 2021
 
akohout
answered 5 Months ago
13

Some variation of the below is what I would use. YMMV depending on what you're doing. If you post your code, we can address your specific implementation instead of just providing alternative solutions :-)

$dir = new DirectoryIterator('/path/to/states');
foreach($dir as $file)
{
  if(!$file->isDot() && $file->isFile() && strpos($file->getFilename(), '.txt') !== false)
  {
     $content = file_get_contents($file->getPathname());
     if($content)
     {
        // do your insert code
     }
  }
}
Saturday, May 29, 2021
 
binoculars
answered 5 Months ago
85

That webserver appears to return a 403 Forbidden error when your HTTP request does not include a user-agent string. RCurl by default does not pass a user-agent. You can set one with the useragent= parameter.

library(RCurl)  # url.exists(), getURL()
library(XML)    # htmlTreeParse()
myurl <- "http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"
url.exists(myurl, useragent="curl/7.39.0 Rcurl/1.95.4.5")
# [1] TRUE
htmlTreeParse(getURL(myurl, useragent="curl/7.39.0 Rcurl/1.95.4.5"))

In my opinion, the httr package is a bit nicer than RCurl for making HTTP requests (and it sets a user-agent string by default). Here's the corresponding code:

library(httr)
GET(myurl)
Thursday, August 5, 2021
 
aorfevre
answered 3 Months ago
29

Solved this by using cURL. Here's the code. It works with remote files, e.g. http://yourdomain.com/file.ext

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $file_path_str);
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, sprintf("Mozilla/%d.0", rand(4, 5)));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$curl_response_res = curl_exec($ch);
curl_close($ch);

I could not use @James's solution because I'm using ob_start and ob_flush elsewhere in my code, so that would have messed things up for me.

Friday, August 13, 2021
 
konstantin
answered 3 Months ago