Asked  8 Months ago    Answers:  5   Viewed   32 times

I'm using file_get_contents() to access a URL.


If the URL does not exist, it returns the error message below. How can I make it fail gracefully, so that I know the page doesn't exist and can act accordingly, without displaying this error message?

failed to open stream: HTTP request failed! HTTP/1.0 404 Not Found 
in myphppage.php on line 3

For example, in Zend you can write: if ($request->isSuccessful())

$client = new Zend_Http_Client();

$request = $client->request();

if ($request->isSuccessful()) {
    // do stuff with the result
}



You need to check the HTTP response code:

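function get_http_response_code($url) {
    // The first header entry is the status line, e.g. "HTTP/1.0 404 Not Found"
    $headers = get_headers($url);
    // Characters 9-11 of the status line hold the three-digit code
    return substr($headers[0], 9, 3);
}

if (get_http_response_code('') != "200") {
    echo "error";
}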
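The same check can be folded into the file_get_contents() call itself, which avoids a second request. The following is only a minimal sketch, not the answerer's code: the helper names status_from_headers() and fetch_or_null() are hypothetical, and it assumes the HTTP stream wrapper's ignore_errors option together with the $http_response_header variable that the wrapper fills in after a request.

```php
<?php
// Hypothetical helper: pull the numeric status out of a list of raw header lines.
function status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects there are several status lines; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: return the body on a 2xx response and null otherwise.
function fetch_or_null(string $url): ?string
{
    // ignore_errors makes the wrapper return the body even for 4xx/5xx
    // responses instead of raising the "failed to open stream" warning.
    $context = stream_context_create([
        'http' => ['ignore_errors' => true, 'timeout' => 10],
    ]);
    // @ hides the warning raised when the host cannot be reached at all.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null;
    }
    // The HTTP wrapper populates $http_response_header in the local scope.
    $status = status_from_headers($http_response_header ?? []);
    return ($status !== null && $status >= 200 && $status < 300) ? $body : null;
}
```

With this sketch, a caller would test the result of fetch_or_null($url) against null and handle the missing page there, with nothing printed to the output.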
Wednesday, March 31, 2021
answered 8 Months ago

Sometimes a website will block crawlers (requests from remote servers) from reaching its pages.

To work around this, scrapers spoof a browser's headers, for example by pretending to be Mozilla Firefox instead of the PHP web scraper they really are.

This function uses the cURL library to do just that.

function get_data($url) {
    // User-agent string that makes the request look like it comes from Firefox
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20080311 Firefox/';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    // Treat HTTP status codes of 400 and above as failures
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($ch);
    if ($html === false) {
        echo "<br />cURL error number:" . curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
    }

    return $html;
}
// End of cURL function


One would then call it as below:

$response = get_data($requesturl);

cURL offers many more options for fetching remote content and checking errors than file_get_contents() does. If you want to customize it further, check out the list of cURL options here - Abridged list of cURL options
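As one illustration of that extra error checking, the status code can be read back with curl_getinfo() instead of being echoed from inside the fetch function. This is a rough sketch under assumptions, not part of the answer above: the helper names is_success_status() and curl_fetch() and the shape of the returned array are hypothetical.

```php
<?php
// Hypothetical helper: decide whether an HTTP status code counts as success.
function is_success_status(int $status): bool
{
    return $status >= 200 && $status < 300;
}

// Hypothetical helper: fetch a URL and report status, body and error together.
function curl_fetch(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($ch);
    // Status is 0 when no HTTP response was received (DNS failure, timeout, ...).
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error = curl_error($ch);
    // The handle is released when $ch goes out of scope at the end of the function.

    return [
        'ok' => $body !== false && is_success_status($status),
        'status' => $status,
        'body' => $body === false ? null : $body,
        'error' => $error,
    ];
}
```

A caller could then branch on $result['ok'] and inspect $result['status'] (for example 404) to decide what to do, with no output produced by the fetch itself.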

Saturday, May 29, 2021
answered 5 Months ago

Some variation of the below is what I would use. YMMV depending on what you're doing. If you post your code, we can address your specific implementation instead of just providing alternate solutions :-)

$dir = new DirectoryIterator('/path/to/states');
foreach ($dir as $file) {
    // Skip "." and "..", directories, and names that do not contain ".txt"
    if (!$file->isDot() && $file->isFile() && strpos($file->getFilename(), '.txt') !== false) {
        $content = file_get_contents($file->getPathname());
        // do your insert code
    }
}
Saturday, May 29, 2021
answered 5 Months ago

That webserver appears to return a 403 Forbidden error when your HTTP request does not include a user-agent string. RCurl by default does not pass a user-agent. You can set one with the useragent= parameter.

url.exists(myurl, useragent="curl/7.39.0 Rcurl/")
# [1] TRUE
htmlTreeParse(getURL(myurl, useragent="curl/7.39.0 Rcurl/"))

The httr package is a bit nicer than RCurl for making HTTP requests in my opinion (and it sets a user-agent string by default). Here's the corresponding code

Thursday, August 5, 2021
answered 3 Months ago

I solved this by using cURL. Here's the code. It will work with remote files e.g.

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, ''.$file_path_str.'');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, sprintf("Mozilla/%d.0", rand(4, 5)));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$curl_response_res = curl_exec($ch);
curl_close($ch);

I could not use @James's solution because I'm using ob_start and ob_flush elsewhere in my code, so that would have messed things up for me.

Friday, August 13, 2021
answered 3 Months ago