Asked  7 Months ago    Answers:  5   Viewed   32 times

I have no experience with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents(); the task is to clean and munge them using preg_replace().

My code runs fine on small files; however, the large files (40 MB) trigger a memory exhausted error:

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

I was thinking of using fread() instead but I am not sure that'll work either. Is there a workaround for this problem?

Thanks for your input.

This is my code:

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("\r", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '$1$2.$3.$4      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d)      (Test_Version=)/';
$replacement = '$1$2.$3.$4      Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>

 Answers

15

First, understand that file_get_contents() fetches the entire file into a string variable, and that variable is stored in the host's memory.

If that string is larger than the memory allotted to the PHP process, PHP halts and displays the error message above.
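As a rough illustration of that limit, the sketch below converts a php.ini shorthand value such as "16M" to bytes and checks whether a file of a given size would plausibly fit. The helper names shorthand_to_bytes() and fits_in_memory() are hypothetical, and the 50% headroom factor is an assumption: functions like str_replace() and preg_replace() build additional copies of the string, so the file size alone understates the memory needed.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand value ("16M", "1G", "512K")
// to a number of bytes. "-1" (no limit) is returned unchanged as -1.
function shorthand_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value;              // "16M" casts to 16
    switch (strtolower(substr($value, -1))) {
        case 'g': return $number * 1024 * 1024 * 1024;
        case 'm': return $number * 1024 * 1024;
        case 'k': return $number * 1024;
        default:  return $number;        // plain byte count, or -1
    }
}

// Hypothetical helper: decide whether $size bytes fit under $limit bytes,
// keeping a fraction of the limit free for copies made during processing.
function fits_in_memory(int $size, int $limit, float $headroom = 0.5): bool
{
    if ($limit < 0) {
        return true;                     // -1 means no memory limit
    }
    return $size <= $limit * $headroom;
}
```

With the numbers from the error message, fits_in_memory(41390283, shorthand_to_bytes("16M")) is false. In a real script the two arguments would come from filesize() and ini_get('memory_limit').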

The way around this is to open the file as a pointer and read it a chunk at a time. That way, with a 500 MB file you can read the first 1 MB of data, process it, release that 1 MB from memory, and replace it with the next 1 MB. This lets you control how much data is held in memory at once.

An example of this can be seen below. I will create a function that takes a callback, in the style of Node.js:

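function file_get_contents_chunked($file,$chunk_size,$callback)
{
    // fopen() signals failure by returning false rather than throwing,
    // so test the handle instead of relying on try/catch.
    $handle = @fopen($file, "r");
    if ($handle === false)
    {
        trigger_error("file_get_contents_chunked::could not open " . $file,E_USER_NOTICE);
        return false;
    }

    $i = 0;
    while (!feof($handle))
    {
        $chunk = fread($handle,$chunk_size);
        if ($chunk === false)
        {
            // Read error: release the handle and report failure.
            fclose($handle);
            return false;
        }

        // Pass the chunk, the handle and the chunk index to the callback.
        call_user_func_array($callback,array($chunk,&$handle,$i));
        $i++;
    }

    fclose($handle);

    return true;
}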

and then use like so:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
        * Do what you will with the {$chunk} here.
        * {$handle} is passed in case you want to seek
        ** to different parts of the file.
        * {$iteration} is the index of the chunk that has been read, so
        * ($iteration * 4096) is the offset at which this chunk starts.
    */

});

if(!$success)
{
    //It Failed
}

One problem you will find is that you are running regular expressions several times over an extremely large block of data. Not only that, but your regexes are built to match against the entire file.

With the above method your regexes could become useless, because a chunk boundary can split a record and leave you matching only half of it. What you should do is revert to the native string functions such as

  • strpos
  • substr
  • trim
  • explode

for matching the strings. I have added support in the callback so that the handle and the current iteration are passed. This lets you work with the file directly within your callback, using functions such as fseek, ftruncate and fwrite.
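One way to cope with records being cut at chunk boundaries is to carry the incomplete tail of each chunk over to the next read. The following is a minimal sketch of that idea, assuming the data is line-oriented; the function name each_line_chunked() and its parameters are hypothetical and not part of the answer's code. It uses only native string functions (strrpos, substr, explode, rtrim) and also strips the DOS carriage returns mentioned in the question.

```php
<?php
// Hypothetical helper: read an open stream in fixed-size chunks, but hand
// only complete lines to $onLine. A partial line at the end of a chunk is
// kept in $carry and completed by the next chunk.
function each_line_chunked($handle, int $chunkSize, callable $onLine): void
{
    $carry = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $carry .= $chunk;

        // Everything before the last newline consists of complete lines.
        $lastNewline = strrpos($carry, "\n");
        if ($lastNewline === false) {
            continue; // no complete line yet, keep reading
        }
        $complete = (string) substr($carry, 0, $lastNewline);
        $carry    = (string) substr($carry, $lastNewline + 1);

        foreach (explode("\n", $complete) as $line) {
            $onLine(rtrim($line, "\r")); // drop the DOS carriage return
        }
    }

    // Flush a final line that has no trailing newline.
    if ($carry !== '') {
        $onLine(rtrim($carry, "\r"));
    }
}
```

The callback could, for example, apply str_replace("Model-Manufacturer:", "Model-Manufacturer=", $line) and fwrite() the result to an output handle, so that only one chunk plus one partial line is held in memory at a time. Cleanups that join several lines together would still need extra state between calls.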

The way you are building your string manipulation is not efficient at all; the chunked method proposed above is a much better approach.

Hope this helps.

Wednesday, March 31, 2021
 
pinaki
answered 7 Months ago
86

You should point to your vendor/autoload.php at Settings | PHP | PHPUnit when using PHPUnit via Composer.

This blog post has all the details (with pictures) needed to configure the IDE for such a scenario: http://confluence.jetbrains.com/display/PhpStorm/PHPUnit+Installation+via+Composer+in+PhpStorm

Related usability ticket: http://youtrack.jetbrains.com/issue/WI-18388

P.S. The WI-18388 ticket is already fixed in v8.0

Wednesday, March 31, 2021
 
ojrac
answered 7 Months ago
80

I found that attempting to var_dump() or print_r() JCategoryNode results in an endless loop. Therefore, I modified my model above to the following:

<?php
// No direct access to this file
defined('_JEXEC') or die;

// import Joomla Categories library
jimport( 'joomla.application.categories' );

class CtItemModelCtItem extends JModel
{

    private $_items = null;

    private $_parent = null;

    public function getItems($recursive = false)
    {
        $categories = JCategories::getInstance('Content');
        $this->_parent = $categories->get(15);
        if(is_object($this->_parent))
        {
            $this->_items = $this->_parent->getChildren($recursive);
        }
        else
        {
            $this->_items = false;
        }

        return $this->loadCats($this->_items);
    }


    protected function loadCats($cats = array())
    {

        if(is_array($cats))
        {
            $i = 0;
            $return = array();
            foreach($cats as $JCatNode)
            {
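                // Create the object first; assigning a property on an undefined element fails on PHP 8.
                $return[$i] = new stdClass();
                $return[$i]->title = $JCatNode->title;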
                if($JCatNode->hasChildren())
                    $return[$i]->children = $this->loadCats($JCatNode->getChildren());
                else
                    $return[$i]->children = false;

                $i++;
            }

            return $return;
        }

        return false;

    }

}
Wednesday, March 31, 2021
 
mgraph
answered 7 Months ago
59

Your PHP configuration is limiting PHP to only 16 megabytes of memory. You need to modify the memory_limit configuration directive in php.ini to increase it.

Look for the line in php.ini that looks like this:

memory_limit = 16M

...and change it to a larger value (16M = 16 megabytes; you could increase it to something like 64M for 64 megabytes, et cetera). If you can't find any line like that, add it.

If you prefer to only increase it on a per-script basis, you can also use ini_set() to change the value for that script only.
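A minimal sketch of the per-script approach follows. The value "64M" is only an illustrative figure taken from the example above, not a recommendation; for a 40 MB file that is copied several times during processing, it may still be too small.

```php
<?php
// Raise the limit for this script only; php.ini is left untouched.
// "64M" is an illustrative value, not a recommendation.
$previous = ini_set('memory_limit', '64M');

if ($previous === false) {
    // ini_set() returns false when the value could not be changed,
    // for example when the host does not allow overriding this setting.
    trigger_error('memory_limit could not be changed', E_USER_WARNING);
}

// Show the limit that is now in effect.
echo ini_get('memory_limit'), "\n";
```

The call has to run before the large file is read, and it affects only the current request.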

Saturday, May 29, 2021
 
inieto
answered 5 Months ago
79

On Mac OS X, the environment variables available in Terminal and those available to normal applications can differ. Check the related question for how to make them the same.

Note that this solution will not work on Mountain Lion (10.8).

Saturday, May 29, 2021
 
Nate
answered 5 Months ago