Asked 7 months ago | Answers: 5 | Viewed 35 times

I have a fairly large CSV file (at least for the web) that I don't control. It has about 100k rows and will only grow larger.

I'm using the Drupal Feeds module to create nodes from this data. Its parser processes the file in batches of 50 lines, but it doesn't handle quotation marks properly and fails to parse about 60% of the CSV file. fgetcsv() parses the file correctly but, as far as I can tell, doesn't batch anything.

When I try to read the entire file with fgetcsv(), PHP eventually runs out of memory, so I would like to break the work into smaller chunks. Is this possible?

 Answers

60

fgetcsv() reads one line at a time from a given file pointer. If PHP is running out of memory, you are probably parsing the whole file at once and collecting every row in one giant array. The solution is to process each row as it is read, without storing all of them.
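As a minimal sketch of line-by-line processing, the generator below yields one parsed row at a time, so only the current row is held in memory. The function name readCsvRows() and the choice to skip blank lines are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: yields one parsed CSV row at a time instead of
// building an array of every row in the file.
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Separator, enclosure and escape are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}
```

A caller would loop with foreach (readCsvRows($path) as $row) and handle each row (for example, create one node) inside the loop body.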

To answer the batching question more directly, read n lines from the file, then use ftell() to find the location in the file where you ended. Make a note of this point, and then you can return to it at some point in the future by calling fseek() before fgetcsv().
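The ftell()/fseek() approach described above could look roughly like this. It is a sketch only: the function name readCsvBatch(), the shape of the returned array, and the idea of storing the offset between calls are assumptions for illustration.

```php
<?php
// Hypothetical helper: reads up to $batchSize rows starting at byte $offset
// and reports the offset at which the next batch should start.
function readCsvBatch(string $path, int $offset, int $batchSize): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $offset); // resume where the previous batch stopped

    $rows = [];
    while (count($rows) < $batchSize
        && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) { // skip blank lines
            $rows[] = $row;
        }
    }

    $next = ftell($handle); // position to pass in on the next call
    fclose($handle);

    return [
        'rows'   => $rows,
        'offset' => $next,
        // The loop ends early only when fgetcsv() hit the end of the file.
        'done'   => count($rows) < $batchSize,
    ];
}
```

The caller starts with offset 0, processes the returned rows, saves the returned offset somewhere persistent (a variable, the session, or the batch state), and calls the function again until done is true.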

Wednesday, March 31, 2021
 
zIs
answered 7 Months ago
12

As you can read in the documentation for fgetcsv():

A blank line in a CSV file will be returned as an array comprising a single null field, and will not be treated as an error.

Checking for that before adding it to your data array should be sufficient:

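$csv = [];
// $in is a file handle that was already opened with fopen()
while (($result = fgetcsv($in)) !== false) {
    if ($result !== [null]) { // skip blank lines, which fgetcsv() returns as [null]
        $csv[] = $result;
    }
}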
Wednesday, March 31, 2021
 
phirschybar
answered 7 Months ago
28

OK, I got it working. I've updated the code above and it now works for me. I just disabled and re-enabled the module, and the same code started working. It's strange, and I don't know what the problem was.

Wednesday, March 31, 2021
 
Andres
answered 7 Months ago
63

Nope, those are just documentation files that describe the hooks modules provide. As for hook_entity_view: you can implement it in a custom module as YOURMODULENAME_entity_view(...).

Wednesday, March 31, 2021
 
subroutines
answered 7 Months ago
61

OK, solved.

It was what everyone suspected: the encoding of the file was messed up. I could not tell which encoding it was, but LibreOffice suggested Unicode whenever I tried to open the CSVs.

I had to open them with nano to realize there was indeed an encoding problem. Gedit, vim and every other tool on my computer raised no errors. In nano, an @ symbol appeared between every pair of characters, and line feeds were not read correctly.

It seems some encodings are not well supported by fgetcsv(). To solve the problem, I recreated the files in nano, copy-pasting the content from another tool that did not display the @.
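An alternative to recreating the files by hand is to convert the encoding while reading. The sketch below assumes the files are UTF-16LE without a byte order mark, which would be consistent with the symptoms described but was never confirmed in this answer. It also assumes PHP's iconv extension is available. The function name openUtf16LeCsv() is hypothetical.

```php
<?php
// Hypothetical helper: opens a CSV file that is ASSUMED to be UTF-16LE and
// converts it to UTF-8 on the fly, so fgetcsv() sees ordinary UTF-8 text.
function openUtf16LeCsv(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // convert.iconv.<from>/<to> is a built-in stream filter (iconv extension).
    $filter = stream_filter_append(
        $handle,
        'convert.iconv.UTF-16LE/UTF-8',
        STREAM_FILTER_READ
    );
    if ($filter === false) {
        fclose($handle);
        throw new RuntimeException('iconv stream filter is not available');
    }
    return $handle;
}
```

The returned handle can be passed to fgetcsv() as usual. If the real encoding differs, the first argument of the filter name has to change accordingly.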

Saturday, May 29, 2021
 
jcubic
answered 5 Months ago