I am using Laravel and https://csv.thephpleague.com/ to parse CSV files.
My function is something like this:
$path = $request->file('import_file')->getRealPath();
$csv = Reader::createFromPath($path, 'r');
$csv->setHeaderOffset(0);
$csv_header = $csv->getHeader();
$sample_data = $csv->fetchOne();
$sample_data = array_values($sample_data);
$records = iterator_to_array($csv->getRecords()); // getRecords() returns an Iterator, so convert it before json_encode
$csv_file_id = Csv_data::create([
    'csv_filename' => $request->file('import_file')->getClientOriginalName(),
    'csv_header' => json_encode($csv_header),
    'csv_data' => json_encode($records)
]);
How can I parse large data sets without hitting the execution time limit?
Well, I am pretty new to these things, so please don't just comment "use this or that". Up to now I have only been losing time trying this and that package, so an answer with code snippets would be better.
I also tried this:
$stmt = (new Statement())
->offset($offset)
->limit($limit)
;
But with no success. The first reason: even when limiting with an offset and running it in a loop while increasing the offset, it shows the same execution time error. The second reason: it's a little difficult for me to end the loop with good logic.
Looking for some help. I will be available to reply right away.
Are you using a console command for this?
https://laravel.com/docs/5.6/artisan
If you run into memory limits when doing this through the console, you can first try to increase the PHP memory limit.
If that is still not enough, the last option is to cut the .csv up into parts.
But this should not be necessary unless you are dealing with a very, very large .csv (unlikely).
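For reference, a console command for this could look roughly like the sketch below. The command name csv:import, its argument, and the processing inside handle() are placeholders, not a drop-in solution.

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use League\Csv\Reader;

class ImportCsv extends Command
{
    // hypothetical command name and argument
    protected $signature = 'csv:import {path : absolute path to the CSV file}';

    protected $description = 'Import a large CSV without the HTTP time limit';

    public function handle()
    {
        $csv = Reader::createFromPath($this->argument('path'), 'r');
        $csv->setHeaderOffset(0);

        // getRecords() returns an Iterator, so rows are read lazily one at a time
        foreach ($csv->getRecords() as $record) {
            // process / insert one row here
        }

        $this->info('Import finished.');
    }
}

You would then run it with php artisan csv:import /path/to/file.csv (registering it in app/Console/Kernel.php if your app does not auto-load the Commands directory).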
By default, PHP is usually set to execute an HTTP request for 30 seconds before timing out -- to prevent a runaway script or infinite loop from processing forever. It sounds like this is what you're running into.
The quick and dirty method is to add ini_set('max_execution_time', 300); at the top of your script. This will tell PHP to run for 300 seconds (5 minutes) before timing out.
You can adjust that time as needed, but if it regularly takes longer than that, you may want to look at other options -- such as creating a console command (https://laravel.com/docs/5.6/artisan) or running it on a schedule.
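To illustrate both suggestions together, here is a rough sketch that raises the time limit and processes the file in chunks with league/csv's Statement. It also shows one way to end the loop cleanly: a chunk shorter than the requested limit means you have reached the end of the file. The chunk size and file path are arbitrary placeholders.

<?php

require 'vendor/autoload.php'; // not needed inside a Laravel app

use League\Csv\Reader;
use League\Csv\Statement;

ini_set('max_execution_time', 300); // 5 minutes, adjust as needed

$csv = Reader::createFromPath('/path/to/import.csv', 'r');
$csv->setHeaderOffset(0);

$chunkSize = 1000;
$offset = 0;

do {
    // fetch the next chunk of records
    $records = (new Statement())
        ->offset($offset)
        ->limit($chunkSize)
        ->process($csv);

    foreach ($records as $record) {
        // insert / process one record here
    }

    $offset += $chunkSize;
} while (count($records) === $chunkSize); // a short or empty chunk means we reached the end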
Related
I have this file of 10 million words, one word on every line. I'm trying to open that file, read every line, put it in an array and count the number of occurrences of each word.
wartek
mei_atnz
sommerray
swaggyfeed
yo_bada
ronnieradke
… and so on (10M+ lines)
I can open the file, read its size, even parse it line by line and echo the line in the browser (it's very long, of course), but when I try to perform any other operation, the script just refuses to execute. No error, no warning, no die(…), nothing.
Accessing the file is always OK, but it's the other operations that don't succeed. I tried this and it worked…
while(!feof($pointer)) {
$row = fgets($pointer);
print_r($row);
}
… but this didn't:
while(!feof($pointer)) {
$row = fgets($pointer);
array_push($dest, $row);
}
I also tried SplFileObject and file($source, FILE_IGNORE_NEW_LINES), with the same result every time (not okay with the big file, okay with a small file).
Guessing that the issue is not the size (150 KB), but probably the length (10M+ lines), I chunked the file down to ~20k lines without any improvement, then reduced it again to ~8k lines, and it worked.
I also removed the time limit with set_time_limit(0); and removed (almost) any memory limit, both in php.ini and in my script with ini_set('memory_limit', '8192M');. Regarding the errors I could have, I set error_reporting(E_ALL); at the top of my script.
So the questions are:
is there a maximum number of lines that can be read by PHP built-in functions?
why can I echo or print_r but not perform any other operations?
I think you might be running into a long execution time:
How to increase the execution timeout in php?
Different operations take different amounts of time. Printing might be a lot easier than pushing 10M new items into an array one by one. It's strange that you don't get any error messages; you should see a "maximum execution time exceeded" error somewhere.
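As an illustration of that point, the counting itself does not need a 10M-element array of lines at all: reading with fgets() and incrementing an associative array keyed by the word keeps only one line plus the counters in memory. The file name below is a placeholder, and raising the time limit is an assumption.

<?php

set_time_limit(300); // assumption: raise the limit a bit, adjust as needed

$counts = [];
$handle = fopen('words.txt', 'r'); // placeholder file name

if ($handle !== false) {
    while (($line = fgets($handle)) !== false) {
        $word = trim($line);
        if ($word === '') {
            continue;
        }
        // increment the counter for this word instead of storing every line
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    fclose($handle);
}

arsort($counts);                      // most frequent words first
print_r(array_slice($counts, 0, 10)); // show the top 10 as a sanity check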
Using fgetcsv, can I somehow do a destructive read where rows I've read and processed would be discarded so if I don't make it through the whole file in the first pass, I can come back and pick up where I left off before the script timed out?
Additional Details:
I'm getting a daily product feed from a vendor that comes across as a 200 MB .gz file. When I unpack the file, it turns into a 1.5 GB .csv with nearly 500,000 rows and 20 - 25 fields. I need to read this information into a MySQL db, ideally with PHP so I can schedule a cron job to run the script at my web hosting provider every day.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and a max memory utilization limit of 128 MB for any single script. These limits cannot be changed by me.
My idea was to grab the information from the .csv using the fgetcsv function, but since I'm expecting to have to take multiple passes at the file because of the 3-minute timeout, I was thinking it would be nice to whittle away at the file as I process it, so I wouldn't need to spend cycles skipping over rows that were already processed in a previous pass.
From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
Assuming you save how many rows you have already processed, you can skip rows like this:
$alreadyProcessed = 42; // for example
$i = 0;
while ($row = fgetcsv($fileHandle)) {
if ($i++ < $alreadyProcessed) {
continue;
}
...
}
However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.
The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:
$lastPosition = file_get_contents('last_position.txt');
$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);
while ($row = fgetcsv($fh)) {
...
file_put_contents('last_position.txt', ftell($fh));
}
This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
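One refinement you might consider (my assumption, not something required): checkpoint every N rows instead of after every single one, so you do fewer disk writes while still losing at most one batch on a timeout. The file names and batch size below are arbitrary.

<?php

$positionFile = 'last_position.txt'; // arbitrary checkpoint file
$lastPosition = is_file($positionFile) ? (int) file_get_contents($positionFile) : 0;

$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);

$processed = 0;
while (($row = fgetcsv($fh)) !== false) {
    // ... insert $row into the database here ...

    if (++$processed % 500 === 0) {
        // checkpoint every 500 rows
        file_put_contents($positionFile, ftell($fh));
    }
}

// final checkpoint once the whole file has been read
file_put_contents($positionFile, ftell($fh));
fclose($fh);

Note that on resume the rows after the last checkpoint may be processed a second time, so the inserts should be idempotent (e.g. REPLACE or INSERT ... ON DUPLICATE KEY UPDATE).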
You can avoid the timeout and memory errors to some extent by reading the file like a stream: read it line by line and then insert each line into a database (or process it accordingly). That way only a single line is held in memory on each iteration. Please note: don't try to load a huge CSV file into an array, that really would consume a lot of memory.
if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // get the first row (header)
    $header = fgetcsv($handle);
    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false)
    {
        // process your data
        unset($data);
    }
    fclose($handle);
}
I think a better solution (it would be phenomenally inefficient to continuously rewind and write to an open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read - then if you have to resume, just fseek to the last position.
You could try loading the file directly using MySQL's LOAD DATA INFILE (which will likely be a lot faster), although I've had problems with this in the past and ended up writing my own PHP code.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
What have you tried?
The memory can be limited by means other than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).
Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.
I'm testing a PHP script to create a CSV file containing a large amount of data.
This is the way I do this task:
$csvdata = "ID,FN,LN,ADDR,TEL,PRO\n
1,fn1,ln1,addr1,tel1,pro1\n...etc,";
$fname = "./temp/filename.csv";
$fp = fopen($fname,'w');
fwrite($fp,$csvdata);
fclose($fp);
I have noticed that when the string ($csvdata) contains around 100,000 data rows the script works fine. But when it gets to more than about 1,000,000 data rows, it stops in the middle of building the string $csvdata (I'm building the $csvdata string by concatenating data from the database in a for loop).
Could someone let me know what goes wrong when we use a large string value?
Thank you
Kind regards
I'm building $csvdata string by concatenating data in a for loop, data from database.
From this I gather that you are preparing the entire file to be written as a string and finally writing it to the file.
If so, then this approach is not good. Remember that the string lives in main memory, and every time you append to it you consume more memory. The fix for this is to write the file line by line: read a record from the database, transform/format it if you want to, prepare the CSV row, and write it to the file.
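A minimal sketch of that approach, assuming PDO; the connection details, table, and column names are placeholders.

<?php

// placeholder connection details and table/column names
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$fp = fopen('./temp/filename.csv', 'w');

// header row
fputcsv($fp, ['ID', 'FN', 'LN', 'ADDR', 'TEL', 'PRO']);

$stmt = $pdo->query('SELECT id, fn, ln, addr, tel, pro FROM people');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // write each record immediately instead of concatenating one giant string
    fputcsv($fp, $row);
}

fclose($fp);

For very large result sets you may also want to look at disabling query buffering (PDO::MYSQL_ATTR_USE_BUFFERED_QUERY) so the driver does not hold the whole result in memory either.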
Check out your error log. It will probably say something about one of two things:
You are trying to allocate memory exceeding some maximum. This means you are using too much memory -> you can change the amount of memory PHP is allowed to use (memory_limit in php.ini).
The execution time is longer than the allowed time. Again, you can increase this time.
It is probably at the $csvdata = ... part that your script gives out a memory exhausted error.
When you save 10 characters to a variable, it takes at least 10 bytes, and it keeps getting bigger. The limit is reached when it hits the memory allocated to PHP.
So this is how you move on:
1. Set the memory limit - increase your PHP memory limit:
ini_set('memory_limit', '256M');
2. Write line by line
Write each piece of data to the file right away instead of piling it up. Also, if you write $array[0] to the file, then store the next piece of data in $array[1], write it again, and continue like this, it would have the same effect as what you do now.
So either
while (blah blah) {
    $var = "text";
    fwrite($file, $var);
}
or in a for loop
for ($i = 0; blahblah; $i++) {
    $var[$i] = "query";
    fwrite($file, $var[$i]);
    unset($var);
}
The for loop comes in handy when the database queries are conditional, e.g. with WHERE id='$i'.
Good luck.
At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.
My idea was to parse the CSV file in steps of 100 lines and, after every 100 lines, recall the script automatically. I tried to achieve this with header('Location: ...') and passing the current line via GET, but it didn't work out as I wanted.
Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner (is that correct English?). It reads line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 min. to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array, that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);
    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here
        // try to use conditions sparingly here, as those will cause slow performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
I find uploading the file and inserting it using MySQL's LOAD DATA LOCAL query a fast solution, e.g.:
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
REPLACE INTO TABLE table_name FIELDS TERMINATED BY ','
ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
$result = $mysqli->query($sql);
If you don't care about how long it takes and how much memory it needs, you can simply increase the values for this script. Just add the following lines to the top of your script:
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');
With the function memory_get_usage() you can find out how much memory your script needs to find a good value for the memory_limit.
You might also want to have a look at fgets() which allows you to read a file line by line. I am not sure if that takes less memory, but I really think this will work. But even in this case you have to increase the max_execution_time to a higher value.
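As a rough illustration of both points, here is a sketch that raises the limits, reads line by line with fgets(), and prints memory_get_usage() periodically so you can pick a sensible memory_limit. The file name and limit values are placeholders.

<?php

ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');

$handle = fopen('huge.csv', 'r'); // placeholder file name
$rows = 0;

while (($line = fgets($handle)) !== false) {
    // ... parse / insert $line here ...

    if (++$rows % 100000 === 0) {
        // report memory usage every 100k lines
        echo $rows . ' rows, ' . round(memory_get_usage() / 1048576, 1) . " MB\n";
    }
}

fclose($handle);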
There seems to be an enormous difference between fgetcsv() and fgets() when it comes to memory consumption.
A simple CSV with only one column passed my 512M memory limit for just 50000 records with fgetcsv() and took 8 minutes to report that.
With fgets() it took only 3 minutes to successfully process 649,175 records, and my local server wasn't even gasping for additional air.
So my advice is to use fgets() if the number of columns in your CSV is limited. In my case fgets() directly returned the string inside column 1.
For more than one column, you might use explode() into a disposable array which you unset() after each record operation, as sketched below.
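A small sketch of that fgets()/explode() pattern; the file name is a placeholder, and note that explode() does not understand quoted or escaped fields, so this only suits simple CSVs.

<?php

$handle = fopen('simple.csv', 'r'); // placeholder file name

while (($line = fgets($handle)) !== false) {
    // explode() is cheaper than fgetcsv() but does not handle quoted fields
    $fields = explode(',', rtrim($line, "\r\n"));

    // ... use $fields[0], $fields[1], ... here ...

    unset($fields); // free the disposable array after each record
}

fclose($handle);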
Thumbed up answer 3 #ndkauboy
Oh. Just run this script as a CLI command, not via the silly web interface. Then no execution time limit will affect it.
And do not keep the parsed results around forever, but write them out immediately - that way you won't be affected by the memory limit either.
I am exporting data to CSV. After 25,000 records, memory is exhausted.
Increasing the memory limit is OK.
If I have 100,000 rows, can I write it as 4 processes?
Write the first 25,000 rows, then the next 25,000, then the next...
Is this possible in a CSV export?
Will this have any advantage? Or is it the same as exporting the whole data at once?
Does multiprocessing or parallel processing have any advantage?
Well, this depends on how you're generating the CSV.
Assuming that you're doing it as the result of a database query (or some other import), you could try streaming instead of building and then returning.
Basically, you turn off output buffering first:
while(ob_get_level() > 0) {
ob_end_flush();
}
Then, when you're building it, echo it out row by row:
foreach ($rows as $row) {
echo '"'.$row[0].'","'.$row[1].'"'."\n";
}
That way, you're not using too much memory in PHP.
You could also write the data to a temporary file, and then stream that file back:
$file = tmpfile();
foreach ($rows as $row) {
fputcsv($file, $row);
}
rewind($file);
fpassthru($file); // Sends the file to the client
fclose($file);
But again, it all depends on what you're doing. It sounds to me like you're building the CSV in a string (which is eating all your memory). That's why I suggested these two options...
The problem is that if you fork the process, you have to worry about cleaning up its children, and you're still using the same amount of memory. Ultimately you're limited by the machine's memory, but if you don't want to have to conditionally increase PHP's memory_limit based on the number of iterations, then forking may be the way to go.
If you compiled PHP with --enable-pcntl and --enable-sigchild, you're good to go - otherwise, you won't be able to fork the process. One workaround would be to have a master script that delegates the execution of other scripts, but if you're using backticks or shell_exec() or exec() (or anything similar) it starts to get sloppy and you'll have to take a lot of steps to ensure that your commands cannot be tainted/exploited.
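If pcntl is available, a fork-per-chunk export could look roughly like the sketch below. The chunk boundaries and the exportRange() helper are hypothetical and stand in for your own query-and-write code.

<?php

// hypothetical helper: exports rows [$start, $start + $count) into its own CSV part
function exportRange($start, $count)
{
    // ... query the DB with LIMIT $start, $count and write one CSV chunk ...
}

$chunkSize = 25000;
$chunks    = 4; // e.g. 100,000 rows split into 4 parts
$children  = [];

for ($i = 0; $i < $chunks; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    }
    if ($pid === 0) {
        // child process: export its slice, then exit so it does not continue the loop
        exportRange($i * $chunkSize, $chunkSize);
        exit(0);
    }
    $children[] = $pid; // parent keeps track of child PIDs
}

// parent: reap every child so no zombie processes are left behind
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}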