PHP parallel process or one process after another on file writing - PHP

I am exporting data to CSV. After 25,000 records, memory is exhausted.
Increasing the memory limit is OK.
If I have 100,000 rows, can I write it as 4 processes?
Write the first 25,000 rows, then the next 25,000, then the next...
Is this possible in a CSV export?
Will this have any advantage, or is it the same as exporting the whole data set at once?
Does multiprocessing or parallel processing have any advantage here?

Well, this depends on how you're generating the CSV.
Assuming that you're doing it as the result of a database query (or some other import), you could try streaming instead of building and then returning.
Basically, you turn off output buffering first:
while (ob_get_level() > 0) {
    ob_end_flush();
}
Then, when you're building it, echo it out row by row:
foreach ($rows as $row) {
    echo '"' . $row[0] . '","' . $row[1] . '"' . "\n";
}
That way, you're not using too much memory in PHP.
You could also write the data to a temporary file, and then stream that file back:
$file = tmpfile();
foreach ($rows as $row) {
    fputcsv($file, $row);
}
rewind($file);
fpassthru($file); // Sends the file to the client
fclose($file);
But again, it all depends on what you're doing. It sounds to me like you're building the CSV in a string (which is eating all your memory). That's why I suggested these two options...
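Putting those pieces together, here is a rough sketch of streaming a CSV download straight to the client with fputcsv() writing to php://output; the PDO connection, query and column names are placeholders, not taken from the question:
// Rough sketch only: stream the CSV row by row instead of building it in memory.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

while (ob_get_level() > 0) {
    ob_end_flush(); // make sure PHP is not buffering the output
}

$out = fopen('php://output', 'w');       // write directly to the response body
fputcsv($out, array('id', 'name'));      // header row (placeholder columns)

$stmt = $pdo->query('SELECT id, name FROM my_table'); // hypothetical query
while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
    fputcsv($out, $row);                 // one row in memory at a time
}
fclose($out);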

The problem is that if you fork the process, you have to worry about cleaning up its children, and you're still using the same total amount of memory. Ultimately you're limited by the machine's memory, but if you don't want to conditionally increase PHP's memory_limit based on the number of iterations, then forking may be the way to go.
If you compiled PHP with --enable-pcntl and --enable-sigchild, you're good to go; otherwise, you won't be able to fork the process. One workaround would be to have a master script that delegates the execution of other scripts, but if you're using backticks, shell_exec() or exec() (or anything similar), it starts to get sloppy and you'll have to take a lot of steps to ensure that your commands cannot be tainted/exploited.
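To make that concrete, here is a very rough sketch of a forked, chunked export, assuming pcntl is available and a hypothetical export_chunk() helper that writes one 25,000-row slice to its own part file:
// Rough sketch only: split a 100,000-row export across 4 child processes.
// export_chunk($offset, $limit, $path) is a hypothetical helper.
$chunks = 4;
$limit  = 25000;

for ($i = 0; $i < $chunks; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('could not fork');
    }
    if ($pid === 0) {                            // child process
        export_chunk($i * $limit, $limit, "export_part_$i.csv");
        exit(0);                                 // the child must exit here
    }
}

// parent: reap every child so none are left as zombies
while (pcntl_wait($status) > 0) {
}
// the part files can then be concatenated into a single CSV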

Related

PHP - How to append to a JSON file

I generate JSON files which I load into datatables, and these JSON files can contain thousands of rows from my database. To generate them, I need to loop through every row in the database and add each database row as a new row in the JSON file. The problem I'm running into is this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262643 bytes)
What I'm doing is: I get the JSON file with file_get_contents($json_file), decode it into an array, add a new row to the array, then encode the array back into JSON and write it to the file with file_put_contents($json_file).
Is there a better way to do this? Is there a way I can prevent the memory from increasing with each loop iteration? Or is there a way I can clear the memory before it reaches the limit? I need the script to run to completion, but with this memory problem it barely gets to 5% completion before crashing.
I can keep rerunning the script, and each time I rerun it, it adds more rows to the JSON file, so if this memory problem is unavoidable, is there a way to automatically rerun the script numerous times until it's finished? For example, could I detect the memory usage, detect when it's about to reach the limit, then exit the script and restart it? I'm on wpengine, so they won't allow security-risky functions like exec().
So I switched to using CSV files and it solved the memory problem. The script runs vastly faster too. jQuery DataTables doesn't have built-in support for CSV files, so I wrote a function to convert the CSV file to JSON:
public function csv_to_json($post_type) {
    $data = array(
        "recordsTotal"    => $this->num_rows,
        "recordsFiltered" => $this->num_rows,
        "data"            => array()
    );

    if (($handle = fopen($this->csv_file, 'r')) === false) {
        die('Error opening file');
    }

    $headers  = fgetcsv($handle, 1024, "\t");
    $complete = array();
    while ($row = fgetcsv($handle, 1024, "\t")) {
        $complete[] = array_combine($headers, $row);
    }
    fclose($handle);

    $data['data'] = $complete;
    file_put_contents($this->json_file, json_encode($data, JSON_PRETTY_PRINT));
}
So the result is I create a CSV file and a JSON file much faster than creating a JSON file alone, and there are no issues with memory limits.
Personally, as I said in the comments, I would use CSV files. They have several advantages:
you can read/write one line at a time, so you only manage the memory for one line
you can simply append new data to the file
PHP has plenty of built-in support, via either fputcsv() or the SPL file objects
you can load them directly into the database using "LOAD DATA INFILE"
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
The only cons are:
you have to keep the same schema throughout the whole file
no nested data structures
The issue with JSON is (as far as I know) that you have to keep the whole thing in memory as a single data set. Therefore you cannot stream it (line by line) like a normal text file. There is really no solution besides limiting the size of the JSON data, which may or may not be easy to do. You can increase the memory somewhat, but that is just a temporary fix if you expect the data to continue to grow.
We use CSV files in a production environment, and I regularly deal with datasets that are 800k or 1M rows. I've even seen one that was 10M rows. We have a single table of 60M rows (MySQL) that is populated from CSV uploads. So it will work and be robust.
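As a minimal sketch of the append-one-line-at-a-time idea from the list above (the file name and the $rows source are placeholders):
// Minimal sketch: append a batch of rows without reading the file back in.
$fh = fopen('export.csv', 'a');   // 'a' = append; the file is created if missing
foreach ($rows as $row) {         // $rows stands in for your database result
    fputcsv($fh, $row);           // only one row is held in memory at a time
}
fclose($fh);
// The finished file can later be bulk-loaded with LOAD DATA INFILE.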
If you're set on JSON, then I would come up with a fixed number of rows that works and design your code to only process that many rows at a time. It's impossible for me to guess how to do that without more details.

PHP Using fgetcsv on a huge csv file

Using fgetcsv, can I somehow do a destructive read, where rows I've read and processed are discarded, so that if I don't make it through the whole file in the first pass I can come back and pick up where I left off before the script timed out?
Additional Details:
I'm getting a daily product feed from a vendor that comes across as a 200 MB .gz file. When I unpack the file, it turns into a 1.5 GB .csv with nearly 500,000 rows and 20-25 fields. I need to read this information into a MySQL db, ideally with PHP so I can schedule a cron job to run the script at my web hosting provider every day.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and a maximum memory limit of 128 MB for any single script. These limits cannot be changed by me.
My idea was to grab the information from the .csv using the fgetcsv function, but I'm expecting to have to take multiple passes at the file because of the 3-minute timeout, so I was thinking it would be nice to whittle away at the file as I process it, so I wouldn't need to spend cycles skipping over rows that were already processed in a previous pass.
From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
Assuming you save how many rows you have already processed, you can skip rows like this:
$alreadyProcessed = 42; // for example

$i = 0;
while ($row = fgetcsv($fileHandle)) {
    if ($i++ < $alreadyProcessed) {
        continue;
    }
    ...
}
However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.
The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:
$lastPosition = file_get_contents('last_position.txt');

$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);

while ($row = fgetcsv($fh)) {
    ...
    file_put_contents('last_position.txt', ftell($fh));
}
This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
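Building on that, here is a rough sketch of stopping cleanly before the 180-second hard limit and resuming on the next cron run; the 150-second budget and the process_row() helper are assumptions for illustration:
// Rough sketch: process rows until a time budget is used up, persist the
// file offset, and pick up from that offset on the next run.
$budget   = 150;   // seconds, leaving headroom under the 180 s hard limit
$start    = time();
$position = (int) @file_get_contents('last_position.txt');

$fh = fopen('my.csv', 'r');
fseek($fh, $position);

while ($row = fgetcsv($fh)) {
    process_row($row);                                    // hypothetical helper
    file_put_contents('last_position.txt', ftell($fh));   // remember where we are
    if (time() - $start > $budget) {
        break;                                            // next run resumes here
    }
}
fclose($fh);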
You can avoid timeouts and memory errors to some extent by reading the file like a stream: read it line by line and then insert each line into the database (or process it accordingly). That way only a single line is held in memory on each iteration. Note: don't try to load a huge CSV file into an array; that really would consume a lot of memory.
if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // Get the first row (header)
    $header = fgetcsv($handle);
    // loop through the file line by line
    while (($data = fgetcsv($handle)) !== false)
    {
        // Process your data
        unset($data);
    }
    fclose($handle);
}
I think a better solution (it would be phenomenally inefficient to continuously rewind and rewrite the open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read; then, if you have to resume, just fseek to the last position.
You could try loading the file directly using MySQL's ability to read the file itself (LOAD DATA INFILE), which will likely be a lot faster, although I've had problems with this in the past and ended up writing my own PHP code.
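For reference, a hedged sketch of that route via PDO; it assumes LOAD DATA LOCAL INFILE is allowed by both the MySQL server and the host (shared hosts often disable it), and the file path, table and delimiters are placeholders:
// Rough sketch only: bulk-load the CSV straight into MySQL.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', array(
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // required for LOCAL INFILE
));

$sql = "LOAD DATA LOCAL INFILE '/path/to/products.csv'
        INTO TABLE products
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES";                    // skip the header row

$pdo->exec($sql);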
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
What have you tried?
The memory can be limited by other means than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).
Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.

Export big mysql table into .xls format

I need to export a MySQL table to .xls format; this is a fragment of my code:
$result = mysql_query( /* here query */ );

$objPHPExcel = new PHPExcel();
$rowNumber = 1;
while ($row = mysql_fetch_row($result)) {
    $col = 'A';
    foreach ($row as $cell) {
        $objPHPExcel->getActiveSheet()->setCellValue($col . $rowNumber, $cell);
        $col++;
    }
    $rowNumber++;
}
The problem is that the table has 500,000 rows, and because the while loop runs a foreach loop on every iteration, the PHP script takes a very long time to execute.
Is it possible to optimize this code?
500,000 rows will always take a lot of time to write, even if you speed it up by using the worksheet's fromArray() method to get rid of your foreach loop; and (as nichar has pointed out) this is too many rows for the xls format to handle unless you split them across multiple worksheets.
You can reduce the memory requirements by enabling cell caching (SQLite gives the best memory usage), but it will still take a long time to execute for 500,000 rows, and anything this size should be run as a batch/cron job.
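A hedged sketch of those two suggestions using PHPExcel's settings API; the SQLite cache constant and the $rows chunk are illustrative, so check your PHPExcel version for the exact options:
// Rough sketch: enable SQLite-backed cell caching, then write rows in bulk
// with fromArray() instead of one setCellValue() call per cell.
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_sqlite3;
PHPExcel_Settings::setCacheStorageMethod($cacheMethod);

$objPHPExcel = new PHPExcel();
$sheet = $objPHPExcel->getActiveSheet();

// $rows is a placeholder for a chunk of rows fetched from the database
$sheet->fromArray($rows, null, 'A1');

$writer = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007'); // .xlsx writer
$writer->save('export.xlsx');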
This is a point to note rather than a direct answer to your question, but if the Excel file format you're outputting is .xls, the maximum number of rows is 65,536, and if it is the MS Excel 2007+ format (.xlsx), the maximum is 1,048,576 rows.
So without changing the output format to .xlsx (which is an entirely different structure), the files will be too large to open.
Consider dumping the data into a csv file, and then importing it into Excel. Should be a lot faster.
If you get a php timeout, you can reset the limit by adding this inside the while or for loop:
set_time_limit(300); //whatever seconds you want
If you're running it through the browser, your server may be timing out. I recommend you run it on the command line to avoid this.
Also, similar to what nickhar mentioned, it can be an Excel issue. I would try outputting a CSV file instead; I think it will allow you to output more lines.

PHP script stops working when handling large size of string

I'm testing a PHP script that creates a CSV file containing a large amount of data.
This is the way I do this task:
$csvdata = "ID,FN,LN,ADDR,TEL,PRO\n
1,fn1,ln1,addr1,tel1,pro1\n...etc,";

$fname = "./temp/filename.csv";
$fp = fopen($fname, 'w');
fwrite($fp, $csvdata);
fclose($fp);
I have noticed that when the string ($csvdata) contains around 100,000 data rows the script works fine. But when it gets to more than about 1,000,000 data rows it stops in the middle, where I build the string $csvdata (I'm building the $csvdata string by concatenating data from the database in a for loop).
Could someone let me know what goes wrong when we use a large string value?
Thank you
Kind regards
I'm building $csvdata string by concatenating data in a for loop, data from database.
From this I gather that you are preparing the entire file to be written as a string and finally writing it to the file.
If so, then this approach is not good. Remember that the string is held in main memory, and every time you append to it, you consume more memory. The fix is to write the file line by line: read a record from the database, transform/format it if you want to, prepare the CSV row, and write it to the file.
Check your error log. It will probably say one of two things:
you are trying to allocate memory exceeding some maximum. This means you are using too much memory; you can change the amount of memory PHP is allowed to use (memory_limit in php.ini).
the execution time is longer than the allowed time. Again, you can increase this time.
It is probably at the $csvdata = ... part that your script gives a memory exhausted error.
When you save 10 characters to a variable, it takes roughly 10 bytes, and the string keeps getting bigger; the limit is reached when it hits the memory allocated to PHP.
So this is how you move on:
1. Set the memory limit - increase your PHP memory limit:
ini_set('memory_limit', '256M');
2. Write line by line
Write each piece of data to the file right away instead of piling it up. Also, if you write array[0] to the file, then store the next piece of data in array[1] and write again, and keep going like this, it has the same effect as what you do now, because the array keeps growing.
So either:
while (/* blah blah */) {
    $var = "text";
    fwrite($file, $var);
}
or, in a for loop:
for ($i = 0; /* blah blah */; $i++) {
    $var[$i] = "query";
    fwrite($file, $var[$i]);
    unset($var[$i]); // free the entry so the array doesn't keep growing
}
The for loop comes in handy when the database queries are conditional, e.g. WHERE id='$i'.
Good luck.

How to write to file in a large php application (multiple questions)

What is the best way to write to files in a large PHP application? Let's say there are lots of writes needed per second. What is the best way to go about this?
Could I just open the file and append the data, or should I open, lock, write and unlock?
What will happen if the file is being worked on and other data needs to be written? Will this activity be lost, or will it be saved? And if it is saved, will it halt the application?
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous writes:
<?php
for ($i = 0; $i < 100; $i++) {
    $pid = pcntl_fork();
    // only spawn more children if we're not a child ourselves
    if (!$pid)
        break;
}

$fh = fopen('test.txt', 'a');

// The following is a simple attempt to get multiple processes to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);

$myPid = posix_getpid();
// create a line starting with pid, followed by 10,000 copies of
// a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A') + $myPid % 25), 10000) . "\n";

for ($i = 0; $i < 1; $i++) {
    fwrite($fh, $line);
}
fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, all of which are roughly 10,000 chars long and begin with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, however, a few appends will conflict and the result gets mangled.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
I do have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I have not noticed any problems with that; each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already existing content, especially with concurrency, I would go with locking, otherwise a big mess can happen...
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you should take a look at the syslog() function, since syslog provides an API.
You could also delegate the writes to a dedicated backend and do the job in an asynchronous manner.
These are my 2p.
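A minimal sketch of the syslog() route; the ident string, facility and priority are just examples:
// Minimal sketch: let syslog deal with concurrent writers instead of the application.
openlog('myapp', LOG_PID | LOG_ODELAY, LOG_LOCAL0);
syslog(LOG_INFO, 'something worth logging happened');
closelog();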
Unless a unique file is needed for a specific reason, I would avoid appending everything to a huge file. Instead, I would rotate ("wrap") the file by time and size; a couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
I would also probably introduce some buffering to avoid waiting for the write operation to complete.
PHP is probably not the best-suited language for this kind of operation, but it is still possible.
Use flock()
See this question
If you just need to append data, PHP should be fine with that, as the filesystem should take care of simultaneous appends.
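And for the cases where locking does matter (several writers modifying the same file), a rough sketch with flock(); the file name and log line are placeholders:
// Rough sketch: serialize writers with an exclusive advisory lock.
$fh = fopen('app.log', 'a');
if (flock($fh, LOCK_EX)) {           // block until we hold the exclusive lock
    fwrite($fh, "one complete log line\n");
    fflush($fh);                     // flush before releasing the lock
    flock($fh, LOCK_UN);             // release the lock
}
fclose($fh);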
