Run PHP for longer time in separate processes - php

I have a directory that can contain CSV files delivered by a service, and I need to import them into a database. Each CSV file has 1000 rows, and there can be anywhere from 10 to 150 files.
I want to insert the data from all of these CSV files into the database. The problem is that PHP dies because of a timeout: even if I use set_time_limit(0), the server (siteground.com) imposes its own restrictions. Here is the code:
// just in case, even though a console script should not have this problem
ini_set('memory_limit', '-1');
ini_set('max_input_time', '-1');
ini_set('max_execution_time', '0');
set_time_limit(0);
ignore_user_abort(1);

///////////////////////////////////////////////////////////////////

function getRow()
{
    $files = glob('someFolder/*.csv');

    foreach ($files as $csvFile) {
        $fh = fopen($csvFile, 'r');
        $count = 0;

        while ($row = fgetcsv($fh)) {
            $count++;

            // skip header
            if ($count === 1) {
                continue;
            }

            // make sure the header and the actual row have the same number of columns
            if (count($this->headerRow) !== count($row)) {
                continue;
            }

            $rowWithHeader = array_combine($this->headerRow, $row);

            yield $rowWithHeader;
        }
    }
}

foreach (getRow() as $row) {
    // fix row
    // now insert into the database
}
This is actually a Command run through artisan (I am using Laravel). I know the CLI doesn't have time restrictions, but for some reason not all CSV files get imported and the process ends at a certain point.
So my question is: is there a way to invoke a separate PHP process for each CSV file present in a directory? Or some other way of doing this so I am able to import all the CSV files without any issue, such as PHP's generators, etc.?

You could just do some bash magic. Refactor your script so that it processes one file only (reading the header row from the first line of the file), and pass the file to process as an argument via $argv.
<?php
// just in case, even though a console script should not have this problem
ini_set('memory_limit', '-1');
ini_set('max_input_time', '-1');
ini_set('max_execution_time', '0');
set_time_limit(0);
ignore_user_abort(1);

$file = $argv[1]; // the file is the first and only argument to the script

///////////////////////////////////////////////////////////////////

function getRow($csvFile)
{
    $fh = fopen($csvFile, 'r');

    // the first line of the file is the header
    $headerRow = fgetcsv($fh);

    while ($row = fgetcsv($fh)) {
        // make sure the header and the actual row have the same number of columns
        if (count($headerRow) !== count($row)) {
            continue;
        }

        $rowWithHeader = array_combine($headerRow, $row);

        yield $rowWithHeader;
    }

    fclose($fh);
}

foreach (getRow($file) as $row) {
    // fix row
    // now insert into the database
}
Now, call your script like this:
for file in /path/to/folder/*.csv; do php /path/to/your/script.php "$file"; done
This will execute your script once for each .csv file in /path/to/folder.
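If you prefer to stay in PHP rather than bash, a small dispatcher script can do the same thing. This is just a sketch with placeholder paths, using glob() and shell_exec() to launch one PHP process per file:

<?php
// Hypothetical dispatcher: launches a separate PHP process for each CSV file.
// The folder and script paths are placeholders - adjust them to your setup.
$files = glob('/path/to/folder/*.csv');

foreach ($files as $csvFile) {
    // escapeshellarg() protects against spaces and shell metacharacters in file names
    $cmd = 'php /path/to/your/script.php ' . escapeshellarg($csvFile);
    echo shell_exec($cmd);
}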

The best approach is to process a limited number of files per PHP process. For example, you can start with 10 files (determine the right number empirically), process them, mark them as done (move them to a folder of processed files), and stop the process. After that, start a new process to import the next 10 files, and so on. In Laravel you can tell the scheduler not to start a new process for a specific command if another instance is still running. The command for Laravel is below:
$schedule->command("your job")->everyMinute()->withoutOverlapping();
If you use this approach, you can be sure that all files will be processed within a reasonable time and that the process will not consume enough resources to get killed.
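For illustration only, here is a rough sketch of such a command; the class name, folder layout, and batch size are assumptions, and the actual import logic is left as a comment:

<?php
// app/Console/Commands/ImportCsvBatch.php -- hypothetical command, names are placeholders

namespace App\Console\Commands;

use Illuminate\Console\Command;

class ImportCsvBatch extends Command
{
    protected $signature = 'csv:import-batch';
    protected $description = 'Import up to 10 pending CSV files, then exit';

    public function handle()
    {
        // take at most 10 pending files per run
        $files = array_slice(glob(storage_path('csv/incoming/*.csv')), 0, 10);

        foreach ($files as $file) {
            // ... run your existing row-by-row import for $file here ...

            // move the file so the next scheduled run does not pick it up again
            rename($file, storage_path('csv/processed/' . basename($file)));
        }
    }
}

Registered with the ->everyMinute()->withoutOverlapping() line above, each run stays short and never overlaps a previous one.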

If your hosting provider allows cron jobs, they don't have a timeout limit.
They also fit heavy, long-running tasks better than manually calling the function, since repeated manual calls could cause big problems.
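For example, with Laravel the standard single crontab entry (the project path is a placeholder) runs the scheduler every minute, and the scheduler then triggers the import command with no web-server timeout in the way:

* * * * * cd /path/to/project && php artisan schedule:run >> /dev/null 2>&1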

Related

Check if a file exists, report an OK if it does or an error on timeout

Following the good advice in this link:
How to keep checking for a file until it exists, then provide a link to it
The loop will never end if the file is never created.
In a perfect system this should not happen, but if it does, how would one exit from that loop?
I have a similar case:
/* More codes above */

// writing on the file
$csvfile = $foldername . $date . $version . ".csv";
$csv = fopen($csvfile, 'w+');

foreach ($_POST['lists'] as $pref) {
    fputcsv($csv, $pref, ";");
}

// close and wait IO creation
fclose($csv);
sleep(1);

// Running the Java
$exec = shell_exec("/usr/bin/java -jar $app $csvfile");
sleep(3);

$xmlfile = preg_replace('/\\.[^.\\s]{3,4}$/', '.xml', $csvfile);

if (file_exists("$csvfile") && (file_exists("$xmlfile"))) {
    header("Location:index.php?msg");
    exit;
}
else if (!file_exists("$csvfile")) {
    header("Location:index.php?msgf=" . basename($csvfile) . " creation failed!");
    exit;
}
else if (!file_exists("$xmlfile")) {
    header("Location:index.php?msgf=" . basename($xmlfile) . " creation failed!");
    exit;
}
//exit;
} // Just the end
?>
(Yes, it's a bad idea to pass variables in the URL... I've got that covered.)
I use sleep(N); because I know the Java takes only a short time to create the file, and the same goes for the CSV on the PHP side.
How can I improve the check on the file, so it waits the necessary time before reporting the status OK or NOT OK if the file was not created?
After reading your comments, I think asking for "the best loop" isn't a good way to get a better answer.
The linked script just gives a good approach for when the script expects a file: it will wait until the file is created, or forever (but there the creator guarantees the file will be created).
Better than that, you could allow a particular period of time to decide whether the file exists or not.
If after the shell_exec the Java program didn't create the file (which I think is almost impossible, but it's just a thought), you could use code like the one below:
$cycles = 0;

while (!($isFileCreated = file_exists($filename)) && $cycles < 1000) {
    $cycles++;
    usleep(1);
}

if (!$isFileCreated)
{
    //some action
    //throw new RuntimeException("File doesn't exist");
}

//another action
The script above will wait until the file is created or until a particular number of cycles is reached (it's better to speak of cycles than microseconds, because I can't guarantee that each cycle executes in exactly one microsecond). The number of cycles can be increased if you need more time.

Big CSV exportation blocks user session with PHP

I'm trying to export a lot of data through a CSV export. The amount of data is really big, around 100,000 records and counting.
My client usually uses two tabs to browse and check several things at the same time, so a requirement is that while the export is being generated, he can continue browsing the system.
The issue is that while the CSV is being generated on the server, the session is blocked; you cannot load another page until the generation is completed.
This is what I'm doing:
Open the file
Loop through the data (one query per cycle, each cycle fetches 5000 records) - note: I cannot change this because of certain limitations.
Write the data into the file
Free memory
Close the file
Set headers to begin the download
During the entire process it is not possible to navigate the site in another tab.
The block of code:
$temp = 1;
$first = true;
$fileName = 'csv_data_' . date("Y-m-d") . '-' . time() . '.csv';
$filePath = CSV_EXPORT_PATH . $fileName;

// create CSV file
$fp = fopen($filePath, 'a');

// get data
for ($i = 1; $i <= $temp; $i++) {
    // get lines
    $data = $oPB->getData(ROWS_PER_CYCLE, $i); // ROWS_PER_CYCLE = 5000

    // if nothing came back, exit
    if (empty($data)) {
        break;
    }

    // write the data that will be exported into the file
    fwrite($fp, $export->arrayToCsv($data, '', '', $first));

    // count elements
    $temp = ceil($data[0]->foundRows / ROWS_PER_CYCLE); // foundRows is always the same value, it doesn't change per query

    $first = false; // hide header for next rows

    // free memory
    unset($data);
}

// close file
fclose($fp);

/**
 * Begin Download
 */
$export->csvDownload($filePath); // set headers
Some considerations:
The count is done in the same query, but it does not enter an infinite loop; it works as expected. The count is contained in $data[0]->foundRows and avoids an unnecessary query to count all the available records.
There are several memory limitations due to environment settings that I cannot change.
Does anyone know how I can improve this? Or any other solution?
Thanks for reading.
I'm replying only because it can be helpful to someone else. A colleague came up with a solution for this problem.
Call the function session_write_close() before
$temp = 1;
By doing this, you end the current session and store the session data, so I'm able to download the file and continue navigating in other tabs.
I hope it helps someone.
Some considerations about this solution:
You must not need to use session data after session_write_close()
The export script is in another file. For example, home.php calls export.php through a link.
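For illustration, a minimal sketch of where the call goes in export.php (the surrounding code is the loop from the question):

// export.php -- sketch only
session_start();                 // session data can still be read up to this point
// ... read anything you need from $_SESSION here ...
session_write_close();           // releases the session lock so other tabs are not blocked

$temp = 1;
$first = true;
// ... the rest of the export loop from the question ...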

PHP, check if the file is being written to/updated by PHP script?

I have a script that rewrites a file every few hours. This file is inserted into end users' HTML via a PHP include.
How can I check whether my script, at this exact moment, is working on (i.e. rewriting) the file while it is being pulled in for display? Is it even an issue? In other words, what happens if a user accesses the file at the same time, what are the odds of that, and will the user just have to wait until the script finishes its work?
Thanks in advance!
More on the subject...
Is this a way forward, using file_put_contents and LOCK_EX?
When the script saves its data every now and then:
file_put_contents("text", $content, LOCK_EX);
and when the user opens the page:
if (file_exists("text")) {
    function include_file() {
        $file = fopen("text", "r");
        if (flock($file, LOCK_EX)) {
            include_file();
        }
        else {
            echo file_get_contents("text");
        }
    }
} else {
    echo 'no such file';
}
Could anyone advise me on the syntax? Is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also good, except for the same call to include_file() - would it even work?
function include_file() {
    $time = time();
    $file = filectime("text");
    if ($file + 1 < $time) {
        echo "good to read";
    } else {
        echo "have to wait";
        include_file();
    }
}
To check whether the file is currently being written, you can use the filectime() function to get the time the file was last changed.
You can store the current timestamp at the top of your script in a variable, and whenever you need to access the file, compare that timestamp with the filectime() of the file. If the file's change time is more recent, you are in the scenario where you have to wait for the file to finish being written, and you can log that to a database or another file.
To prevent this scenario from happening, you can change the script that writes the file so that it first creates a temporary file and, once it is done, replaces (moves or renames) the original file with the temporary one. This action takes very little time compared to writing the file, which makes the problematic overlap a very rare possibility.
Even if a read and a replace do occur simultaneously, the time the reading script has to wait will be very short.
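A minimal sketch of that write-then-replace idea (the file names and content are placeholders); because rename() is atomic on the same filesystem, readers always see either the complete old file or the complete new one:

// writer side: build the new content in a temporary file, then swap it in
$target = 'fragment.html';            // placeholder for the real include file
$tmp    = $target . '.tmp';

file_put_contents($tmp, $newContent); // $newContent is a placeholder for the generated data
rename($tmp, $target);                // atomic replacement on the same filesystem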
Depending on the size of the file, this might be a concurrency issue. But you can solve it quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you are done writing, remove this file.
On the include side, check for the existence of "incfile.php.lock" and wait until it has disappeared; this needs some looping and sleeping for the unlikely case of a concurrent access.
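A minimal sketch of that lock-file idea, assuming the include file is called incfile.php as in the example above ($newContent is a placeholder):

// writer side: create the lock, write, remove the lock
touch('incfile.php.lock');
file_put_contents('incfile.php', $newContent);
unlink('incfile.php.lock');

// reader side: wait (with an upper bound) until the lock disappears, then include
$tries = 0;
while (file_exists('incfile.php.lock') && $tries < 50) {
    usleep(100000); // wait 0.1 seconds between checks
    $tries++;
}
include 'incfile.php';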
Basically, though, you should consider another solution: write the data that is rendered into that file to a database (where locks etc. are available) and render it in a module which then gets included in your page. Solutions like yours are hard to maintain in the long run...
This question is old, but I'm adding this answer because the other answers have no code.
function write_to_file(string $fp, string $string) : bool {
    $timestamp_before_fwrite = date("U");

    $stream = fopen($fp, "w");
    fwrite($stream, $string);
    while (is_resource($stream)) {
        fclose($stream);
    }

    clearstatcache(true, $fp); // make sure filemtime() is not served from the stat cache
    $file_last_changed = filemtime($fp);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // file not changed code
        return false;
    }
    return true;
}
This is the function I use to write to a file: it first records the current timestamp before making changes to the file, and then compares that timestamp with the time the file was last changed.
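For example (the path and the $html content are placeholders), the return value tells you whether the write actually registered:

if (!write_to_file('/tmp/fragment.html', $html)) {
    // the file's modification time did not move forward - log it or retry
}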

How to parse Large CSV file without timing out?

I'm trying to parse a 50 megabyte .csv file. The file itself is fine, but I'm trying to get past the massive timeout issues involved. Everything is set correctly upload-wise; I can easily upload and re-open the file, but after the browser timeout I receive a 500 Internal Server Error.
My guess is that I can save the file onto the server, open it, and keep a session value of which line I last dealt with. After a certain line, I reset the connection via a refresh and open the file at the line I left off at. Is this a doable idea? The previous developer made a very inefficient MySQL class that controls the entire site, so I don't want to write my own class if I don't have to, and I don't want to mess with his class.
TL;DR version: Is it efficient to save the last line I'm currently on of a CSV file that has 38K lines of products, and then, after X number of rows, reset the connection and start from where I left off? Or is there another way to parse a large CSV file without timeouts?
NOTE: It's the PHP script execution time. Currently, at 38K lines, it takes about 46 minutes and 5 seconds to run via the command line. It works correctly 100% of the time when I take it out of the browser, suggesting that it is a browser timeout. Chrome's timeout is not editable as far as Google has told me, and Firefox's timeout rarely works.
You could do something like this:
<?php

namespace database;

class importcsv
{
    private $crud;

    public function __construct($dbh, $table)
    {
        $this->crud = new \database\crud($dbh, $table);
        return $this;
    }

    public function import($columnNames, $csv, $seperator)
    {
        $lines = explode("\n", $csv);
        $x = 0;

        foreach ($lines as $line)
        {
            \set_time_limit(30);

            $line = explode($seperator, $line);
            $data = new \stdClass();

            foreach ($line as $i => $item)
            {
                if (isset($columnNames[$i]) && !empty($columnNames[$i]))
                    $data->{$columnNames[$i]} = $item;
            }

            $x++;
            $this->crud->create($data);
        }

        return $x;
    }

    public function importFile($columnNames, $csvPath, $seperator)
    {
        if (file_exists($csvPath))
        {
            $content = file_get_contents($csvPath);
            return $this->import($columnNames, $content, $seperator);
        }
        else
        {
            // Error
        }
    }
}
TL;DR: Calling \set_time_limit(30); every time you loop through a line might fix your timeout issues.
I suggest running PHP from the command line and setting it up as a cron job. This way you don't have to modify your code, there will be no timeout issue, and you can easily parse large CSV files.
Also check this link.
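For example, a crontab entry along these lines (the script path and schedule are placeholders) would run the import nightly with no browser timeout involved:

0 2 * * * php /path/to/parse_csv.php >> /var/log/parse_csv.log 2>&1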
Your post is a little unclear due to the typos and grammar; could you please edit it?
If you are saying that the upload itself is okay but the delay is in the processing of the file, then the easiest thing to do is to parse the file in parallel using multiple threads. You can use the Java built-in Executor class, or Quartz or Jetlang, to do this.
Find the size of the file or number of lines.
Select a Thread load (Say 1000 lines per thread)
Start an Executor
Read the file in a loop.
For each 1000 lines, create a Runnable and load it into the Executor
Start the Executor
Wait till all threads are finished
Each runnable does this:
Fetch a connection
Insert the 1000 lines
Log the results
Close the connection

Deleting File Section in PHP?

I'm a beginner at PHP, and I'm still trying to work out proper file handling techniques. I'm usually alright with trial and error, but when it comes to deleting and modifying data, I always like to be on the safe side.
I wrote the code below to delete a certain section of a file, but I'm not sure if it will work with larger files or under unforeseen conditions which require experience to code for.
I tested this just now and it did work, but I would like to run it by the more experienced programmers first:
function deletesection($start, $len) {
    $pos = 0;
    $tmpname = $this->name . "tmp.tmp";

    $tmpf = fopen($tmpname, "wb+");
    rewind($tmpf);

    $h = fopen($this->name, "rb");
    rewind($h);

    while (!feof($h)) {
        $this->xseek($h, $pos);
        $endpos = $pos + 1000;

        if ($endpos > $start && $pos < $start + $len) {
            $readlen = $start - $pos;
            $nextpos = $start + $len;
        }
        else {
            $readlen = 1000;
            $nextpos = $pos + 1000;
        }

        fwrite($tmpf, fread($h, $readlen));
        $pos = $nextpos;
    }

    fclose($h);
    unlink($this->name);
    rename($tmpname, $this->name);
}
This is inside a class where the property "name" is the file path.
I'm writing the file 1000 bytes at a time because I was getting errors about the maximum amount of memory being exceeded when testing with files over 30mb.
I had a quick look at your code - it seems a bit complicated. Also, copying the entire file will be inefficient if the section to delete is small relative to the total file size...
function deletesection($filename, $start, $len)
{
    $chunk = 49128;

    if (!is_readable($filename) || !is_writeable($filename) || !is_file($filename)) {
        return false;
    }

    $tfile = tempnam(sys_get_temp_dir(), 'del'); // used to hold the data after the section to delete
    $oh = fopen($tfile, 'wb');
    $ih = fopen($filename, 'r+b');

    if (fseek($ih, $start + $len) === 0) {
        // copy everything after the deleted section into the temporary file
        while (!feof($ih) && ($data = fgets($ih, $chunk)) !== false) {
            fputs($oh, $data);
        }
        fclose($oh);
        $oh = fopen($tfile, 'rb');
        // or could just have opened it w+b to begin with

        // write it back, starting where the deleted section began
        fseek($ih, $start, SEEK_SET);
        while (!feof($oh) && ($data = fgets($oh, $chunk)) !== false) {
            fputs($ih, $data);
        }
        ftruncate($ih, ftell($ih)); // drop the now-duplicated bytes at the end
    }

    fclose($oh);
    fclose($ih);
    unlink($tfile);
    return true;
}
I believe it would also be possible to do this by modifying the file in place (i.e. not using a second file) with a single file handle, but the code would get a bit messy and would require a lot of seeking (followed by an ftruncate).
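For what it's worth, here is a sketch of that in-place variant (untested; the function and parameter names are made up). It shifts everything after the deleted section towards the start of the file and then truncates the leftover tail:

function deleteSectionInPlace($filename, $start, $len, $chunk = 8192)
{
    $h = fopen($filename, 'r+b');
    if ($h === false) {
        return false;
    }

    $readPos  = $start + $len; // where the data we want to keep begins
    $writePos = $start;        // where that data has to be moved to

    while (true) {
        fseek($h, $readPos, SEEK_SET);
        $data = fread($h, $chunk);
        if ($data === false || $data === '') {
            break; // nothing left to move
        }

        fseek($h, $writePos, SEEK_SET);
        fwrite($h, $data);

        $readPos  += strlen($data);
        $writePos += strlen($data);
    }

    ftruncate($h, $writePos); // cut off the duplicated tail
    fclose($h);
    return true;
}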
NB: using files for managing data with PHP (and most other languages, in a multi-user context) is not a good idea.
