I have the following question: how can I run a PHP script only once? Before people start to reply that this is indeed a similar or duplicate question, please continue reading...
The situation is as follows: I'm currently writing my own MVC framework and I've come up with a module-based system so I can easily add new functionality to my framework. In order to do so, I created a /ROOT/modules directory in which one can add new modules.
So as you can imagine, the script needs to read the directory, read all the PHP files, parse them, and only then can it execute the new functionality. However, it has to do this for every browser request, which makes the task roughly O(nAmountOfRequests * nAmountOfModules) and rather big on websites with a large number of user requests every second.
Then I figured, what if I introduced a session variable like $_SESSION['modulesLoaded'] and simply checked whether it's set or not? This would reduce the load to O(nUniqueAmountOfRequests * nAmountOfModules), but that is still a large Big O if the only thing I want to do is read the directory once.
What I have now is the following:
/** Load the modules */
require_once(ROOT . DIRECTORY_SEPARATOR . 'modules' . DIRECTORY_SEPARATOR . 'module_bootloader.php');
Which consists of the following code:
<?php
//TODO: Make sure that the foreach only executes once for all the requests instead of every request.
if (!array_key_exists('modulesLoaded', $_SESSION)) {
    foreach (glob('*.php') as $module) {
        require_once($module);
    }
    $_SESSION['modulesLoaded'] = '1';
}
So now the question: is there a solution, like a superglobal variable, that I can access and that exists across all requests, so that instead of the previous Big Os I end up with a Big O consisting only of nAmountOfModules? Or is there another way of easily reading the module files only once?
Something like:
if (isFirstRequest) {
    foreach (glob('*.php') as $module) {
        require_once($module);
    }
}
In its most basic form, if you want to run it once, and only once (per installation, not per user), have your intensive script change something in the server state (add a file, change a file, change a record in a database), then check against that every time a request to run it is issued.
If you find a match, it would mean the script was already run, and you can continue with the process without having to run it again.
When called, lock the file; at the end of the script, delete the file. It is only called once, and since the file is not needed any longer, it vanishes into nirvana.
This naturally works the other way round, too:
<?php
$checkfile = __DIR__ . '/.checkfile';
clearstatcache(false, $checkfile);
if (is_file($checkfile)) {
return; // script did run already
}
touch($checkfile);
// run the rest of your script.
Just cache the array to a file and, when you upload new modules, delete the file. It will be recreated automatically and then you're all set again.
// If the $cache file does not exist or unserialize fails, rebuild it and save it
if (!is_file($cache) or (($cached = unserialize(file_get_contents($cache))) === false)) {
    // rebuild your array here into $cached
    $cached = call_user_func(function () {
        // rebuild your array here and return it
    });
    // store the serialized $cached data into the $cache file
    file_put_contents($cache, serialize($cached), LOCK_EX);
}
// Now you have $cached file that holds your $cached data
// Keep using the $cached variable now as it should hold your data
This should do it.
PS: I'm currently rewriting my own framework and do the same thing to store such data. You could also use a SQLite DB to store all such data your framework needs but make sure to test performance and see if it fits your needs. With proper indexes, SQLite is fast.
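As a rough illustration of the SQLite variant (a minimal sketch, assuming PDO with the SQLite driver; the modules table and the cache file name are placeholders, and ROOT is the constant from the question):
$db = new PDO('sqlite:' . ROOT . DIRECTORY_SEPARATOR . 'cache.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS modules (name TEXT PRIMARY KEY, path TEXT)');
// Rebuild the module list only when the table is empty (delete the DB file after uploading new modules)
if ((int) $db->query('SELECT COUNT(*) FROM modules')->fetchColumn() === 0) {
    $insert = $db->prepare('INSERT INTO modules (name, path) VALUES (?, ?)');
    foreach (glob(ROOT . DIRECTORY_SEPARATOR . 'modules' . DIRECTORY_SEPARATOR . '*.php') as $module) {
        $insert->execute(array(basename($module, '.php'), $module));
    }
}
foreach ($db->query('SELECT path FROM modules') as $row) {
    require_once($row['path']);
}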
Related
I have a function that generates a table with contents from the DB. Some cells have custom HTML which I'm reading in with file_get_contents through a templating system.
The content is small and identical each time, but this action is performed maybe 15 times (I have a limit of 15 table rows per page). So does file_get_contents cache if it sees that the content is the same?
file_get_contents() does not have a caching mechanism. However, you can write your own.
Here is a draft:
$cache_file = 'content.cache';

if (file_exists($cache_file)) {
    if (time() - filemtime($cache_file) > 86400) {
        // too old, re-fetch
        $cache = file_get_contents('YOUR FILE SOURCE');
        file_put_contents($cache_file, $cache);
    } else {
        // cache is still fresh, read from it
        $cache = file_get_contents($cache_file);
    }
} else {
    // no cache, create one
    $cache = file_get_contents('YOUR FILE SOURCE');
    file_put_contents($cache_file, $cache);
}
UPDATE: the previous if case was incorrect; it is now rectified by comparing against the current time. Thanks @Arrakeen.
As @deceze says, generally the answer is no. However, operating-system-level caches may cache recently used files to make for quicker access, but I wouldn't count on those being available. If you'd like to cache a file that is being read multiple times per request, consider using a static variable to act as a cache inside a wrapper function.
function my_file_read($filename) {
    static $file_contents = array();
    if (!isset($file_contents[$filename])) {
        $file_contents[$filename] = file_get_contents($filename);
    }
    return $file_contents[$filename];
}
Calling my_file_read($filename) multiple times will only read the file from disk a single time; subsequent calls will read the value from the static variable within the function. Note that you shouldn't count on this approach for large files or ones used only once per page, since the memory used by the static variable persists until the end of the request. Keeping the contents of files unnecessarily in static variables is a good way to make your script a memory hog.
The correct answer is yes. All the PHP file system functions do their own caching, and you can use the "realpath_cache_size = 0" directive in PHP.ini to disable the caching if you like. The default caching timeout is 120 seconds. This is separate from the caching typically done by browsers for all GET requests (the majority of Web accesses) unless the HTTP headers override it. Caching is not a good idea during development work, since your code may read in old data from a file whose contents you have changed.
This is the code I'm using as I work my way to a solution.
public function indexAction()
{
    //id3 options
    $options = array("version" => 3.0, "encoding" => Zend_Media_Id3_Encoding::ISO88591, "compat" => true);
    //path to collection
    $path = APPLICATION_PATH . '/../public/Media/Music/'; //currently approx 2000 files
    //inner iterator
    $dir = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
    //iterator
    $iterator = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
    foreach ($iterator as $file) {
        if (!$file->isDir() && $file->getExtension() === 'mp3') {
            //real path to mp3 file
            $filePath = $file->getRealPath();
            Zend_Debug::dump($filePath); //current results: accepted path, no errors
            $id3 = new Zend_Media_Id3v2($filePath, $options);
            $data = array(); //reset per file so frames from one file don't leak into the next
            foreach ($id3->getFramesByIdentifier("T*") as $frame) {
                $data[$frame->identifier] = $frame->text;
            }
            Zend_Debug::dump($data); //currently can scan the whole collection without timing out, but APIC data not being processed.
        }
    }
}
The problem: Process a file system of mp3 files in multiple directories. Extract id3 tag data to a database (3 tables) and extract the cover image from the tag to a separate file.
I can handle the actual extraction and data handling. My issue is with output.
With the way that Zend Framework 1.x handles output buffering, outputting an indicator that the files are being processed is difficult. In an old-style PHP script, without output buffering, you could print out a bit of HTML with every iteration of the loop and have some indication of progress.
I would like to be able to process each album's directory, output the results and then continue on to the next album's directory. Only requiring user intervention on certain errors.
Any help would be appreciated.
JavaScript is not the solution I'm looking for. I feel that this should be possible within the constructs of PHP and a ZF 1 MVC.
I'm doing this mostly for my own enlightenment, it seems a very good way to learn some important concepts.
[EDIT]
Ok, how about some ideas on how to break this down into smaller chunks. Process one chunk, commit, process next chunk, kind of thing. In or out of ZF.
[EDIT]
I'm beginning to see the problem with what I'm trying to accomplish. It seems that output buffering is not just happening in ZF, it's happening everywhere from ZF all the way to the browser. Hmmmmm...
Introduction
This is a typical example of what you should not do, because:
You are trying to parse ID3 tags with PHP, which is slow, and parsing many files in one go would definitely make it even slower
RecursiveDirectoryIterator would load all the files in the folder and its subfolders, and from what I see there is no limit: it can be 2,000 files today and 100,000 the next day. The total processing time is unpredictable and can definitely take some hours in some cases
High dependence on a single file system: with your current architecture the files are stored on the local system, so it would be difficult to split the files and do proper load balancing
You are not checking whether a file's information has already been extracted, which results in duplicated loops and extraction
No locking system: this means the process can be initiated several times simultaneously, resulting in generally slow performance on the server
Solution 1: With the Current Architecture
My advice is not to use a loop or RecursiveDirectoryIterator to process the files in bulk.
Target each file as soon as it is uploaded or transferred to the server. That way you are only working with one file at a time, which spreads out the processing time.
Solution 2: Job Queue (Proposed Solution)
Your problem is exactly what job queues are designed for. You are also not limited to implementing the parsing in PHP; you can take advantage of C or C++ for performance.
Advantages
Transfer Jobs to other machines or processes that are better suited to do the work
It allows you to do work in parallel, to load balance processing
Reduce the latency of page views in high-volume web applications by running time-consuming tasks asynchronously
Multiple languages: client in PHP, server in C
Examples I have tested:
ZeroMQ
Gearman
Beanstalkd
Expected Process: Client
Connect to the job queue, e.g. Gearman
Connect to the database, e.g. MongoDB or Redis
Loop over the folder path
Check the file extension
If the file is an mp3, generate a file hash, e.g. with sha1_file
Check if the file has already been sent for processing
Send the hash and file to the job server
Expected Process: Server
Connect to the job queue, e.g. Gearman
Connect to the database, e.g. MongoDB or Redis
Receive the hash / file
Extract the ID3 tag
Update the DB with the ID3 tag information
Finally, this processing can be done on multiple servers in parallel; a rough sketch of such a client and worker follows.
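This is only a minimal sketch of that client/worker pair, assuming the pecl Gearman extension and a Gearman job server on localhost; the function name 'extract_id3', the music path and the JSON payload layout are illustrative placeholders:
// client.php - scans the folder and queues one background job per mp3 file
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
foreach (glob('/path/to/Music/*.mp3') as $file) {
    $hash = sha1_file($file);
    // here you would first ask your DB (MongoDB/Redis) whether $hash was already processed
    $client->doBackground('extract_id3', json_encode(array('hash' => $hash, 'file' => $file)));
}

// worker.php - run as many of these as you like, on one or several machines
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('extract_id3', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // extract the ID3 tag from $payload['file'] and update the DB here
});
while ($worker->work());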
One solution would be to use a job queue, such as Gearman. Gearman is an excellent solution for this kind of problem, and is easily integrated with Zend Framework (http://blog.digitalstruct.com/2010/10/17/integrating-gearman-into-zend-framework/).
It will allow you to create a worker to process each "chunk", allowing your process to continue unblocked while the job is processed; very handy for long-running processes such as music/image processing, etc.: http://gearman.org/index.php?id=getting_started
I'm not familiar with how Zend Framework works, so I will give you general advice. When working with a process that does many iterations and can run for a long time, it is generally advised to move the long process into a background process. Or, in a web context, into a cron job.
If the process is for a single site, you can implement something like this in your cron job (note: rough pseudo-code):
<?php
$targetdir = "/path/to/mp3";
$logdir = "/path/to/log/";

// Check if a current-state file exists. If it does, the previous cronjob is still running,
// so we stop this process so that it doesn't duplicate work, which might introduce random bugs.
if (file_exists($logdir . "current-state")) {
    exit;
}

// Start the process, write the state to the log dir
file_put_contents($logdir . "current-log", "process started at " . date("Y-m-d H:i:s") . "\n");
file_put_contents($logdir . "current-state", "started\t" . date("Y-m-d H:i:s"));

$dirh = opendir($targetdir);
while (($file = readdir($dirh)) !== false) {
    // let's ignore the current and parent dir
    if (in_array($file, array('.', '..'))) continue;
    // do whatever processing you want to do here;
    // you might want to write another log entry, too:
    file_put_contents($logdir . "current-log", "processing file {$file}\n", FILE_APPEND);
}
closedir($dirh);

file_put_contents($logdir . "current-log", "process finished at " . date("Y-m-d H:i:s") . "\n", FILE_APPEND);

// The process is finished, so delete current-state:
unlink($logdir . "current-state");
Next, in your web-facing PHP file, you can add a snippet to, say, an admin page, a footer, or whatever page you want, to see the progress:
<?php
$logdir = "/path/to/log/"; // same log directory as in the cron job
if (file_exists($logdir . "current-state")) {
    echo "<strong>there is a background process running.</strong>";
} else {
    echo "<strong>no background process running.</strong>";
}
I would suggest using a plugin.
class Postpone extends Zend_Controller_Plugin_Abstract
{
private $tail;
private $callback;
function __construct ($callback = array())
{
$this->callback = $callback;
}
public function setRequest (Zend_Controller_Request_Abstract $request)
{
/*
* We use layout, which essentially contains some html and a placeholder for action output.
* We put the marker into this placeholder in order to figure out "the tail" -- the part of layout that goes after placeholder.
*/
$mark = '---cut-here--';
$layout = $this->getLayout ();
$layout->content = $mark;
/*
* Now we have it.
*/
$this->tail = preg_replace ("/.*$mark/s", '', $layout->render ());
}
public function postDispatch (Zend_Controller_Request_Abstract $request)
{
$response = $this->getResponse ();
$response->sendHeaders ();
/*
* The layout generates its output to the default section of the response.
* This output includes "the tail".
* We don't need this tail shown right now, because we have a callback to run.
* So we remove it here for a while, but we'll show it later.
*/
echo substr ($this->getResponse ()
->getBody ('default'), 0, - strlen ($this->tail));
/*
* Since we have just echoed the result, we don't need it in the response. Do we?
*/
Zend_Controller_Front::getInstance ()->returnResponse(true);
$response->clearBody ();
/*
* Now to business.
* We execute that calculation intensive callback.
*/
if (! empty ($this->callback) && is_callable ($this->callback))
{
call_user_func ($this->callback);
}
/*
* We sure don't want to leave behind the tail.
* Output it so html looks consistent.
*/
echo $this->tail;
}
/**
* Returns layout object
*/
function getLayout ()
{
$layout_plugin = Zend_Controller_Front::getInstance ()->getPlugin ('Zend_Layout_Controller_Plugin_Layout');
return $layout = $layout_plugin->getLayout ();
}
}
class IndexController extends Zend_Controller_Action
{
/*
* This is a calculation intensive action
*/
public function indexAction ()
{
/*
* Zend_Layout in its current implementation accumulates the whole action output inside itself.
* This fact hampers our intention to gradually output the result.
* What we do here is we defer execution of our intensive calculation in form of callback into the Postpone plugin.
* The scenario is:
* 1. Application started
* 2. Layout is started
* 3. Action gets executed (except callback) and its output is collected by layout.
* 4. Layout output goes to response.
* 5. Postpone::postDispatch outputs first part of the response (without the tail).
* 6. Postpone::postDispatch calls the callback. Its output goes straight to the browser.
* 7. Postpone::postDispatch prints the tail.
*/
$this->getFrontController ()
->registerPlugin (new Postpone (function ()
{
/*
* A stand-in for the real calculation.
* Put your actual calculations here.
*/
echo str_repeat(" ", 5000);
foreach (range (1, 500) as $x)
{
echo "<p>$x</p><br />\n";
usleep(61500);
flush();
}
}), 1000);
}
}
I have a JavaScript function which calls a PHP function through AJAX.
The PHP function has a set_time_limit(0) for its purposes.
Is there any way to stop that function when I want, for example with an HTML button event?
I want to explain the situation better:
I have a PHP file which uses the stream_copy_to_stream($src, $dest) function to retrieve a stream on my local network. The function has to run until I decide to stop it: I can stop it at the end of the stream or whenever I want, so I can use a button to start and a button to stop. The problem is the new instance created by the AJAX call; I can't act on it because it is not the instance that is doing the recording but a different one. I tried MireSVK's suggestion but it didn't work!
It depends on the function. If it is a while loop checking for a certain condition every time, then you could add a condition that is modifiable from outside the script (e.g. make it check for a file, and create/delete that file as required).
It looks like a bad idea, however. Why do you want to do it?
var running = true;
function doSomething(){
//do something........
}
setInterval(function(){ if (running) { doSomething(); } }, 2000); // this runs doSomething every 2 seconds
On button click, simply set running = false;
Your code looks like:
set_time_limit(0);
while(true==true){//infinite loop
doSomething(); //your code
}
Let's upgrade it:
set_time_limit(0);
session_start();
$_SESSION['do_a_loop'] = true;
session_write_close(); // release the session lock so stopit.php can modify the session

function should_i_stop_loop() {
    @session_start(); // reopen the session to pick up changes made by stopit.php
    if ($_SESSION['do_a_loop'] == false) {
        // let's stop the loop
        exit();
    }
    session_write_close();
}

while (true == true) {
    doSomething();
    should_i_stop_loop(); // your new function
}
Create a new file, stopit.php:
session_start();
$_SESSION['do_a_loop'] = false;
All you have to do now is make a request to the stopit.php file (with AJAX or something).
Edit the code according to your needs; this is just the idea, one of many solutions.
Sorry for my English
Sadly this isn't possible (sort of).
Each time you make an AJAX call to a PHP script the script spawns a new instance of itself. Thus anything you send to it will be sent to a new operation, not the operation you had previously started.
There are a number of workarounds.
Use readyState 3 in AJAX to create a non-closing connection to the PHP script; however, that isn't supported cross-browser and probably won't work in IE (not sure about IE 10).
Look into socket programming in PHP, which allows you to create a script with one instance that you can connect to multiple times.
Have PHP check a third party, i.e. have one script running in a loop checking a file or a database, then have another script modify that file or database. The original script can be remotely controlled by what you write to the file/database (see the sketch after this list).
Try another programming language (this is a silly option, but I'm a fan of Node). Node.js does this sort of thing very, very easily.
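A minimal sketch of the third option, assuming a hypothetical copy.php / stop.php pair; the chunked loop stands in for a single stream_copy_to_stream() call so the flag can be checked between chunks, and the stream URL and file names are placeholders:
copy.php
<?php
// started by the first AJAX call
set_time_limit(0);
$stopFlag = __DIR__ . '/stop.flag';
@unlink($stopFlag); // clear any stale flag from a previous run

$src  = fopen('http://192.168.0.10/stream', 'rb'); // placeholder stream source
$dest = fopen(__DIR__ . '/recording.dat', 'wb');

while (!feof($src)) {
    fwrite($dest, fread($src, 8192)); // copy one chunk at a time
    clearstatcache();
    if (file_exists($stopFlag)) { // another request created the flag
        break;
    }
}
fclose($src);
fclose($dest);
stop.php
<?php
// called by the "stop" button via AJAX
touch(__DIR__ . '/stop.flag');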
Apologies if this has been covered before - I did my searching but possibly may not know the correct terms to have used.
This process is handled with PHP.
Here's the situation:
I have a large array of file names. The script I have opens these files and enters their content into a database. Processing these files one at a time takes over 24 hours, and these files are updated on a daily basis.
Breaking the single large array into four smaller arrays and running concurrent processes finishes the job before the 24 hour window elapses, but sometimes one or two processes will finish hours before the others because file sizes vary on a daily basis.
Much like people who stock retail shelves (who else has worked that nightmare before?) pitch in to help out with what's left after finishing their own tasks, I'd like to have a script in place where these "agents" do the same.
Here are some basics of what I have figured out - it could be wrong, and I'm not too proud to protest if I am :-)
$files = array('file1','file2','file3','file4','file5');
//etc... on to over 4k elements
while($file = array_pop($files)){
//Something in here... I have no idea what.
}
Ideas? Something like four function calls or four loops within that overarching 'while' has crossed my mind, but I'm pretty sure it's going to wait on executing subsequent calls until the previous one(s) finish.
Any help is appreciated. I'm seriously stuck on this one!
Thanks!
A database-backed message queue seems the obvious solution but I think that's overkill in this case. I would simply put the files to be processed into a single dedicated queue directory, then use the DirectoryIterator class to scan it. Something like this:
while (true) {
look in the queue directory for a file
if you don't find one, exit the script, all processing is done
if you find one, rename it or move it to a work directory
if the rename/move command succeeded, process the file
if the rename/move command failed, one of the other threads got it first
}
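A rough PHP sketch of that loop, assuming $queueDir and $workDir are configured elsewhere and processFile() stands in for your actual import routine:
while (true) {
    // look in the queue directory for a file
    $file = null;
    foreach (new DirectoryIterator($queueDir) as $entry) {
        if ($entry->isFile()) {
            $file = $entry->getPathname();
            break;
        }
    }
    if ($file === null) {
        break; // nothing left, all processing is done
    }
    // try to claim the file by moving it to the work directory
    $claimed = $workDir . '/' . basename($file);
    if (@rename($file, $claimed)) {
        processFile($claimed); // we got it first
    }
    // if rename() failed, one of the other workers got it first; just loop again
}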
Edit:
Regarding launching the workers, you could use a simple shell script to spawn the PHP processes in the background:
NUM_WORKERS=5
for WORKER in $(seq 1 ${NUM_WORKERS})
do
echo "starting worker ${WORKER}"
php -f /path/to/my/process.php &
done
Then, create a cron entry to run this launcher, for example, at midnight:
0 0 * * * /path/to/launcher.sh
You want what's called a "message queue", something like beanstalkd.
You'll basically create a list of messages that include your individual filenames. You'll then create a set of processors to process them. Each processor will handle one file then go back to the queue to see if there are more messages/files waiting to be processed.
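As a rough illustration, assuming the Pheanstalk client library for beanstalkd (method names may vary between versions), the producer and a worker could look something like this:
// producer.php - push one message per file onto the queue
$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
foreach ($files as $filename) {
    $queue->useTube('import')->put($filename);
}

// worker.php - run several of these; each keeps taking files until the queue is empty
$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
$queue->watch('import');
while ($job = $queue->reserveWithTimeout(5)) {
    // open the file named in the message and enter its content into the database here
    $filename = $job->getData();
    $queue->delete($job);
}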
EDIT:
Here's an analogy to help explain message queues. Your first idea is like a human manager taking a stack of files, dividing them into four piles and then handing each of his four employees a pile to process. A message queue is more like this: the manager puts all the files on a table and tells each employee to take a single file from the table and process it. He tells them when they're done with the first file to keep taking files until there are no more files on the table. When all the files are done, the employees can go home.
One employee might end up with really large files and only handle a few, while another employee might get smaller files and handle many. It doesn't matter how many each employee handles, they'll all keep working until the table is empty.
I would have a socket-server master script that hands out file paths to X number of slave scripts until there are no files left to process. This way, all the slave scripts keep running, and you can hand out file paths dynamically as they are requested.
Something like this:
master.php
<?php
// load the array of files to process (however you do this)
$fileList = file('filelist.txt');
// Create a listening socket on localhost
$serverSocket = stream_socket_server('tcp://127.0.0.1:7878');
$sockets = array($serverSocket);
$clients = array();
// Loop while there are still files to process
while (count($fileList)) {
// Run a select() call on the existing sockets' read buffers
// Skip to next iteration if no sockets are waiting for handling
if (stream_select($read = $sockets, $write = NULL, $except = NULL, 1) < 1) {
continue;
}
// Loop sockets with data to read
foreach ($read as $socket) {
if ($socket == $serverSocket) {
// Accept new clients
$sockets[] = $clients[] = stream_socket_accept($serverSocket);
} else if (trim(fgets($socket)) == 'next') {
// Hand out a new file path to the client
fwrite($socket, array_shift($fileList)."\n");
if (!count($fileList)) {
break 2;
}
}
}
}
// When we're done, disconnect the clients
foreach ($clients as $socket) {
#fclose($socket);
}
// ...and close the listen socket
#fclose($serverSocket);
slave.php
<?php
$socket = fsockopen('127.0.0.1', 7878);
while (!feof($socket)) {
// Get a new file path from the master
fwrite($socket,"next\n");
$path = trim(fgets($socket));
if (is_file($path)) {
// Process the file at $path here
}
}
You then just need to start master.php, then when it is running, you can start however many instances of slave.php as you want, and they will all keep running until there are no more files to process.
Obviously, this has no error handling, but it should provide a basic framework to get you started. This relies on blocking function calls (stream_select() and fgets()) to avoid a race condition - this may or may not be sufficient for your purposes.
Ok here is my problem.
I have a file which outputs an XML based on an input X
I have another file which calls the above (1st) file many times (around 10000), with different values for X.
When a user clicks "Go", it should go through all those 10000 Xs and simultaneously show a progress indicator of how many are done (maybe updated once every 10 seconds).
How do I do it? I need ideas. I know how to use AJAX and such, but what structure should my program take?
EDIT
So according to the answer given below, I stored my output in a session variable, which another file then outputs. What is happening is:
When I execute a long script, it finishes within, say, 1 minute. But in the meantime, if I open (in a new window) just the file which outputs my SESSION variable, it doesn't output anything until the first script has finished, which is the complete opposite of what I want. What's the problem here? Is it my system/server that doesn't handle multiple requests, or what?
EDIT 2
I use the files approach:
To read what I want:
<?php
include_once '../includeTop.php';
echo util::readFromLog("../../Files/progressData.tmp");
?>
and in another script
$processed ++;
util::writeToLog($dir.'/progressData.tmp', "Files processed: $processed");
where the functions are:
public static function writeToLog($file,$data) {
$f = fopen($file,"w");
fwrite($f, $data);
fclose($f);
}
public static function readFromLog($file) {
return file_get_contents($file);
}
But the same problem still persists :(. I can manually see the file getting updated (1, 2, 3, etc.), but when I read it from PHP it just waits until my original script has produced its output.
EDIT 3
OK, I finally found the solution. Instead of fetching the output through the PHP file, I now go directly to the log and read it.
Put the progress (i.e. how far you are into the 2nd file) into memcached directly from the background job, then deliver that value when it is requested by the JavaScript application (triggered by a timer, as long as you have not reached 100%). The only thing you need to figure out is how to pass some sort of "transaction ID" to both the background job and the JavaScript side, so they access the same key in memcached.
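A minimal sketch of that idea, assuming the pecl Memcached extension and a $transactionId shared between the background job and the AJAX endpoint; $xValues and progress.php are placeholders:
// background job
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
$total = count($xValues);
foreach ($xValues as $i => $x) {
    // ... generate the XML for $x here ...
    $memcached->set('progress_' . $transactionId, (int) (($i + 1) / $total * 100));
}

// progress.php - polled by the JavaScript timer
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
echo (int) $memcached->get('progress_' . $transactionId);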
Edit: I was wrong about $_SESSION. It doesn't update asynchronously, i.e. the values you store in it are not accessible until the script has finished. Whoops.
So the progress needs to be stored in something that does update asynchronously: Memory (like pyroscope suggests, and which is still the best solution), a file, or the database.
In other words, instead of using $_SESSION to store the value, it should be stored by memcached, in a file or in the database.
I.e. using the database
$progress = 0;
mysql_query("INSERT INTO `progress` (`id`, `progress`) VALUES ($uid, $progress)");
# loop starts
# processing...
$progress += $some_increment;
mysql_query("UPDATE `progress` SET `progress`=$progress WHERE `id`=$uid");
# loop ends
Or using a file
$progress = 0;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop starts
# processing...
$progress += $some_increment;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop ends
And then read the file/select from the database, when requesting progress via ajax. But it's not a pretty solution compared to memcached.
Also, remember to remove the file/database row once it's all done.
You could put the progress in a $_SESSION variable (you'll need a unique name for it) and update it while the process runs. Meanwhile, your AJAX request simply gets that variable at a specific interval:
function heavy_process($input, $uid) {
$_SESSION[$uid] = 0;
# loop begins
# processing...
$_SESSION[$uid] += $some_increment;
# loop ends
}
Then have a URL that simply spits out the $_SESSION[$uid] value when it's requested via AJAX, and use the returned value to update the progress bar. Use something like sha1(microtime()) to create the $uid.
Edit: pyroscope's solution is technically better, but if you don't have a server with memcached or the ability to run background processes, you can use $_SESSION instead.