Ok, let's say my website allows users to upload large batches of files for processing.
Now, once they upload the files, say a process_files() function is called. This will take a long time to complete, and it will basically make the website unusable for the user until the task either times out or completes.
What I'm wondering is if there's an easy way to just execute this in parallel or in the background so the user can still use the website?
If it makes a difference, in my case these will very seldom be called.
You can divide the task into two parts:
The first part (web) only saves the file to disk, so the web part stays fast.
The second part (system) processes all the files saved by the first part. You can start it periodically, e.g. as a cron task on Linux, or use some other task scheduler depending on your system.
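A minimal sketch of that second part, run from cron (the directory names and the process_file() helper are just placeholders for your own layout and logic):
// process_uploads.php - e.g. crontab entry: */5 * * * * php /path/to/process_uploads.php
$pending = '/var/spool/myapp/pending';   // where the web part saves uploads (assumed path)
$done    = '/var/spool/myapp/done';

foreach (glob($pending . '/*') as $file) {
    process_file($file);                          // stand-in for your real per-file work
    rename($file, $done . '/' . basename($file)); // move it out so the next run skips it
}

function process_file($path)
{
    // whatever process_files() currently does, per file
}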
I would suggest that after the file upload you call a script via the CLI, which will run in the background and be non-blocking:
exec("php the_long_file_processing_script.php > /dev/null &");
I need to write a cron (PHP) script to get the HTML result from several websites.
Let's say my database has 50 website records (i.e. http://www.somewebsite.com/page.php). The cron job will be set to run every x minutes. When it runs, it will load the records from the database, check the status of each website, get the HTML result back from it and then analyze it.
My concern is that if the website in the nth record is not responding, or takes some time to load (e.g. an overseas website), then the (n+1)th record won't run until the nth record is finished, so the whole cron job will take a while to complete.
If I executed the script in a browser this could easily be handled with async AJAX, but since it's a cron job I have no idea how to handle this situation.
Here is what you can do:
Run a shell script from crontab, and in that script invoke the PHP program that takes care of the async tasks.
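If the goal is specifically to stop one slow or unresponsive site from holding up the other 49, another option inside that PHP program is curl_multi, which fetches them all concurrently with per-request timeouts. A rough sketch ($urls, the timeouts and the analysis step are placeholders):
$urls = ['http://www.somewebsite.com/page.php' /* ...loaded from your database... */];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);  // cap connect time for dead hosts
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);         // cap total time per site
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// drive all transfers until every one has finished or timed out
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);   // empty if the site failed or timed out
    // analyze $html here
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);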
I think it's better to move to another language if you really need async. It's a major change to how you would design the overall system architecture, and a big step for any PHP developer; I've spent a lot of time in PHP myself and am looking into NodeJS for better solutions here. PHP does not yet support this kind of async internally or in the core, even though the term gets used.
I have a site where auctions end at varying times. I need to send an automated email to the seller and the buyer after the auction is finished to notify them of the auction ending and the results. Obviously I can't really wait for someone to load the page to run the script, so is there a good way to automate this by checking the current time, comparing it to the auction end time, and running that script?
The site is on a UNIX server so a cron job is an option, but I'm concerned that running a cron job like that will put quite a load on the server.
A cron job runs at most once per minute.
Whatever load it generates on the server really depends on the kind of script you're going to run. Btw, I'm assuming that you're using the CLI to run the script (rather than just doing a curl http://mysite.com).
If your script takes longer than one minute (you should monitor this), simply either:
Increase the interval time between runs or,
Use a lock file to make sure no two instances of your script can run at the same time.
// "c" creates the lock file if it doesn't exist yet ("r+" would fail in that case)
if (($fp = fopen('/tmp/mylockfile', 'c')) === false) {
    die("Could not open lock file");
}
// LOCK_NB makes this fail immediately instead of waiting for the other instance
if (!flock($fp, LOCK_EX | LOCK_NB)) {
    die("Could not obtain lock");
}

// run your code here

// release the lock and close the file
flock($fp, LOCK_UN);
fclose($fp);
OTOH, if the script needs to run more than once per minute, you would need a different mechanism entirely.
Q: What is the best way to run a PHP script at a particular time, or interval?
A: Use cron
Q: Does a cronjob create a big load on the server?
A: Depends of course on your script. But checking whether an auction should be closed, closing it and sending two emails shouldn't be too difficult. Be sure to create some kind of lock file to make sure that if your script runs longer than the interval set, it isn't run twice.
Q: running a script with shorter intervals than 1 minute
A: Can't answer this one for you. Sorry :)
Use Cron. It allows you to run any command at most once per minute: http://clickmojo.com/code/cron-tutorial.html
As far as server load goes, it generally won't be a concern unless you are running a massive number of database calls very often on a very low-end server. I speak in generalities, but the idea is sound.
If you are using something else (besides PHP) to run your auction timer mechanism, I recommend you attach some code to that timer mechanism that also executes a mail-sending script when the timer runs down to zero and determines a winner.
Run the PHP script as a command line script. This will not put a load on the web server, just on the machine itself, and you can easily run it via cron.
If you add #!/usr/bin/php to the top of the script and set the execute bit on the file with chmod +x scriptname.php, you can execute the script directly without passing it to the php binary yourself.
http://php.net/manual/en/features.commandline.php
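For example, a trivial script set up that way (the name and contents are placeholders) can then be listed in crontab directly:
#!/usr/bin/php
<?php
// scriptname.php - after chmod +x scriptname.php, run it as ./scriptname.php
// or from cron, e.g.: * * * * * /path/to/scriptname.php
echo "checking for ended auctions...\n";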
A couple of things you need for this:
1. Store something in your auction information indicating whether you've sent this e-mail yet or not (could be a boolean, or a date for when it was sent which might be null). Although I have to assume you need to do something besides send this e-mail? Like mark the auction as closed so no more bidding can take place?
2. A bit of code that finds auctions which need this e-mail sent: e.g. they've ended and have not yet been notified.
3. Something to repeatedly execute the bit of code in 2. You could use cron. Alternatively you can write a pretty simple daemon for Unix that runs constantly in a loop of (wait at least a few ms or more; do some stuff). The latter is a lot more work but in my opinion scales much better. See http://pear.php.net/package/System_Daemon for some useful tools if you're interested in this approach.
One thing to consider is how careful you want to be about accidentally double-sending this e-mail. If you're only running this code in a single thread it's pretty easy, but if you ever want to build out to the point where you have several different distributed machines that create and send these e-mails, you have to be a bit more careful. If you're running it out of cron, can you guarantee one run of it will always be finished before another one starts?
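A sketch of points 2 and 3 combined, assuming an auctions table with id, ends_at, seller_email, buyer_email and a nullable notified_at column (all names invented); the conditional UPDATE is what guards against double-sending even if two runs overlap:
$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

$ended = $db->query(
    "SELECT id, seller_email, buyer_email FROM auctions
     WHERE ends_at <= NOW() AND notified_at IS NULL"
);

foreach ($ended as $auction) {
    // claim the row first; only one concurrent run can win this UPDATE
    $claim = $db->prepare(
        "UPDATE auctions SET notified_at = NOW()
         WHERE id = ? AND notified_at IS NULL"
    );
    $claim->execute([$auction['id']]);
    if ($claim->rowCount() !== 1) {
        continue; // another run already handled this auction
    }
    mail($auction['seller_email'], 'Auction ended', 'Your auction has ended.');
    mail($auction['buyer_email'], 'Auction won', 'You won the auction.');
}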
I've built a PHP script that creates/restores a backup containing a site's content and database. It works really well on smaller sites but it runs into trouble on larger sites. What would be the best way to batch a script like this? Basically, it copies files from one directory to another, creates a DB dump and then zips the directory.
I've done a little bit of research, do I need to use cron jobs?
If it is something that happens at a fixed time / schedule, then it should be a cron job. This is fairly straightforward to set up. There are plenty of tutorials.
If, on the other hand, it is an action a user triggers from a web browser, you should fork and exec. You take in the user's input, fork and exec and then let the user know that he will be emailed when the process is complete.
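A literal fork-and-exec sketch, assuming the pcntl extension is available (it usually only is on the CLI, which is why many setups just launch a detached process with exec("... &") from the web handler instead); backup.php and its path stand in for the copy/dump/zip work described above:
$pid = pcntl_fork();
if ($pid === -1) {
    die('could not fork');
} elseif ($pid === 0) {
    // child: replace this process with the long-running backup script
    pcntl_exec('/usr/bin/php', ['/path/to/backup.php']);
    exit(1); // only reached if pcntl_exec itself fails
} else {
    // parent: respond to the user right away
    echo 'Backup started - you will be emailed when it completes.';
}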
I am developing a video upload site and I have run into a dilemma: videos uploaded need to be converted into the FLV format in order to be displayed to a visitor, but if I execute the command within the script, the script will hang for about 10-15 minutes while FFmpeg converts the video.
I had an idea to insert a record into the database indicating the file needs to be processed, then use a cron job set to every 5 minutes to select records from the database which need to be processed, process them, then update the database showing they have been processed. My worry about this is executing too many processes and the server crashing under the strain, so has anyone got any solutions to this, or a way to improve the process I have in mind?
Okay, this is now what I have in mind: the user uploads a video and a row is inserted into the database indicating the video needs to be processed. A cron job set to every 5 minutes checks what needs to be processed and what is being processed. Say I allow a maximum of five processes at one time: the script checks how many videos are currently being processed, and if it is fewer than five it updates the records of the videos it picks up to indicate they are being processed. Once a video has been processed, it updates the record to indicate it has been processed, and the cron job starts over. Any thoughts?
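For concreteness, roughly what I picture that cron script doing (table and column names are invented):
// convert_pending.php - run from cron every 5 minutes
$db  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
$max = 5;

$active = (int)$db->query(
    "SELECT COUNT(*) FROM videos WHERE status = 'processing'"
)->fetchColumn();

$slots = $max - $active;
if ($slots <= 0) {
    exit; // already at the concurrency limit, try again next run
}

$pending = $db->query("SELECT id, path FROM videos WHERE status = 'pending' LIMIT $slots");
foreach ($pending as $video) {
    $db->prepare("UPDATE videos SET status = 'processing' WHERE id = ?")
       ->execute([$video['id']]);

    exec('ffmpeg -i ' . escapeshellarg($video['path'])
        . ' ' . escapeshellarg($video['path'] . '.flv'));

    $db->prepare("UPDATE videos SET status = 'done' WHERE id = ?")
       ->execute([$video['id']]);
}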
Gearman is a good solution for this kind of problem, it lets you instantly dispatch a job and have any number of workers (which may be on different servers) available to fulfill it.
To start with you can run a few workers on the same server, but if you start to run into load issues then you can just fire up another server with some more workers, so it's horizontally scalable.
If you're using PHP-FPM then you can make use of fastcgi_finish_request(), as documented on the PHP.net page for FastCGI Process Manager (FPM):
fastcgi_finish_request() - special function to finish request and flush all data while continuing to do something time-consuming (video converting, stats processing etc.);
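A minimal sketch of that pattern (PHP-FPM only; $uploadedPath and the ffmpeg call are assumptions):
// answer the upload request immediately, then keep working
echo json_encode(['status' => 'queued']);
fastcgi_finish_request();   // the browser gets its response here

// everything below runs after the client has already been answered
exec('ffmpeg -i ' . escapeshellarg($uploadedPath) . ' ' . escapeshellarg($uploadedPath . '.flv'));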
If you're not using PHP-FPM or want something more advanced then you might consider using a queue manager like Gearman which is perfectly suited to the scenario you're describing. The advantage of using Gearman over running a process with shell_exec is you can take a look at how many jobs are running / how many are left and check their statuses. You also make scaling much easier as it's now trivial to add job servers:
$worker->addServer("10.0.0.1");
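Roughly how the two sides look with the PECL Gearman extension; the task name, payload and ffmpeg call here are just placeholders:
// client side (e.g. in the upload handler): fire-and-forget a conversion job
$client = new GearmanClient();
$client->addServer();   // local gearmand on the default port
$client->doBackground('convert_video', json_encode(['path' => '/uploads/clip.avi']));

// worker side (a long-running CLI process; start as many as your servers can handle)
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('convert_video', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    exec('ffmpeg -i ' . escapeshellarg($data['path']) . ' ' . escapeshellarg($data['path'] . '.flv'));
});
while ($worker->work());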
I love this class (see the specific comment) in the PHP manual: http://www.php.net/manual/en/function.exec.php#88704
Basically, it lets you spin off a background process on *nix systems. It returns a PID, which you can store in the session. When you reload the page to check on it, you simply recreate the ForkedProcess class with the saved PID, and you can check on its status. If it's complete, the process should be done.
It doesn't allow for much error checking, but it's incredibly lightweight.
If you expect a lot of traffic you should seriously consider a dedicated server.
On a single server, you can use shell_exec along with the UNIX nohup command to get the PID of the process.
function run_in_background($Command, $Priority = 0)
{
    // nohup detaches the command from the web request; "echo $!" returns its PID
    if ($Priority) {
        $PID = shell_exec("nohup nice -n $Priority $Command 2> /dev/null & echo $!");
    } else {
        $PID = shell_exec("nohup $Command 2> /dev/null & echo $!");
    }
    return $PID;
}

function is_process_running($PID)
{
    // "ps" prints a header line plus one line per matching process
    exec("ps $PID", $ProcessState);
    return count($ProcessState) >= 2;
}
A full description of this technique is here: http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/
You could perhaps put the list of PIDs in a MySQL table and then use your cron job every 5 mins to detect when a video is complete and update the relevant values in the database.
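For example, a cron'd check along those lines, assuming a conversions table with id, pid and status columns (names invented) and the is_process_running() helper from above:
$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

foreach ($db->query("SELECT id, pid FROM conversions WHERE status = 'running'") as $row) {
    if (!is_process_running($row['pid'])) {
        // the ffmpeg process has exited, so mark the video as complete
        $db->prepare("UPDATE conversions SET status = 'done' WHERE id = ?")
           ->execute([$row['id']]);
    }
}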
You can call ffmpeg with system() and send the output to /dev/null with a trailing &; that makes the call return right away, effectively handling the conversion in the background.
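For example ($src and $dst are placeholder paths; the trailing & plus the redirects are what let system() return immediately):
system('ffmpeg -i ' . escapeshellarg($src) . ' ' . escapeshellarg($dst) . ' > /dev/null 2>&1 &');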
Spawn a couple of worker processes which consume messages from a message queue such as beanstalkd. This way you can control the number of concurrent tasks (conversions), and you also don't have to pay the price of spawning a process per job, because the workers keep running in the background.
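A sketch of that with beanstalkd and the Pheanstalk client library (the tube name, payload and ffmpeg call are assumptions, and the exact API varies a little between Pheanstalk versions):
require 'vendor/autoload.php';   // composer require pda/pheanstalk (v3-style API assumed)
use Pheanstalk\Pheanstalk;

$queue = new Pheanstalk('127.0.0.1');

// producer (in the upload handler): enqueue the path of the saved video
$queue->useTube('video-convert')->put(json_encode(['path' => '/uploads/clip.avi']));

// worker (long-running CLI process; run one per desired concurrent conversion)
while (true) {
    $job  = $queue->watch('video-convert')->ignore('default')->reserve(); // blocks until a job arrives
    $data = json_decode($job->getData(), true);
    exec('ffmpeg -i ' . escapeshellarg($data['path']) . ' ' . escapeshellarg($data['path'] . '.flv'));
    $queue->delete($job);
}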
I think it would be even faster if you coded it in C and used Redis as your message queue. Redis has a very good C client library named Hiredis. I don't think this would be insanely difficult to accomplish.
Issue summary: I've managed to speed up the thumbing of images upon upload dramatically from what it was, at the cost of using concurrency. Now I need to secure that concurrency against a race condition. I was going to have the dependent script poll normal files for the status of the independent one, but then decided named pipes would be better: pipes, to avoid polling, and named, because I can't get a PID from the script that opens them (that's the one I need to use the pipes to talk with).
So when an image is uploaded, the client sends a POST via AJAX to a script which 1) saves the image 2) spawns a parallel script (the independent) to thumb the image and 3) returns JSON about the image to the client. The client then immediately requests the thumbed version, which we hopefully had enough time to prepare while the response was being sent. But if it's not ready, Apache mod_rewrites the path to point at a second script (the dependent), which waits for the thumbing to complete and then returns the image data.
I expected this to be fairly straightforward, but, while testing the independent script alone via terminal, I get this:
$ php -f thumb.php -- img=3g1pad.jpg
successSegmentation fault
The source is here: http://codepad.org/JP9wkuba I suspect that I get a segfault because the FIFO I made is still open and now orphaned. But I need it there for the dependent script to see, right? And isn't it supposed to be non-blocking? I suppose it is, because the rest of the script can run... but it can't finish? This would be a job for a normal file, as I had thought at the start, except that if both are open I don't want to be polling. I want to poll once at most and be done with it. Do I just need to poll and ignore the ugliness?
You need to delete the FIFO files you created and then finish all the scripts.
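For reference, the shape of that lifecycle in PHP (the path is made up; posix_mkfifo needs the POSIX extension, which most Unix builds have):
$fifo = '/tmp/thumb_status.fifo';

if (!file_exists($fifo) && !posix_mkfifo($fifo, 0644)) {
    die('could not create fifo');
}

// ... the independent script writes its status into $fifo, the dependent one reads it ...

unlink($fifo);   // always remove the FIFO before exiting so nothing is left orphaned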