How can I start a process on the server, like ftp_get(), without waiting for its result so the PHP script can continue?
I'm working on a synchronization script, and some of the files are too large to download through PHP because the transfer runs into the max execution time.
Is there any way to start the download and leave it running while the script moves on to process another file?
You need threading in PHP.
See http://php.net/manual/en/class.thread.php. If you don't have experience with threading, look up some tutorials and examples, and then some more. Once you think you understand them, research them some more.
And maybe a bit more...
Creating a multi-threaded application that is stable is a hard task.
Otherwise you could always increase the max execution time, or set up a cron job that downloads the FTP files in advance (say, 30 minutes earlier) using other Linux utilities.
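If full threading is overkill, one common workaround is to hand the transfer off to the operating system and return immediately. A rough, untested sketch (assuming a Linux host with wget installed; the URL and target path are placeholders):
<?php
// Hand the download to the OS so this script does not block on it.
$url  = 'ftp://user:pass@example.com/path/bigfile.zip';   // placeholder
$dest = '/tmp/bigfile.zip';                               // placeholder
$cmd = sprintf(
    'nohup wget -q -O %s %s > /dev/null 2>&1 &',
    escapeshellarg($dest),
    escapeshellarg($url)
);
exec($cmd);   // returns immediately; the transfer keeps running in the background
echo "Download started, the script can carry on with the next file.\n";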
Related
I would like a first-hand, experienced answer on this.
Which one is faster: a shell script or a PHP script? The script will be set up in cron.
Here is the brief idea of what I am trying to accomplish.
We get a lot of PGP encrypted files from clients. We download them to our local server, decrypt them and move them to a different location for further processing.
There are around 20-25 files a day to process, and the number is gradually increasing.
We have written both PHP script and Shell script to do this, for testing purposes.
But we are not sure which is going to be faster and more advantageous.
Has anyone tried? Any inputs?
Thanks much!
As indicated in the comments, you ought to just benchmark.
The overhead associated with the script will certainly be insignificant compared to the time spent in the decryption phase. (Cryptography is a notoriously computationally expensive process, especially dual-key crypto.)
Also, 20-25 requests, even 1000 requests, is nothing on a modern machine, unless we are talking about decrypting gigantic files (in which case, again, the crypto step will swamp any optimizations in the wrapper script). Asking this question and benchmarking are probably more time consuming than any overhead you will encounter.
(As an aside, I really hope that you are doing the decryption on a back-end machine not directly facing the public. Guard your key!)
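If you do want numbers anyway, a few lines of PHP will time both variants; the script paths below are placeholders, and you should run each several times and compare averages:
<?php
// Crude timing harness - the two commands are placeholders for your real scripts.
$candidates = array(
    'php'   => 'php /path/to/process.php',
    'shell' => 'sh /path/to/process.sh',
);
foreach ($candidates as $name => $cmd) {
    $start = microtime(true);
    shell_exec($cmd . ' 2>&1');   // run the script and wait for it to finish
    printf("%-5s took %.2f seconds\n", $name, microtime(true) - $start);
}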
Both use an interpreter to execute your tasks, and on most systems both interpreters are themselves native programs written in C, so the interpreter is unlikely to be the deciding factor.
I would use PHP, because it has more modules you can add on.
Say you do your PGP decryption and then want to update a MySQL DB, send an email, post to Facebook, or send a tweet out that your task is complete.
Edit - PHP doesn't require a web server. I'm referring to command-line execution of PHP and the shell script.
PHP Command line help - http://php.net/manual/en/features.commandline.php
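As a rough illustration of that flexibility, a CLI PHP script can shell out to gpg for the decryption and then use PHP's own extensions for the follow-up steps. This is an untested sketch; the paths, database details and table are all placeholders, and key/passphrase handling is left out:
<?php
// Decrypt one file with gpg, then record the outcome in MySQL via PDO.
$in  = '/incoming/client_file.pgp';      // placeholder
$out = '/decrypted/client_file.txt';     // placeholder
$cmd = sprintf(
    'gpg --batch --yes --output %s --decrypt %s 2>&1',
    escapeshellarg($out),
    escapeshellarg($in)
);
exec($cmd, $gpgOutput, $status);

$pdo  = new PDO('mysql:host=localhost;dbname=files', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO decrypt_log (filename, status) VALUES (?, ?)');
$stmt->execute(array($out, $status === 0 ? 'ok' : 'failed'));
// From here it is just as easy to send an email, post an update, and so on.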
I built an app in PHP where one feature analyzes about 10000 text files, extracts data from them and puts it into a MySQL database. The code itself is just a for loop in which every file is loaded through file_get_contents() and, at the end of that iteration, unset() from memory. The file analysis runs as a cron job, and a single PHP file does all this processing.
The app was initially built entirely on a shared server, where everything worked seamlessly: neither I nor the users noticed any delays or major lag. However, in order for it to handle more load, I moved everything to an EC2 server (a micro instance).
The problem I am having now is that every time the cron job runs (it processes the files hourly), it slows the entire server down so much that a normal page takes about 5-8 seconds to load, which sort of defeats the purpose of moving to EC2.
The cron job itself is a very long process. Here are some test results from an hourly run:
SQL Insertion Time: 23.138303995132 seconds
Memory Used: 10.05 MB
Execution: 411.00507092476 seconds
But at the top of every hour the server slows down badly for about 7 minutes, despite having more dedicated hardware than a shared server (or so I thought). The graphs on the EC2 dashboard show CPU usage close to 100%, but I don't understand how it gets to that level.
Can anyone help me determine why this could be happening? I never noticed even the slightest lag when the cron ran on the shared server, but the situation is completely different on EC2.
Please feel free to ask me anything I missed mentioning.
Micro instances are pretty slow: they only provide short CPU bursts and get heavily throttled under sustained load, which is exactly what an hourly batch job produces. If you use a larger instance, it'll run a lot faster.
We use EC2 for all of our production boxes. I can't say enough good things about that platform. I'll never go back to another host.
Also, if you want to write your code in C++, it'll run a lot faster. I wrote a simple MySQL insert with this code here. It's multi-threaded, so you can asynchronously run MySQL updates or inserts.
Please let me know if you need any help with it, but I'm sure you'll be able to just use a micro instance still and get great speeds.
Hope that helps...
PS. I'd be willing to help you write a C++ version for your uses... just because it's fun! :-)
Well EC2 is designed to be scalable.
Since your code runs in one loop, opening each file one after another, it does not make for a scalable design.
Try breaking your code up so that the files are handled concurrently by different instances of the PHP script; that way, each copy of the script runs as its own process. If you have multiple servers (or EC2 instances), you can run them on different machines to speed it up even more.
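A rough sketch of that idea - the parent script splits the file list into chunks and launches one background worker per chunk. Note that worker.php and the chunk-file convention are hypothetical, and the numbers are untuned:
<?php
// Split the work and launch one background PHP worker per chunk of files.
$files   = glob('/data/texts/*.txt');                       // placeholder path
$workers = 4;                                               // tune to the instance's CPU
$size    = (int) max(1, ceil(count($files) / $workers));
foreach (array_chunk($files, $size) as $i => $chunk) {
    $listFile = "/tmp/chunk_$i.txt";
    file_put_contents($listFile, implode("\n", $chunk));
    // nohup + & lets each worker run independently of this parent script
    exec(sprintf('nohup php worker.php %s > /dev/null 2>&1 &', escapeshellarg($listFile)));
}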
I am developing a website that requires a lot of background processes for the site to run: for example, a queue, a video encoder and a few other types of background processes. Currently I have these running as a PHP CLI script that contains:
while (true) {
// some code
sleep($someAmountOfSeconds);
}
OK, these work fine and everything, but I was thinking of setting them up as a daemon, which would give them an actual process ID I can monitor, and let me run them in the background without keeping a terminal open all the time.
Is there a better way of handling these? I also thought about cron jobs, but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon that you can make calls to and ask questions of would seem the sensible option. It depends on whether your host permits such things, but especially if you need work done every few seconds, an OS-based service/daemon is far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem: I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed, so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
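For comparison, this is roughly what the bare minimum of hand-rolled daemonisation looks like in PHP - an untested sketch that assumes the pcntl and posix extensions are enabled, and that leaves out the signal handling, logging and restarting that a tool like The Fat Controller handles for you:
<?php
// Minimal daemonisation sketch (CLI only); real daemons need much more care.
$pid = pcntl_fork();
if ($pid === -1) {
    die("Could not fork\n");
} elseif ($pid > 0) {
    exit(0);                       // parent exits, the child keeps running
}
posix_setsid();                    // detach the child from the controlling terminal
file_put_contents('/tmp/mydaemon.pid', posix_getpid());   // placeholder PID file
while (true) {
    // ... the actual background work goes here ...
    sleep(5);
}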
I've done this for 5 years, using PHP to run background tasks, and it's no different from doing it in any other language. Just use cron and lock files; the lock file prevents multiple instances of your script from running.
It's also important to monitor your code. One check I always do to stop stale lock files from blocking the script is a second cron job that checks whether the lock file is older than a few minutes and whether an instance of the PHP script is actually running; if not, it removes the lock file.
This technique lets you set cron to run the script every minute without issues.
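A minimal version of that pattern. Here I use flock() rather than comparing file ages, because the OS drops the lock automatically if the script dies, which avoids the stale-lock problem; the lock path is a placeholder:
<?php
// Run from cron every minute; exits straight away if another instance holds the lock.
$fp = fopen('/tmp/myscript.lock', 'c');
if (!flock($fp, LOCK_EX | LOCK_NB)) {
    exit(0);                       // another instance is already running
}

// ... do the actual work here ...

flock($fp, LOCK_UN);               // also released automatically if the script crashes
fclose($fp);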
Use the System_Daemon module from PEAR.
One solution (which I really need to try myself, as I may need it) is to use cron, but have the process loop for five minutes or so, and have cron kick off a new one every five minutes. By the time cron starts the next one, the previous one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!
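A sketch of that approach - cron starts the script every five minutes, each run works for a little under five minutes and then exits, and a PID file in the temp directory gives you the simple inter-process bookkeeping mentioned above (the path and timings are placeholders):
<?php
// Started by cron every 5 minutes; loops for ~4.5 minutes, then exits and frees its memory.
file_put_contents('/tmp/worker.pid', getmypid());
$deadline = time() + 270;                  // stop a little before the next cron run begins
while (time() < $deadline) {
    // ... check the queue / encode the next video ...
    sleep(5);
}
unlink('/tmp/worker.pid');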
I'm making a transcoding server that uses FFmpeg to convert videos to FLV. After a user uploads a video, it's queued for processing in Amazon Simple Queue Service. The system is Linux (Ubuntu).
Instead of running cron every minute, I wonder if it would be possible to continuously run several PHP scripts (download queued files, process downloaded files, etc.). Each of them would have its own queue, which would be read every 10 seconds or so looking for new tasks.
My question is:
How do I detect whether a script is already running? I'd run cron every minute, and if one of the programs wasn't running I'd start it again. How is that sort of thing done on Linux? PID files?
thanks for help,
ian
Instead of doing this with pure PHP only, I would probably go with a solution based on Gearman (quoting Wikipedia):
Gearman is an open source application framework [...]. Gearman is designed to distribute appropriate computer tasks to multiple computers, so large tasks can be done more quickly.
It works well with PHP, thanks to the gearman extension, and will deal with most of the queuing stuff for you.
Note that it'll also facilitate things when you have more videos to transcode, making scaling to several servers easier.
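To give an idea of how little code that takes, here is a rough worker/client pair using the gearman extension. The function name 'transcode' and the payload are just examples, and the server address assumes gearmand runs locally:
<?php
// worker.php - run one or more of these; each blocks waiting for jobs from gearmand.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('transcode', function (GearmanJob $job) {
    $file = $job->workload();              // e.g. the path of the uploaded video
    // ... run ffmpeg on $file here ...
});
while ($worker->work());

<?php
// client.php - called right after the upload; returns immediately, a worker does the encoding.
$client = new GearmanClient();
$client->addServer('127.0.0.1');
$client->doBackground('transcode', '/uploads/video123.avi');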
Yes, you can use PID files.
Or a temporary table, or memcache, etc.
But I do it like this:
Cron runs a script that executes the video conversion, and it checks whether the previous process has terminated or not.
This cron script gets the movie that needs converting from a database or a file.
PHP's PEAR repository has a System_Daemon class for creating daemons out of PHP. I've used it for a couple systems with good results.
I've created a similar script specifically for this issue.
Check: https://github.com/SirNarsh/EasyCron
The idea is to save the script's PID to a file, then check whether the process is running by testing for the existence of /proc/PID.
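The core of that check only takes a few lines (the PID-file path is a placeholder, and /proc/PID only exists on Linux):
<?php
// Exit if the previous run is still alive; otherwise record our own PID and carry on.
$pidFile = '/tmp/transcoder.pid';
if (file_exists($pidFile)) {
    $oldPid = (int) file_get_contents($pidFile);
    if ($oldPid > 0 && file_exists("/proc/$oldPid")) {
        exit(0);                           // previous instance is still running
    }
}
file_put_contents($pidFile, getmypid());
// ... download / process the queue here ...
unlink($pidFile);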
I recently installed my video script on a new server, but I'm seeing that it will start to convert one video (via mencoder) and then, before finishing it, try to convert another, and another, so it ends up trying to convert 4+ videos at the same time, causing the server to shut down. The script developer said:
"It converts each video in a PHP background process. There might be a way to limit the number of PHP background processes on your server and queue them."
So how is this done please?
Regards
Use PHP Semaphores
You can use a shared counting semaphore in PHP to implement a queue with a cap on the number of parallel executions. Semaphores are a standard, well-understood mechanism for this kind of concurrency control.
With this you can easily configure, control and limit the number of parallel mencoder executions.
The pattern, sketched as runnable PHP using the sysvsem extension (the limit of two parallel encoders and the ftok key are just examples):
<?php
$max = 2;                                    // MAX: allowed parallel mencoder processes
$sem = sem_get(ftok(__FILE__, 'm'), $max);   // init sem = MAX
sem_acquire($sem);                           // wait(sem): sem--, blocks while sem = 0,
                                             // until another process leaves the critical section
/*
  Critical section:
  run mencoder here
*/
sem_release($sem);                           // signal(sem): sem++
Use some sort of lock. For example, use file locking on a directory so only one process at a time can use the resource.
This would require a wrapper script for the encoder which will wait for the lock to be released by the currently running encoder.
It should also be smart enough to detect when a lock wasn't freed if the encoder crashes, and release the lock.
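A sketch of such a wrapper using flock() in blocking mode: each invocation waits its turn, and because the OS releases the lock when the process exits, a crashed encoder cannot leave a stale lock behind (the lock path is a placeholder):
<?php
// Wrapper around the encoder: only one copy encodes at a time, the others queue up.
$fp = fopen('/tmp/mencoder.lock', 'c');
flock($fp, LOCK_EX);                       // blocks until the current encoder finishes

// ... run the usual mencoder command here (e.g. via shell_exec) ...

flock($fp, LOCK_UN);                       // also released automatically if this process dies
fclose($fp);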
Edit:
My understanding of the problem was that there were multiple invocations of the script, each calling mencoder. However, from the other response it seems it might be one invocation running all the processes in the background, in which case I think the semaphore solution is better.
Edit:
Looks like someone asked this question before:
best-way-to-obtain-a-lock-in-php