(Our server is Linux-based.)
I'm an experienced PHP developer, but this is the first time I'm developing a bot that runs continuously and fetches data.
I'll explain my application with a simple sample scenario. I have about 2000 website URLs, and my application will visit these URLs and record the contents of the pages. It will run 24 hours a day, 7 days a week, and start over as soon as it has finished all 2000 websites.
I need some suggestions for my server. As you can see, my application will run indefinitely until I shut down the server. I can write this infinite loop like this:
while (true)
{
    // application code here
}
But I think this would be bad for the server :) Is it possible to do something like this on the server side?
I also thought about using cron jobs, but they don't fit my scenario, because my script has to start again as soon as it finishes. I need "start again when you finish your work", not "start every 30 minutes", because I don't know whether fetching all 2000 websites will take more or less than 30 minutes.
I hope I've explained it clearly.
I'm also worried about memory usage. As far as I know, the garbage collector cleans up memory whenever a PHP script stops. But as I said, my app won't stop for days (maybe weeks), so the garbage collector won't be triggered. I'm manually unsetting (with unset()) all used variables at the end of the script. Is that enough?
I need some suggestions from server administrators :)
PS: I'm developing it as a console application, not a web application. I can execute it from the command line.
Use batch processing: store all the sites in a CSV file or similar, mark each one after completion, work through all the unmarked ones, then through all the marked ones, and so on. Process only, say, 1 or 5 at a time, and start the batch script every minute from cron.
Don't even try to work on all of them at once: if anything goes wrong, you won't know what happened.
You could even store the jobs in a database along with processing stats, which allows for fine-tuning and better reporting.
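A minimal sketch of the batching idea, assuming the job list is a plain PHP array that maps each URL to the time it was last processed (a CSV file or database table would play the same role). It uses a timestamp instead of a simple marked/unmarked flag, which is a small variation on the scheme above; the function names nextBatch and markDone are made up for illustration.

```php
<?php
// Hypothetical job list: url => Unix timestamp of the last completed fetch
// (0 = never fetched). In practice this would live in a CSV file or a table.

/**
 * Return up to $size URLs, least recently processed first.
 */
function nextBatch(array $jobs, int $size): array
{
    asort($jobs); // oldest timestamps first, keys (URLs) preserved
    return array_slice(array_keys($jobs), 0, $size);
}

/**
 * Record that $url was processed at time $now and return the updated list.
 */
function markDone(array $jobs, string $url, int $now): array
{
    $jobs[$url] = $now;
    return $jobs;
}
```

Each cron run would call nextBatch(), fetch those few URLs, and call markDone() for every URL that succeeded, so a failure only affects the current small batch.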
You may hit time limits trying to run infinite PHP scripts (less so from the command line, where max_execution_time defaults to 0, meaning no limit), and your server admin will hate you. You will probably also run into memory limits if you don't release resources properly, which is far too easily done in PHP.
Read: http://www.ibm.com/developerworks/opensource/library/os-php-batch/
Your script could just run through the list once and quit. That way, whatever resources PHP is holding are freed.
Then have a shell script that calls the PHP script in an infinite loop.
As PHP is not designed for long-running tasks, I am not sure the garbage collection is up to the job. Quitting after every run will force it to release everything.
I've previously used Gearman along with supervisor to manage jobs.
In this case we are using Amazon SQS which I have spent some time trying to get my head around.
I have set up a separate micro instance from our main webserver to use as an Image processing server (purely for testing at the moment, it will be upgraded and become part of a cluster before this implementation goes live)
On this micro instance I have installed PHP and ImageMagick in order to perform the image processing.
I have also written a worker script which receives the messages from Amazon SQS.
All works perfectly, however I need this script to run over and over again in order to continuously check for messages.
I don't like the thought of running a continuous loop so have started to look at other methods with little success.
So my question is what is generally considered the best practice way to do this?
I am worried about memory since PHP wasn't really designed for this, therefore it feels like running the script for a while, then stopping and restarting it might be my best bet.
I have experience using supervisor (to ensure that gearman workers kept running) and am wondering if I could simply use that to continuously execute the simple php script over and over?
My thoughts are as follows:
Set up SQS long polling so that the script checks for 20 seconds.
Use a while loop with a 20 second sleep to keep this script running for say an hour at a time
Have all this run through supervisor. When the hour is up and the loop is complete, allow the script to exit.
Supervisor should then automatically restart it
Does this sound viable? Is there a better way? What is generally considered the best practice for receiving SQS messages in PHP?
Thanks in advance
In supervisord you can set autorestart to true to have it run your command over and over again. See: http://supervisord.org/configuration.html#program-x-section-settings
Overall, using an endless while loop is perfectly fine: PHP will free your objects correctly and keep memory in check if the code is written correctly. It can run for years without leaks (if there is a leak, you probably created it yourself, so review your code).
The question "How do I stop a Supervisord process without killing the program it's controlling?" might be of interest to you; the OP had a similar setup with autorestart and wanted to add graceful shutdowns to it.
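The plan from the question (poll, run for about an hour, exit, let supervisor restart the script) could be sketched as below. This is only an illustration: runWorker, $receive and $handle are hypothetical names, and $receive stands in for whatever call fetches messages (for example an SQS long poll), since no particular SDK is assumed here.

```php
<?php
/**
 * Run $handle on every message returned by $receive until $maxSeconds have
 * elapsed, then return so the process can exit and be restarted.
 *
 * $receive is a hypothetical callable that blocks for a while (for example
 * a long poll) and returns an array of messages, possibly empty.
 */
function runWorker(callable $receive, callable $handle, int $maxSeconds): int
{
    $deadline = time() + $maxSeconds;
    $handled = 0;

    while (time() < $deadline) {
        foreach ($receive() as $message) {
            $handle($message);
            $handled++;
        }
    }

    return $handled; // the caller exits here; the supervisor starts a fresh process
}
```

If the receive call already blocks for up to 20 seconds, an extra sleep inside the loop is probably unnecessary. With autorestart enabled, the script can simply end after runWorker() returns.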
Hoping you can help! I am currently building and testing a PHP script that ports data from one web system to another (think data backup) that needs to run daily for an indefinite number of users. The script is fairly intensive, depending on the amount of data that needs to be pulled (the longest execution time I have seen thus far has been about 30 minutes).
Given that, I obviously don't want to run them one after the other, as the whole job won't complete in a timely fashion. Ideally, I would like some way to schedule the job so that it can run up to ten backups simultaneously (a number I can raise as server capacity increases). When one script completes, it picks up the next job from the top of the pile (a single pile rather than 10) and executes it, and so on. Now, it is possible (and at this stage probable) that some of the instances are going to fail with a fatal error and die. That is fine, as I am handling that with a custom error handler, but obviously I don't want the failure of one instance to have any bearing on the others.
Having read some of the other questions on here, I have seen PHP forking and Supervisord discussed, but to be honest, casting my mind back 7 years to my process scheduling paper has defeated me! It would be really great to get some advice on how to implement something like this, if it is at all possible. Thanks :)
I'd recommend using proc_open to execute multiple commands asynchronously. If the backup process is itself a PHP script, it can be run using the php binary (e.g. php mybackupscript.php)
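A rough sketch of that approach, assuming the jobs are shell command strings and a Unix-like host. It keeps at most $limit processes running, takes the next job from a single shared pile whenever a slot frees up, and records each exit code. The function name runPool is made up for illustration.

```php
<?php
/**
 * Run shell commands with at most $limit running at once.
 * Returns the exit code of each command, keyed like $commands.
 */
function runPool(array $commands, int $limit): array
{
    $pending = $commands;
    $running = [];
    $exitCodes = [];

    while ($pending || $running) {
        // Top up the pool from the single shared pile.
        while ($pending && count($running) < $limit) {
            $key = array_key_first($pending);
            $command = $pending[$key];
            unset($pending[$key]);
            // Empty descriptor spec: children inherit this script's stdio.
            $process = proc_open($command, [], $pipes);
            if ($process === false) {
                $exitCodes[$key] = -1; // could not be started
                continue;
            }
            $running[$key] = $process;
        }

        // Reap finished children; a failure in one does not affect the others.
        foreach ($running as $key => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                $exitCodes[$key] = $status['exitcode'];
                proc_close($process);
                unset($running[$key]);
            }
        }

        usleep(100000); // poll ten times a second
    }

    return $exitCodes;
}
```

For example, runPool(['user1' => 'php mybackupscript.php 1', 'user2' => 'php mybackupscript.php 2'], 10) would run both backups side by side; the argument passed to the script is only an assumed way of selecting the user.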
I have created a php/mysql scraper, which is running fine, and have no idea how to most-efficiently run it as a cron job.
There are 300 sites, each with between 20 - 200 pages being scraped. It takes between 4 - 7 hours to scrape all the sites (depending on network latency and other factors). The scraper needs to do a complete run once daily.
Should I run this as 1 cron job which runs for the entire 4 - 7 hours, or run it every hour 7 times, or run it every 10 minutes until complete?
The script is set up to run from the cron like this:
$starttime = time();
while ($starttime + 600 > time()) {
    do_scrape();
}
This runs the do_scrape() function, which scrapes 10 URLs at a time, until (in this case) 600 seconds have passed. A single do_scrape() call can take between 5 and 60 seconds.
I am asking here because I can't find any information on the web about how to run this, and I am wary of running it daily, as PHP isn't really designed to run a single script for 7 hours.
I wrote it in vanilla PHP/MySQL, and it is running on a cut-down Debian VPS with only lighttpd/MySQL/PHP 5 installed. I have run it with a timeout of 6000 seconds (100 minutes) without any issue (the server didn't fall over).
Any advice on how to go about this task is appreciated. What should I be watching out for? Or am I going about executing this all wrong?
Thanks!
There's nothing wrong with running a well-written PHP script for long periods. I have some scripts that have literally been running continuously for months. Just watch your memory usage, and you should be fine.
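One way to watch memory from inside the script is sketched below. The function name overMemoryBudget and the 128 MB figure in the usage comment are assumptions for illustration; gc_collect_cycles() and memory_get_usage() are standard PHP functions.

```php
<?php
/**
 * Decide whether a long-running loop should stop because memory use has
 * grown past $limitBytes. Reference cycles are collected first so that only
 * memory that is still genuinely in use counts.
 */
function overMemoryBudget(int $limitBytes): bool
{
    gc_collect_cycles();
    return memory_get_usage(true) > $limitBytes;
}

// Usage inside a long-running loop (doWork() is a placeholder):
// while (true) {
//     doWork();
//     if (overMemoryBudget(128 * 1024 * 1024)) {
//         exit(0); // let cron or a supervisor start a fresh process
//     }
// }
```

Logging memory_get_usage() once per iteration is often enough to spot a slow leak before it becomes a problem.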
That said, your architecture is pretty basic, and is unlikely to scale very well.
You might consider moving from a big monolithic script to a divide-and-conquer strategy. For instance, it sounds like your script is making synchronous requests for every URL it scrapes. If that's true, then most of that 7-hour run time is spent idly waiting for a response from some remote server.
In an ideal world, you wouldn't write this kind of thing in PHP. A language that handles threads and can easily do asynchronous HTTP requests with callbacks would be much better suited.
That said, if I were doing this in PHP, I'd aim for a script that kicks off N children that grab data from URLs and put the response data in some kind of work queue, and then another script that runs pretty much all the time, processing any work it finds in the queue.
Then you just cron your fetcher-script-manager to run once an hour. It manages some worker processes that fetch the data (in parallel, so latency doesn't kill you) and put the work on the queue. Then the queue-cruncher sees the work on the queue and crunches it.
Depending on how you implement the queue, this could scale pretty well. You could have multiple boxes fetching remote data, and sticking it on some central queue box (with a queue implemented in mysql, or memcache, or whatever). You could even conceivably have multiple boxes taking work from the queue and doing the work.
Of course, the devil is in the details, but this design is generally more scalable and usually more robust than a single-threaded fetch-process-repeat script.
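The answer describes child processes; as a lighter-weight illustration of the same "fetch in parallel so latency doesn't dominate" idea, the sketch below swaps in PHP's curl_multi functions, which run several transfers concurrently inside one process. It assumes the curl extension is installed, and the function name fetchAll is made up for illustration.

```php
<?php
/**
 * Fetch several URLs concurrently with the curl extension.
 * Returns url => body, with null for transfers that failed.
 */
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$url] = $handle;
    }

    // Drive all transfers until none is still active.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi, 1.0); // wait for network activity
        }
    } while ($active && $status === CURLM_OK);

    // Per-transfer results: 'result' is CURLE_OK on success.
    $failed = [];
    while ($info = curl_multi_info_read($multi)) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $bodies = [];
    foreach ($handles as $url => $handle) {
        $bodies[$url] = in_array($handle, $failed, true)
            ? null
            : curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }

    return $bodies; // handles are released when they go out of scope
}
```

A fetcher could call fetchAll() on ten URLs at a time and push the returned bodies onto the queue for the separate processing script.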
You shouldn't have a problem running it once a day to completion. That's the way I would do it. Timeouts are a big issue if PHP is being served through a web server, but since you are running the script directly with the php executable, this is OK. I would advise you to use Python or something else that is more task-friendly, though.
I am developing a website that requires a lot of background processes to run: for example, a queue, a video encoder and a few other types of background processes. Currently I have these running as PHP CLI scripts that contain:
while (true) {
    // some code
    sleep($someAmountOfSeconds);
}
OK, these work fine, but I was thinking of setting them up as daemons, which would give each an actual process ID that I can monitor. I could also run them in the background and not have a terminal open all the time.
I would like to know if there is a better way of handling these. I was also thinking about cron jobs, but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon which you can make calls to and ask questions would seem the sensible option. It depends on whether your host permits such things. If you require it to do work every few seconds, then an OS-based service/daemon would definitely seem far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem, I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed and so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
I've used PHP to run background tasks for 5 years, and it's no different from doing it in any other language. Just use cron and lock files. The lock file prevents multiple instances of your script from running.
It's also important to monitor your code. One check I always add, to stop stale lock files from blocking the script, is a second cron job that checks whether the lock file is older than a few minutes and whether an instance of the PHP script is running; if none is running, it removes the lock file.
This technique allows you to set cron to run the script every minute without issues.
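A minimal sketch of the lock-file guard, with one deliberate change from the description above: instead of testing whether the file exists, it takes an advisory lock with flock(). The operating system drops that lock when the process ends, even after a crash, so a stale lock should not need a separate cleanup job. The function name acquireLock and the path in the usage note are made up for illustration.

```php
<?php
/**
 * Try to take an exclusive, non-blocking lock on $path.
 * Returns the open handle on success (keep it open until the script ends),
 * or null if another instance already holds the lock.
 */
function acquireLock(string $path)
{
    $handle = fopen($path, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}
```

A cron-started script would begin with something like $lock = acquireLock('/tmp/myjob.lock'); and exit immediately when the result is null.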
Use the System::Daemon module from PEAR.
One solution (that I really need to try myself, as I may need it) is to use cron, but get the process to loop for five mins or so. Then, get cron to kick it off every five minutes. As one dies, the next one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!
I am looking for the PHP equivalent for VB doevents.
I have written a realtime analysis package in VB and used doevents to release to the operating system.
Doevents allows me to stay in memory and run continuously without filling up memory and allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for. It delays program execution for the given number of microseconds.
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB and as I recall, doevents() function allowed the application to yield to the processor during intensive processing (usually to allow other system events to fire - the most common being WM_PAINT so that your UI won't appear hung).
I don't think PHP has such functionality: your script will run as a single process and end, either when it's done or when it hits the execution time limit (30 seconds by default under a web server; no limit by default on the command line).
If you are thinking in terms of threads (as most Windows programmers tend to do) and need to spawn more than one instance of your script, perhaps you should take a look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call to display something, which acts more like Application.DoEvents() in VB with regards to keeping your UI active, but must be called each time something is output.
Naturally if your script uses the output buffer already, you could build a copy of the buffer before flushing, with ob_get_contents().
If you need to allow the script to run for more time than usual, you can set a longer timeout with set_time_limit($time). If you need more memory and you have access to edit your .htaccess file, add the following line and edit the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
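For running multiple scripts at once, you can use pcntl_fork to create a child process and pcntl_exec in the child to start another script. On its own, pcntl_exec replaces the current process rather than starting a second one.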
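A minimal sketch of starting a second program alongside the current script with the pcntl extension, which is only available for CLI PHP on Unix-like systems. Because pcntl_exec replaces the calling process, the sketch forks first and calls pcntl_exec in the child. The function name spawn is made up for illustration.

```php
<?php
/**
 * Start $program as a separate process and return its PID.
 * Requires the pcntl extension.
 */
function spawn(string $program, array $args = []): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // Child: replace this process image with the target program.
        pcntl_exec($program, $args);
        exit(127); // only reached if exec itself failed
    }
    return $pid; // parent: carry on, and start more children if needed
}
```

The parent can later call pcntl_waitpid($pid, $status) to find out when a child has finished and how it exited.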
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for on-demand, request-driven processing. However, it can be forced to run as a background task with a little hackery.
As PHP runs as a single thread, you do not have to worry about letting the CPU do other things, because the operating system's scheduler already takes care of that. If this were not the case, a web server would only be able to serve one page at a time and all other requests would have to sit in a queue. You will need to write some sort of loop that never ends until some detectable condition occurs (like a "now please exit" message you set in the DB or something).
As pointed out by others you will need to set_time_limit($something); with perhaps usleep stopping the code from running "too fast" if it eats very much CPU each loop. However if you are also using a Database connection most of your script time is actually the script waiting for the Database (by far the biggest overhead for a script).
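The loop described above might look roughly like this. It is a sketch under assumptions: runUntilStopped, $tick and $shouldStop are hypothetical names, and $shouldStop stands in for whatever exit check is used, such as reading a "now please exit" flag from the database.

```php
<?php
/**
 * Call $tick repeatedly until $shouldStop returns true, pausing between
 * iterations so the loop does not spin the CPU. Returns the iteration count.
 */
function runUntilStopped(callable $tick, callable $shouldStop, int $pauseMicroseconds = 200000): int
{
    set_time_limit(0); // no execution time limit (already the CLI default)
    $iterations = 0;

    while (!$shouldStop()) {
        $tick();
        $iterations++;
        usleep($pauseMicroseconds); // yield the CPU between iterations
    }

    return $iterations;
}
```

The pause length is a trade-off: a shorter pause reacts faster to new work or to the stop flag, while a longer one uses less CPU when there is nothing to do.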
I have seen PHP workers created by starting the script inside screen and detaching it so that it keeps running in the background. Other approaches also work, as long as you do not have a session that will time out or exit (say, when the web browser is closed). A cron job that checks every x minutes or hours whether the script is running, and starts it if not, gives you automatic recovery from forced exits and/or system restarts.
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.