I have a few questions about PHP memory usage. I'm going to run some tests on my own, but getting various advice is quite helpful.
I recently learned about the PHP function ignore_user_abort(), which allows a script to continue running even if a user closes the page. I was thinking about using this for my E-mail Newsletter tool instead of Cron jobs, as configuring Cron jobs has various pitfuls. The alternative approach of making a user stay on the page, using AJAX requests, and running part of the script after the page content has been delivered all have issues as well.
My solution would be to run call ignore_user_abort(true) at the beginning of the script, and at the end after the content has been generated, call flush() for good measure, and then run the newsletter script. Alternatively, do this with an AJAX.
First of all, does anyone see issues with that approach?
Second of all, if I used the script with no time limit set, and a while loop going through each email, what would the memory usage be like if I did it in one go? Since I'd be overwriting variables, not using new ones, I'd think it would be low.
Third, because if I am sending a large volume of emails, say 1000 per run, I don't want to overload my mail server. With my cron job, I run the script every 5 minutes, sending a batch of 50 emails out. If I was doing this in a single pass, could I send out 50 emails, call sleep for say 5 minutes, and then continue for another 50 emails? If so, what is the script memory usage like during the sleep period? Would this be an efficient method?
What I'm really trying to do here is come up with a way to create a newsletter tool that doesn't require the complex (for non-technical folks) task of setting up a Cron job (Which isn't even an option on shared hosts), and doesn't require the user to keep their browser open on a single page.
Any ideas suggestions or feedback is welcome. Thanks!
At a former job we wrote a daemon for a critical function in PHP, not exactly what you describe but similar enough -- certainly with loops and sleeps. We were very doubtful about its long-term stability -- specially in memory management--, so we subjected it to pretty tough stress testing. Results were excellent, and the code was put to production and running flawlessly for months if not years.
Caveats:
IIRC, PHP has a counter-based garbage
collector. This means that, unlike in
Java, two objects referencing each
other will stay in memory even if not
accessible by your program. You need
to be careful about this when you
'abandon' your objects.
Web servers
often have mechanisms to kill
long-running scripts. This may defeat
your purpose here -- specially if the
server's configuration can't be
tuned.
Related
I have some limitations with my host and my scripts can't run longer than 2 or 3 seconds. But the time it will take to finish will certainly increase as the database gets larger.
So I thought about making the script stop what it is doing and call itself after 2 seconds, for example.
Firstly I tried using cURL and then I made some attempts with wget. But there is always a problem with waiting for the response and timeouts (with cURL, for example, I just need to ping the script, not wait for a response) or permissions with the server (functions that we use to run wget such as exec seems to be disabled on my server, or something like that).
What do you think is the best idea to make a PHP script ping/call itself?
On Unix/LInux systems I would personally recommend to schedule CRON JOBS to keep running the scripts at certain intervals
May be this SO Link will help you
Php scripts generally don't call other php scripts. It is possible to spawn a background process as illustrated here, but I don't think that's what you're after. If, so you'd be better off using cron as was discussed above.
Calling a function every X amount of seconds with the same script is certainly possible, but this does the opposite of what you want since it would only extend the run time of the script in question.
What you seem to be asking is, contrary to your comment, somewhat paradoxical. A process that calls method() every so often is still a long running process and is subject to the same restrictions as any other process on the server, regardless of the fact that it may be sitting idle for short intervals.
As far as I can see your options are:
Extend the php max_execution_time directive, or have your sysadmin do so if they are willing
Revise your script so that it completes within the time limit
Move to a new server
I have about 35 cron jobs right now. Most of them are PHP scripts that either scrape or do some calculations. The scripts also loop over 10-20 different servers to do those scrapes. (They are different countries so they have to be separate calls).
So we have 30 scripts, each has a loop over 20 servers and therefore take about 5-15 minutes to run per script. I have each script spaced out right now.
But is it better to have 80 individual scripts run instead of 35 scripts that loop and take a while? Each script would take maybe 1-2 minutes instead of 10-15min.
That would of course spawn a ton more PHP processes. Is there any issue or limit with 10-15 or more PHP processes running at once?
I'm running a cloud server performance on Rackspace.
Personally if the jobs need to complete in a certain order I would make it as linear as possible.....it might take longer but I always err . The side of data accuracy.
It depends.
If you are creating more processes that will be running at the same time you are going to increase your overall memory footprint. Each process will carry it's own overhead of memory for the process to run, and to load any libraries needed for it's process. (aside from whatever it needs to do whatever it does). You will also more than have twice as many script to monitor that they are successfully running all the time.
However in creating more processes you will be able to speed things us since you are essentially creating a multi-thread. Allowing one process to continue while another is blocking waiting for i/o.
If each script doesn't have a dependency on another, breaking them into smaller scripts should be fine. If you can handle monitoring more scripts, and the server can handle it, then I would do it.
If scripts do have dependencies, or if you would have to run so many at the same time you server usage maxes out, keep them together.
That being said, I would also try to optimize the script, make sure there isn't something you can do to make them faster without create more processes.
Depending on how you have the servers setup, I would run them at once. In addition, I would also run them at night, off hours when the web servers aren't in use and not during business operations unless your web app depends on it. If you're on a Cloud server on Rackspace I wouldn't worry about bandwidth although increasing your ram could be an issue further down the road.
Spawning a ton more PHP process shouldn't be a worry if you have sufficient amount of ram; there is no limitation on the linux side.
a) Figure out which cron needs to run in which order
b) Order the cron to be run at night, around mid-night
c) Run and fireoff the 80 scripts at once
it would also be a good idea to send you an email with cron results or report that it all went through successfully, based on the batch but not individual cron.
I'm looking for some ideas to do the following. I need a PHP script to perform certain action for quite a long time. This is an extension for a CMS and this can't be anything else but PHP. It also can't be a command line script because it should be used by common people that will have only the standard means of the CMS. One of the options is having a cron job (most simple hostings have it) that will trigger the script often so that instead of working for a long time it could perform the action step by step preserving its state from one launch to the next one. This is not perfect but I can't see of any other solutions. If the script will be redirecting to itself server will interrupt it. What other options can suit?
Thanks everyone in advance!
What you're talking about is a daemon or long running program that waits for calls by client programs, performs and action, provides a response then keeps on waiting for more calls.
You might be familiar w/ these in the form of Apache & MySQL ;) Anyway PHP is generally OK in this regard, it does have the ability to function over raw sockets as well as fork sub-processes to handle multiple requests simultaneously.
Having said that PHP daemons are a tool where YMMV. Some folks will say they work great, other folks like me will say they have issues w/ interprocess communication and leaking memory even amidst plethora unset() calls.
Anyway you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a Cron based solution.
Here's a link about writing a PHP daemon.
Also, one more note. Daemons do crash from time to time and therefore you may still need to store state about whats going on, just in case someone trips over the power cord to your shared server :)
I would also suggest that you think about making it a daemon but if not then you can simply use
set_time_limit(0);
ignore_user_abort(true);
at the top to tell it not to time out and not to get interrupted by anything. Then call it from the cron to start it every day or whatever. I have this on many long processing daily tasks and it works great for me. However, it won't be able to easily talk to the outside world (other scripts can't query it or anything -- if that is what you want look into php services) so once you get it running make sure it will stop and have it print its progress to a logfile.
I am looking for the PHP equivalent for VB doevents.
I have written a realtime analysis package in VB and used doevents to release to the operating system.
Doevents allows me to stay in memory and run continuously without filling up memory and allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for.. Delays program execution for the given number of micro seconds
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB and as I recall, doevents() function allowed the application to yield to the processor during intensive processing (usually to allow other system events to fire - the most common being WM_PAINT so that your UI won't appear hung).
I don't think PHP has such functionality - your script will run as a single process and end (either when it's done or when it hits the default 30 second timeout).
If you are thinking in terms of threads (as most Windows programmers tend to do) and needing to spawn more than 1 instance of your script, perhaps you should take look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call to display something, which acts more like Application.DoEvents() in VB with regards to keeping your UI active, but must be called each time something is output.
Naturally if your script uses the output buffer already, you could build a copy of the buffer before flushing, with ob_get_contents().
If you need to allow the script to run for more time than usual, you can set a longer tiemout with set_time_limit($time). If you need more memory, and you have access to edit your .htaccess file, place the following code and edit the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
For running multiple scripts at once, you can use pcntl_exec to start another one running.
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for asynchronous on demand processing. However it can be forced to become a background task with a little hackery.
As PHP is running as a single thread you do not have to worry about letting the CPU do other things as that is already taken care of. If this was not the case then a web server would only be able to serve up one page at a time and all other requests would have to sit in a queue. You will need to write some sort of look that never expires until some detectable condition happens (like the "now please exit" message you set in the DB or something).
As pointed out by others you will need to set_time_limit($something); with perhaps usleep stopping the code from running "too fast" if it eats very much CPU each loop. However if you are also using a Database connection most of your script time is actually the script waiting for the Database (by far the biggest overhead for a script).
I have seen PHP worker threads created by using screen and detatching it to a background task. Other approaches also work so long as you do not have a session that will time out or exit (say when the web browser is closed). A cron that starts a script to check if the script is running every x mins or hours gives you automatic recovery from forced exists and/or system restarts.
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.
That question may appear strange.
But every time I made PHP projects in the past, I encountered this sort of bad experience:
Scripts cancel running after 10 seconds. This results in very bad database inconsistencies (bad example for an deleting loop: User is about to delete an photo album. Album object gets deleted from database, and then half way down of deleting the photos the script gets killed right where it is, and 10.000 photos are left with no reference).
It's not transaction-safe. I've never found a way to do something securely, to ensure it's done. If script gets killed, it gets killed. Right in the middle of a loop. It gets just killed. That never happened on tomcat with java. Java runs and runs and runs, if it takes long.
Lot's of newsletter-scripts try to come around that problem by splitting the job up into a lot of packages, i.e. sending 100 at a time, then relading the page (oh man, really stupid), doing the next one, and so on. Most often something hangs or script will take longer than 10 seconds, and your platform is crippled up.
But then, I hear that very big projects use PHP like studivz (the german facebook clone, actually the biggest german website). So there is a tiny light of hope that this bad behavior just comes from unprofessional hosting companies who just kill php scripts because their servers are so bad. What's the truth about this? Can it be configured in such a way, that scripts never get killed because they take a little longer?
Is PHP suitable for very large projects?
Whenever I see a question like that, I get a bit uneasy. What does very large mean? What may be large to you, may be small to me or vice versa. And that is even assuming that we use the same metric. Are you measuring time to build the project, complete life-cycle of the project, money that are involved, number of people using it, number of developers to build/maintain it, etc. etc.
That said, the problems you're describing sounds like you don't know your technology good enough. That would be a problem for you regardless of which technology you picked. For example, use database transactions to ensure atomicity. And use asynchronous offline jobs to process long running tasks (Such as dispatching a mailing list).
A lot if the bad behaviour is covered in good frameworks like the Zend Framework.
Anything that takes longer the 10 seconds is really messed up but you can always raise the execution time with http://de3.php.net/set_time_limit
A lot of big sites are writen in PHP: Facebook, Wikipedia, StudiVZ, Digg.com etc.. a lot of the things you are talking about are just configuration things maybe you should look into that?
Are you looking for set_time_limit() and ignore_user_abort()?
Performance is not a feature you can just throw in after most of the site is done.
You have to design the site for heavy load.
If a database task is normally involving 10K rows, you should be prepared not just the execution time issues, but other maintenance questions.
Worst case: make a consistency tool to check and fix those errors.
Better: instead of phisically delete the images, just flag them and let background services to take care of the expensive maneuvers.
Best: you can utilize a job queue service and add this job to the queue.
If you do need to do transactions in php, you can just do:
mysql_query("BEGIN");
/// do your queries here
mysql_query("COMMIT");
The commit command will just complete the transaction.
If any errors occur, you can just rollback with:
mysql_query("ROLLBACK");
Edit: Note this will only work if you are using a database that supports transactions, such as InnoDB
You can configure how much time is allowed for executing a script, either in the php.ini setting or via ini_set/set_time_limit
Instead of studivz (the German Facebook clone), you could look at the actual Facebook which is entirely PHP. Or Digg. Or many Yahoo sites. Or many, many others.
ignore_user_abort is probably what you're looking for, but you could also add another layer in terms of scheduled maintenance jobs. They basically run on a specified interval and do various things to make sure your data/filesystem are in a state that you want... deleting old/unlinked files is just one of many things you can do.
For these large loops like deleting photo albums or sending 1000's of emails your looking for ignore_user_abort and set_time_limit.
Something like this:
ignore_user_abort(true); //users leaves webpage will not kill script
set_time_limit(0); //script can take as long as it wants
for(i=0;i<10000;i++)
costly_very_important_operation();
Be carefull however that this could potentially run the script forever:
ignore_user_abort(true); //users leaves webpage will not kill script
set_time_limit(0); //script can take as long as it wants
while(true)
do_something();
That script will never die, unless you restart your server.
Therefore it is best to never set the time_limit the 0.
Technically no programming language is transaction safe, it's the database that needs to be transaction safe. So if the script/code running dies or disconnects, for whatever reason, the transaction will be rolled back.
Putting queries in a loop is a very bad idea unless it is specifically design to be running in batches and breaking a much larger set into smaller pieces. Adjusting PHP timers and limits is generally a stop gap solution, you are still dependent on the client browser if using the web to kick off a script.
If I have a long process that needs to be kicked off by a browser, I "disconnect" the process from the browser and web server so control is returned to the user while the script runs. PHP scripts run from the command line can run for hours if you want. You can then use AJAX, or reload the page, to check on the progress of the long running script.
There are security concern with this code, but to "disconnect" a process from PHP running under something like Apache:
exec("nohup /usr/bin/php -f /path/to/script.php > /dev/null 2>&1 &");
But that really has nothing to do with PHP being suitable for large projects or being transaction safe. PHP can be used for large projects, but since by default there is no code that remains "resident" between hits, it can get slow if not designed right. Also, since there is no namespace support, you want to plan ahead if you have a large development team.
It's fine for a Java based system to take a few minutes to startup, initialize and load all the default objects. But this is unacceptable with PHP. PHP will take more planning for larger systems. The question is, when does the time saved in using PHP get wasted by the additional planning time required for a large system?
The reason you most likely experienced bad database consistencies in the past is because you were using the MyISAM engine for mysql (which DOES NOT support transactions). Use InnoDB instead, it supports transactions and performs row level locking.
Or use postgreSQL.
Many, many software sites are made in PHP. However, you will not hear about millions of web pages made in PHP that do not exist anymore because they were abandoned. Those pages may have burned all company money for dealing with PHP mess, or maybe they bankrupted because their soft was so crappy that customer did not want it… PHP seems good at the startup, but it does not scale very well. Yes, there are many huge web sites made in PHP, but they are rather exceptions, than a norm.