I'm building a small PHP/JavaScript app that will do some processing for all cities in all US states. This comes to a total of at most 52 × 25,583 = 1,330,316 items to be processed.
Processing each item will take about 2-3 seconds, so it's possible that the user could have to stare at this page for 1-2 hours (or at least keep it minimized while doing other stuff).
To give the user maximum feedback, I was thinking of driving the processing from the page via JavaScript, basically something like this:
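var current = 1;
var max = userItems.length; // 1330316 or fewer (userItems is defined elsewhere)
process();
// Send one request at a time: the next item is only posted
// after the server has answered for the previous one.
function process()
{
    if (current > max) // items are numbered 1..max
    {
        alert('done');
        return;
    }
    $.post("http://example.com/process", {id: current}, function()
    {
        $("#current").html(current); // show how many items are done
        current++;
        process();
    });
}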
In the HTML I will have the following status message, which will be updated each time an item has been processed:
<div id="progress">
Please wait while items are processed.
<span id="current">0</span> / <span id="max">1330316</span> items have been processed.
</div>
Hopefully you can all see how I want this to work.
My only concern is this: if those 1,330,316 requests are made to the server simultaneously, could that crash or bring down the server? If so, would adding an extra wait of 2 seconds per request using sleep(2); in the server-side PHP code make things better?
Or is there a different mechanism for giving the user rapid feedback, such as polling, that doesn't require me to mess with Apache or the server?
If you can set up a cron job on the server, I believe it would work much better. How about using a cron job to do the actual processing and using JavaScript to periodically update the status (say, every 10 seconds)?
The first step would then be to set a flag that the cron job's PHP script checks. If it's set, the task must be performed (you could use a temporary file to tell the script which records must be processed).
The cron job would do the task and then, when its iteration is complete, turn off the flag.
This way, the user can even close your application and check back later, and the server will handle all the processing, uninterrupted by client activity.
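A minimal sketch of one cron run under this scheme, in JavaScript for illustration. The `state` object stands in for the shared flag and progress data (the temporary file, or a database row); the field names `active`, `pending` and `processed`, the `processItem` callback and the per-run `budget` are all hypothetical. Capping the work per run is an added assumption that keeps each cron invocation short; the status page would simply read `processed` every 10 seconds or so.

```javascript
// One cron run: if the flag is set, process up to `budget` items,
// record progress, and clear the flag once nothing is left.
// `state` stands in for shared storage (a temp file or DB row);
// all field names here are hypothetical.
function cronIteration(state, processItem, budget) {
  if (!state.active) {
    return 0; // flag is off: nothing to do this run
  }
  let done = 0;
  while (done < budget && state.pending.length > 0) {
    const item = state.pending.shift();
    processItem(item);
    state.processed += 1; // the status page polls this counter
    done += 1;
  }
  if (state.pending.length === 0) {
    state.active = false; // all work finished: turn the flag off
  }
  return done;
}

// Example: 5 pending items, at most 2 per cron run.
const state = { active: true, pending: [1, 2, 3, 4, 5], processed: 0 };
const seen = [];
while (state.active) {
  cronIteration(state, (id) => seen.push(id), 2);
}
console.log(state.processed, seen); // 5 [ 1, 2, 3, 4, 5 ]
```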
Putting a sleep() inside your server-side PHP script can only make things worse. It keeps each process alive longer, which increases the number of processes working or sleeping in parallel, which in turn increases memory usage.
Don't worry that so many requests could be processed in parallel. An Apache server is usually configured to handle no more than about 150 requests in parallel; a well-configured server never handles more requests in parallel than it has resources for (good administrators do some calculations beforehand). The other requests have to wait, and given your number of requests, it's likely that many of them will time out before being processed.
Your concerns should rather be about client-side resources, but it looks like your script only starts a new request once the previous one has returned. By the way, well-behaved HTTP clients (which your browser should be) run no more than about 6 requests in parallel to the same host.
Update: Besides the above, you should seriously consider redesigning your approach to mass processing (as Joel suggested), but that belongs in a separate question.
Related
I have an HTML page that runs a PHP script several times (about 20),
something like:
for (var i = 0; i < 20; i++) {
    $.get('script.php?id=' + i, function (data) {
        // ... handle the response
    });
}
Each time the script runs, it fetches content from a different website, so each run takes 1 to 10 seconds to complete and return a response.
I want to run all the scripts simultaneously to make things faster, but two things slow it down. The first is on the HTML page: AJAX requests seem to be queued after the first five (at least, that is what Chrome Developer Tools tells me). I think this can be fixed easily; I haven't bothered to look for a solution yet. The second is on the PHP side: even when the first 5 scripts are triggered together, they run sequentially, and not even in order. I've put some
echo microtime(true);
calls around the script to find where it is slow, and what I found (to my surprise) is that the start times of the scripts, which should be almost identical, differ consistently, by as much as 10 seconds, as if the second script waits for the first to finish before it begins. How can I get all the scripts running at the same time? Thank you.
I very frankly advise that you not attempt anything "multi-threaded", nor anything particularly "fancy", here.
Design your application around some kind of queue (it can be a simple array) of "things that need to be done." Then issue some number n of initial AJAX requests: certainly no more than 3 to 5.
Now wait for notification that each request has succeeded or failed. After recording the status of the request, de-queue another request and start it.
In this way, no more than n requests will be active at any one time. Yes, they will each take a different amount of time, but nobody cares.
JavaScript is not multi-threaded, and we have no need for it to be. Everything is driven by events, which are handled one at a time, and that happens to work beautifully.
Your program runs until the queue is empty and all of the outstanding AJAX requests have been completed (or, have failed).
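A minimal sketch of that queue in JavaScript, assuming each request is wrapped in a function that returns a Promise (the result of jQuery's `$.get` and of `fetch()` can both be awaited). The name `runQueue` and the shape of the result records are illustrative, not an existing API. Because JavaScript handles one event at a time, the shared `next` counter needs no locking.

```javascript
// Run `worker` over every item in `queue`, with at most `n` calls in flight.
// `worker(item)` should return a Promise (e.g. a wrapped $.get or fetch call).
// Resolves with one {item, ok, value|error} record per item, in queue order.
function runQueue(queue, n, worker) {
  const results = new Array(queue.length);
  let next = 0; // index of the next item to de-queue

  // Each "lane" takes the next item, waits for it to succeed or fail,
  // records the outcome, then takes another, until the queue is empty.
  async function lane() {
    while (next < queue.length) {
      const index = next++;
      const item = queue[index];
      try {
        results[index] = { item, ok: true, value: await worker(item) };
      } catch (error) {
        results[index] = { item, ok: false, error }; // record failure, keep going
      }
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(n, queue.length); i++) {
    lanes.push(lane());
  }
  // Done when every lane has drained the queue and its last request settled.
  return Promise.all(lanes).then(() => results);
}
```

For the 20 requests from the question, with at most 3 in flight, usage might look like `runQueue([...Array(20).keys()], 3, (i) => $.get('script.php?id=' + i))`.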
There's absolutely no advantage in "getting too fancy" when each request might take several seconds to complete. There's also no particular advantage in queuing up too many TCP/IP network requests. Predictable simplicity will (IMHO) rule the day here.
Background:
I have two pages (index.php and script.php).
I have a jQuery function that calls script.php from index.php.
script.php will do a ton of data processing and then return that data back to index.php so that it can be displayed to the user.
Problem:
index.php appears to be timing out because script.php takes too long to finish: it can sometimes take up to 4 hours to process before it can return the data to index.php.
The reason I say index.php is timing out is that it never updates and just sits there with an hourglass, even after script.php stops processing.
I know for sure that script.php finishes processing successfully, because I'm also writing the output to a log file and can see that everything gets processed.
If there is not much data for script.php to process, index.php updates as it is supposed to.
I'm not setting any timeout values within the function inside index.php when calling script.php.
Is there a better way to get index.php to update after waiting a very long time for script.php to finish? I'm using Firefox; could it be a Firefox issue?
Do you seriously want an ajax call to take four hours to respond? That makes little sense in the normal way the web and browsers work. I'd strongly suggest a redesign.
That said, jQuery's $.ajax() call has a timeout value you can set as an option, described here: http://api.jquery.com/jQuery.ajax/. I have no idea whether the browser will allow it to be set as long as four hours and still operate properly. In any case, keeping a browser connection open and live for four hours is not an operation with a high chance of success. If there's a momentary hiccup, what are you going to do? Start all over again? This is just not a good design.
What I would suggest as a redesign is to break the problem into smaller pieces that can be satisfied by much shorter AJAX calls. If you really want a four-hour operation, then I'd suggest you start the operation with one AJAX call and then poll from the browser every few minutes to ask whether the job is done. When it is finally done, you can retrieve the results. This is much more compatible with the way AJAX calls and browsers normally work, and it won't suffer if there is a momentary internet glitch during the four hours.
If possible, your first AJAX call could also return an estimate of how long the operation might take, which could drive some helpful UI in the browser while it waits for the results.
Here's a possibility:
Step 1. Send an AJAX call requesting that the job start. Immediately receive back a job ID and any relevant information about the estimated time for the job.
Step 2. Calculate a polling interval based on the estimated time for the job. If the estimate is four hours and the estimates are generally accurate, then you can set a timer and ask again in two hours. When asking for the data, you send the job ID returned by the first AJAX call.
Step 3. As you near the estimated time of completion, you can narrow the polling interval down to perhaps a few minutes. Eventually, a polling request will report that the data is done and return it to you. If I were designing the server, I'd cache the data on the server for some period of time in case the client has to request it again for any reason, so you don't have to repeat the four-hour process.
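The three steps above might be sketched like this in JavaScript. `startJob` and `checkJob` are hypothetical wrappers around your own endpoints (for example, jQuery `$.post` calls, whose return values can be awaited), and `minIntervalMs` is an assumed floor for the polling interval. Waiting half of the remaining estimated time is one simple way to implement steps 2 and 3, not something the answer prescribes.

```javascript
// Start a long job, then poll for its result instead of holding one
// request open for hours. Hypothetical endpoint wrappers:
//   startJob()      -> Promise<{ jobId, estimateMs }>
//   checkJob(jobId) -> Promise<{ done, data }>
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runLongJob(startJob, checkJob, minIntervalMs) {
  const { jobId, estimateMs } = await startJob(); // step 1
  const startedAt = Date.now();
  for (;;) {
    const remaining = estimateMs - (Date.now() - startedAt);
    // Step 2: far from the estimate, wait half the remaining time.
    // Step 3: near or past it, poll every minIntervalMs.
    await wait(Math.max(minIntervalMs, remaining / 2));
    try {
      const status = await checkJob(jobId);
      if (status.done) {
        return status.data;
      }
    } catch (err) {
      // A momentary network hiccup: just try again at the next poll.
    }
  }
}
```

With a four-hour estimate, the first poll happens after two hours, then one hour, then 30 minutes, and so on down to `minIntervalMs`. Production code would probably also want a point at which it gives up.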
Oh, and then I'd think about changing the design of my server so that nothing that is requested regularly ever takes four hours. It should either be pre-built and pre-cached in some sort of batch fashion (e.g. a couple of times a day), or the whole scheme should be redone so that common queries can be satisfied in less than a minute rather than four hours.
Would it be possible to periodically send a response back to index.php just to "keep it alive"? If not, perhaps split your script into a few smaller scripts and run them in chunks of an hour at a time, as opposed to the 4 hours you mentioned above.
I'm going to give you an abstract of the system. It has orders, and every order needs to be processed by an external API. Normally around 100-200 orders need to be processed at a time, so it can be a time-consuming task. Currently I use a normal POST request with the order IDs passed in, and I've increased PHP's resources to some extent.
The main concern: I don't know whether the script might, for some reason, use up all its resources and stop. This is very undesirable because it may stop at any point during execution. On the other hand, I don't want to allocate too many resources, as I know that is also a bad idea.
So I was wondering: wouldn't it actually be better to use a separate AJAX request for every order? It would certainly take more time overall, because of making the request and instantiating the objects and so on every time. But I would be quite sure that the allocated resources won't be used up and that the script will complete successfully. It also lets me tell the user interactively how many orders have been processed.
Any feedback from experienced users is welcome.
If you run multiple AJAX requests, they will run in parallel, so they'll take less time but more resources.
In my opinion you should use AJAX because, as you say, you can inform the user of the progress, and it's better to do this than to leave the user not knowing what is happening.
If you were on a page and it froze for 30 seconds while processing, you wouldn't know whether the script had crashed, but if it took 60 seconds while informing you of its progress, you'd be more inclined to wait for it to finish.
You could also pre-process orders when they are added, then finish processing when the orders are completed (depending on your order-processing mechanism).
An AJAX call that can handle more than one order could be even better, though I don't know the details of your system.
Continuous AJAX calls are also a (server) resource.
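A sketch of the middle ground suggested here: send the order IDs in batches, one request at a time, and report progress after each batch returns. `sendBatch(ids)` is a hypothetical wrapper around your own POST endpoint that returns a Promise, and the batch size is an assumption to tune against your API's speed and your PHP limits.

```javascript
// Split order IDs into fixed-size batches.
function chunk(ids, size) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Process batches sequentially so that only one PHP request runs at a time,
// calling onProgress(doneCount, total) after each batch returns.
async function processOrders(orderIds, batchSize, sendBatch, onProgress) {
  let done = 0;
  for (const batch of chunk(orderIds, batchSize)) {
    await sendBatch(batch); // e.g. a POST to your processing script
    done += batch.length;
    onProgress(done, orderIds.length);
  }
  return done;
}
```

With jQuery, a call might look like `processOrders(ids, 10, (batch) => $.post('/process-orders', { ids: batch }), (d, t) => $('#status').text(d + ' / ' + t))`, where `/process-orders` and `#status` are placeholder names. Each request stays small enough to finish well within PHP's limits, while the user still sees steady progress.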
I have a few ideas about this, but here is what I need to do; I just wanted some second opinions, really.
I am writing a small auction site in PHP/SQL, but I have come up against a hurdle.
When an item finishes, much like eBay, I need to be able to tell that it's finished and send out the emails to who has won it and who has sold it.
The only way I can think of is to schedule a piece of code to keep checking what auctions have ended but surely there is a better way?
The solution can have multiple parts:
A script launched via cron (every 5 minutes could be good, or even more often). It detects finished auctions and puts them in a queue.
A script, that pretty much runs continuously, and that processes items in the queue.
Note that:
You have to ensure that an auction is still open before displaying the page (a simple test). That way, people can't join in after it closes.
For each script, you can use PHP, or any other language
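A minimal sketch of the cron step and the "still open?" test, in JavaScript for illustration only. The auction fields `id`, `endsAt` and `queued` are hypothetical; in a PHP/SQL version, the same logic would be a query on the end time plus inserts into a queue table, with the `queued` marker stored alongside the auction so the next cron run doesn't queue it again.

```javascript
// Cron step: find auctions whose end time has passed and queue each one
// exactly once. Field names (id, endsAt, queued) are hypothetical.
function enqueueEndedAuctions(auctions, queue, now) {
  let added = 0;
  for (const auction of auctions) {
    if (auction.endsAt <= now && !auction.queued) {
      auction.queued = true; // mark it so the next cron run skips it
      queue.push(auction.id);
      added += 1;
    }
  }
  return added;
}

// The simple test before displaying an auction page: is it still open?
function isOpen(auction, now) {
  return now < auction.endsAt;
}
```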
Advantages:
The cron job is very fast and low on resources, and even if there are a lot of auctions to process, there is no risk that it will run in parallel with itself (and cause conflicts).
The queue system ensures that your system won't crash because there is too much going on. It will process the queue as fast as possible, and if that isn't fast enough, the website will keep running. You can, however, end up with emails being sent hours or days after an auction closes. But the limit is much more predictable, and it won't crash your system.
You can extend it in the future with multithreading processing of the queue, distributed processing... This is a scalable architecture.
This architecture is fun.
Additional information:
The daemon script doesn't have to run continuously. What you can do is this: at the end of the cron job, if there are items in the queue, check whether the other (processing) script is running. If it is, exit. If it isn't, launch it.
The daemon script takes an item out of the queue and processes it. When it's done, if there are still items in the queue, it processes the next one; otherwise, it exits.
With this system, everything is optimal and everyone loves each other !
To check whether the other script is running, you can use a file containing "1" or "0" (running / not running): the first script reads it, and the second writes it. You can also use the database, or perhaps system tools or shell commands.
Could you please share the SQL query that finds the highest bidder once the bidding end date has passed (i.e. how to know the bidding is over) and awards the product to that bidder?
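A minimal sketch of that hand-off in JavaScript, using a shared `state.running` flag in place of the file containing "1" or "0". `launchWorker` and `processItem` are hypothetical callbacks; in PHP, the launch might be a background shell command. Note that a plain read-then-write flag is not atomic, so two overlapping runs could both see "0"; a lock acquired atomically (for example PHP's `flock()`, or an UPDATE that only succeeds while the flag is still 0) avoids that.

```javascript
// End of the cron job: start a worker only if there is work and none is running.
function afterCron(queue, state, launchWorker) {
  if (queue.length > 0 && !state.running) {
    launchWorker();
    return true;
  }
  return false; // nothing queued, or a worker is already on it
}

// The worker: take items one by one until the queue is empty, then exit.
function worker(queue, state, processItem) {
  state.running = true; // the "1" in the status file
  try {
    while (queue.length > 0) {
      processItem(queue.shift()); // e.g. email the winner and the seller
    }
  } finally {
    state.running = false; // the "0": the next cron run may launch again
  }
}

// Example run with two ended auctions in the queue.
const queue = ['auction-1', 'auction-2'];
const state = { running: false };
const mailed = [];
afterCron(queue, state, () => worker(queue, state, (id) => mailed.push(id)));
console.log(mailed, state.running); // [ 'auction-1', 'auction-2' ] false
```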
I would set up a cron job to run every 10, 20, 30 or 60 minutes (etc.) to send out emails and update the auction details.
If your script is fast, running it every minute or so may be all right.
Be aware that many shared hosting providers only allow you to send a certain number of emails per hour.
Do these emails need to be sent out instantly?
I can see two possible problems and goals you are trying to achieve:
Visual: you want the page to keep updating while a user browses your website, without them reloading or refreshing it, so that if an auction ends, something like "Auction ended, the item goes to..." appears.
Solution: use JavaScript and AJAX (I assume you are already using them for countdowns or something similar). Make an AJAX call every 5 seconds (that should be enough) and update the content.
Practical: you want to make sure that once an auction has ended, a user cannot join it. Solution: you can do this with just PHP and MySQL. Create a field that stores the auction's start timestamp, then add a simple check such as if (time() >= ($timestamp + $duration)) {} (where $timestamp is the start of the auction and $duration is its length) to block users who try to bid anyway.
I have an array of around 50,000 mobile numbers. I'm trying to process them and send bulk SMS to these numbers using a third-party API, but the browser freezes for several minutes. I'm looking for a better option.
Processing the data involves checking the mobile number type (e.g. CDMA), assigning unique IDs to all the numbers for further reference, checking for network/country-specific charges, etc.
I thought of queuing the data in the database and using cron to send about 5k per batch every minute, but that will take time if there are many messages. What are my other options?
I'm using CodeIgniter 2 on a XAMPP server.
I would write two scripts:
File index.php:
<iframe src="job.php" frameborder="0" scrolling="no" width="1" height="1"></iframe>
<script type="text/javascript">
function progress(percent){
document.getElementById('done').innerHTML=percent+'%';
}
</script><div id="done">0%</div>
File job.php:
<?php
set_time_limit(0);        // remove PHP's execution time limit
ignore_user_abort(true);  // keep going even if the user pulls the plug*
while (ob_get_level()) ob_end_clean(); // remove output buffers
ob_implicit_flush(true);  // send output to the browser directly

// * This depends entirely on whether you want the user to be able to stop
//   the process. For example, you might create "Stop!" and "Start" buttons
//   in index.php; for such a Stop button to work, the ignore_user_abort()
//   line above must be commented out.

// Calls progress() in the parent page (index.php) from inside the iframe.
function progress($percent){
    echo '<script type="text/javascript">parent.progress('.$percent.');</script>';
}

// $mobiles is assumed to hold the numbers, loaded earlier (e.g. from the database).
$total = count($mobiles);
echo '<!DOCTYPE html><html><head></head><body>'; // WebKit hotfix
foreach ($mobiles as $i => $mobile) {
    // send sms
    progress(round($i / $total * 100));
}
progress(100);
echo '</body></html>'; // WebKit hotfix
I'm assuming these numbers are in a database; if so, you should add a new column titled isSent (or whatever you fancy).
The processing step you described (quoted below) should be queued and done nightly, weekly, or whenever appropriate. Unless you have a specific reason to, it shouldn't be done in bulk on demand. You can even add a column to the database recording when each number was last checked, so that if a number hasn't been checked in at least X days, you can check it on demand.
Processing the data involves checking the mobile number type (e.g. CDMA), assigning unique IDs to all the numbers for further reference, checking for network/country-specific charges, etc.
But that still leaves you with the same question of how to do this for 50,000 numbers at once. Since you mentioned cron jobs, I'm assuming you have SSH access to your server, which means you don't need a browser. These cron jobs can be executed from the command line like so:
/usr/bin/php /home/username/example.com/myscript.php
My recommendation is to process 1,000 numbers at a time every 10 minutes via cron, time how long each run takes, and save that time to the database. Since you're using a cron job, it doesn't seem like these are time-sensitive SMS messages, so they can be spread out. Once you know how long it took for this script to run 50 times (50 × 1,000 = 50k), you can update your cron job to run more or less frequently.
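A minimal sketch of one such cron run, in JavaScript for illustration. `sendSms` stands in for the third-party API call, and each number is modelled as an object with a hypothetical `phone` field plus the `isSent` column suggested above; in SQL, picking the batch would be a `WHERE isSent = 0` query with `LIMIT 1000`. At 1,000 numbers every 10 minutes, 50,000 numbers take 50 runs, or 500 minutes (about 8 hours 20 minutes), assuming each run finishes within its 10-minute slot.

```javascript
// One cron run: send the next `batchSize` unsent messages, flag each as sent,
// and report how long the batch took so the cron frequency can be tuned.
// `sendSms` is a placeholder for the third-party API call.
function sendNextBatch(numbers, batchSize, sendSms) {
  const started = Date.now();
  const batch = numbers.filter((n) => !n.isSent).slice(0, batchSize);
  for (const n of batch) {
    sendSms(n.phone);
    n.isSent = true; // only reached if sendSms did not throw
  }
  return { sent: batch.length, elapsedMs: Date.now() - started };
}
```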
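<?php
set_time_limit(0); // see the note below about the default 30-second limit
$time_start = microtime(true);
// doSendSMS() and saveTimeRequiredToSendMessagesInDB() are placeholders for
// your own code; $batch holds the numbers (e.g. 1,000) for this run.
foreach ($batch as $phoneNum) {
    doSendSMS($phoneNum, $msg);
}
$time_end = microtime(true);
$time = $time_end - $time_start; // seconds this batch took
saveTimeRequiredToSendMessagesInDB($time);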
Also, you might have noticed set_time_limit(0); this tells PHP not to time out after the default 30 seconds. If you are able to modify the php.ini file, you don't need this line of code. But even if you can edit php.ini, I would still recommend not changing this setting there, since you probably want other pages to time out.
http://php.net/manual/en/function.set-time-limit.php
If this isn't a one-off type of situation, consider engineering a better solution.
What you basically want is a queue that your browser-bound process can write to, and that 1-N worker processes can then read from and update.
Putting work in the queue should be rather inexpensive: perhaps a bunch of simple INSERT statements into a SQL RDBMS.
Then you can have a daemon or two (or 100, distributed across multiple servers) that read from the queue and process stuff. You'll want to be careful here to avoid two workers taking on the same task, but that's not hard to code around.
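A minimal in-memory sketch of claiming tasks so that no two workers take the same one, plus the counts a "queue status" page might show. The task fields (`status`, `worker`) and function names are hypothetical. With a shared SQL table, one common equivalent of `claimNext` is a conditional update such as `UPDATE queue SET status = 'working', worker = ? WHERE id = ? AND status = 'pending'`, treating "0 rows affected" as "another worker got it first".

```javascript
// Claim the next pending task for `workerId`, or return null.
// This runs synchronously, so no other worker in the same JavaScript
// process can interleave between the check and the update.
function claimNext(tasks, workerId) {
  const task = tasks.find((t) => t.status === 'pending');
  if (!task) return null;
  task.status = 'working';
  task.worker = workerId;
  return task;
}

function complete(task) {
  task.status = 'done';
}

// What a "queue status" page would report.
function queueStatus(tasks) {
  const counts = { pending: 0, working: 0, done: 0 };
  for (const t of tasks) counts[t.status] += 1;
  return counts;
}
```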
So your browser-bound workflow is: click some button that causes a bunch of stuff to get added to the queue, then redirect to some "queue status" interface, where the user can watch the system chew through all their work.
A system like this is nice, because it's easy to scale horizontally quite a ways.
EDIT: Christian Sciberras' answer is going in this direction, except the browser ends up driving both sides (it adds to the queue, then drives the worker process)
A cron job would be your best bet; I don't see why it would take any longer than doing it in the browser, if your only problem at the moment is the browser timing out.
If you insist on doing it via the browser, the other solution would be to do it in batches of, say, 1,000, redirecting to the same script with a reference to where it left off last time in a $_GET variable.
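A sketch of the redirect-in-batches idea, written in JavaScript to match the other examples. The `offset` query parameter, `processItem` callback and `handleBatchRequest` name are hypothetical; in PHP, `offset` would come from `$_GET['offset']` and the returned URL would feed a `header('Location: ...')` redirect.

```javascript
// Handle one browser request: process a slice starting at `offset`, then
// return the URL to redirect to, or null when everything is done.
function handleBatchRequest(items, offset, batchSize, processItem, currentUrl) {
  const end = Math.min(offset + batchSize, items.length);
  for (let i = offset; i < end; i++) {
    processItem(items[i]);
  }
  if (end >= items.length) {
    return null; // finished: no further redirect
  }
  const next = new URL(currentUrl);
  next.searchParams.set('offset', String(end)); // where to pick up next time
  return next.toString();
}
```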