PHP time-consuming script - normal POST or multiple AJAX requests?

I'm going to give you an overview of the system. It has orders, and every order needs to be processed by an external API. Normally around 100-200 orders need to be processed at a time, so it is a time-consuming task. Currently I use a normal POST request with the order IDs passed in, and I've increased the PHP resource limits to some extent.
My main concern: I don't know whether, for some reason, the script will use up all its resources and stop. That would be very undesirable because it could happen at any point during execution. The other thing is that I don't want to allocate too many resources, as I know that is also a bad idea.
So I was wondering: if I used a separate AJAX request for every order, wouldn't that actually be better? It would certainly take more time overall because of the overhead of making a request and instantiating the objects every time, but I could be fairly sure that the allocated resources won't be used up and the script will complete successfully. It would also give me the ability to inform the user interactively about how many orders have been processed.
Any feedback from experienced users is welcome.

If you run multiple AJAX requests, they will run in parallel, so they'd take less time but use more resources.
In my opinion you should use AJAX because, as you say, you can inform the user of the progress, and that's better than leaving the user not knowing what is happening.
If you were on a page and it froze for 30 seconds while processing, you wouldn't know whether the script had crashed; but if it took 60 seconds while informing you of the progress, you'd be more inclined to wait for it to finish.
You could also pre-process orders when they are added, then finish the processing when the orders are completed (depending on your order-processing mechanism).

An AJAX call able to handle more than one order at a time could be even better, though it's hard to say without knowing the details of your system. Keep in mind that continuous AJAX calls also consume server resources. A minimal sketch of such a batch endpoint follows.
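To make the batching idea concrete, here is a minimal sketch of such an endpoint; the file name process_orders.php and the function process_order() are hypothetical, and error handling is cut to the bare minimum. The client would post a small slice of order IDs (say 10 at a time) and use the JSON response to update a progress indicator before sending the next slice.

<?php
// process_orders.php - hypothetical batch endpoint (a sketch, not a drop-in).
header('Content-Type: application/json');

$ids = isset($_POST['ids']) ? array_map('intval', (array) $_POST['ids']) : [];
$done = [];
$failed = [];

foreach ($ids as $id) {
    try {
        process_order($id);   // hypothetical call to the external API
        $done[] = $id;
    } catch (Exception $e) {
        $failed[] = $id;      // report failures instead of aborting the batch
    }
}

echo json_encode(['done' => $done, 'failed' => $failed]);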

Related

jQuery times out waiting for server to respond back

Background:
I have two pages (index.php and script.php).
I have a jQuery function that calls script.php from index.php.
script.php will do a ton of data processing and then return that data back to index.php so that it can be displayed to the user.
Problem:
index.php appears to be timing out because script.php is taking too long to finish. script.php can sometimes take up to 4 hours to finish processing before it can return the data to index.php.
The reason I say index.php is timing out is because it never updates and just sits there with an hourglass, even after script.php stops processing.
I know for sure that script.php does finish processing successfully, because I'm writing the output to a log file as well and can see that everything is being processed.
If there is not much data to be processed by script.php, then index.php will update as it is supposed to.
I'm not setting any timeout values within the function inside index.php when calling script.php.
Is there a better way to get index.php to update after waiting a very long time for script.php to finish? I'm using Firefox, so could it be a Firefox issue?
Do you seriously want an ajax call to take four hours to respond? That makes little sense in the normal way the web and browsers work. I'd strongly suggest a redesign.
That said, jQuery's $.ajax() call has a timeout value you can set as an option, described here: http://api.jquery.com/jQuery.ajax/. I have no idea whether the browser will allow it to be set as long as four hours and still operate properly. In any case, keeping a browser connection open and live for four hours is not a high-probability operation. If there's a momentary hiccup, what are you going to do? Start all over again? This is just not a good design.
What I would suggest as a redesign is that you break the problem up into smaller pieces that can be satisfied in much shorter ajax calls. If you really want a four hour operation, then I'd suggest you start the operation with one ajax call and then you poll every few minutes from the browser to inquire when the job is done. When it is finally done, then you can retrieve the results. This would be much more compatible with the normal way that ajax calls and browsers work and it wouldn't suffer if there is a momentary internet glitch during the four hours.
If possible, your first ajax call could also return an approximation for how long the operation might take which could provide some helpful UI in the browser that is waiting for the results.
Here's a possibility:
Step 1. Send ajax call requesting that the job start. Immediately receive back a job ID and any relevant information about the estimated time for the job.
Step 2. Calculate a polling interval based on the estimated time for the job. If the estimate is four hours and the estimates are generally accurate, then you can set a timer and ask again in two hours. When asking for the data, you send the job ID returned by the first ajax call.
Step 3. As you near the estimated time of completion, you can narrow the polling interval down to perhaps a few minutes. Eventually, you will get a polling request that says the data is done and it returns the data to you. If I was designing the server, I'd cache the data on the server for some period of time in case the client has to request it again for any reason so you don't have to repeat the four hour process.
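As a rough illustration of steps 1-3 (not the poster's actual code), the two server endpoints could look like the sketch below; the jobs table, the credentials, and the worker that eventually marks the job done are all assumptions.

<?php
// start_job.php - step 1: create the job, return its ID and a time estimate.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->prepare("INSERT INTO jobs (status, created_at) VALUES ('queued', NOW())")
    ->execute();
header('Content-Type: application/json');
echo json_encode(['job_id' => (int) $pdo->lastInsertId(),
                  'estimated_seconds' => 4 * 3600]);

<?php
// job_status.php - steps 2-3: the browser polls this with the job ID.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT status, result FROM jobs WHERE id = ?');
$stmt->execute([isset($_GET['job_id']) ? (int) $_GET['job_id'] : 0]);
$job = $stmt->fetch(PDO::FETCH_ASSOC);
header('Content-Type: application/json');
echo json_encode($job && $job['status'] === 'done'
    ? ['done' => true, 'result' => $job['result']]
    : ['done' => false]);   // not ready yet: poll again later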
Oh, and then I'd think about changing the design on my server so that nothing that is requested regularly ever takes four hours. It should either be pre-built and pre-cached in some sort of batch fashion (e.g. a couple of times a day), or the whole scheme should be redone so that common queries can be satisfied in less than a minute rather than four hours.
Would it be possible to periodically send a response back to index.php just to "keep it alive"? If not, perhaps split up your script into a few smaller scripts and run them in chunks of an hour at a time, as opposed to the 4 hours you mentioned above.

Scaling multiple requests to different services

I have a service that needs to ask 40 external services (APIs) for information on each user request. For example, a user searches for some information, and my service asks 40 external partners for it, aggregates it in one DB (MySQL), and displays the result to the user.
At the moment I have a multicurl solution, where I have 10 partner requests in flight at one time; whenever one partner request finishes, the software adds another partner from the remaining 30 to the multicurl queue, until all 40 requests are done and the results are in the DB.
The problem with this solution is that it cannot scale across many servers. I want a solution where I can fire all 40 requests at one time, for example divided over 2-3 servers, and wait only as long as the slowest partner takes to deliver its results ;-) That means if the slowest partner takes 10 seconds, I will have the results of all 40 partners in 10 seconds. With multicurl I run into trouble when there are more than 10-12 requests at one time.
What kind of solution can you offer me that uses as few resources as possible, can run many, many processes on one server, and is scalable? My software is written in PHP, which means the solution needs a good PHP connection via a framework or API.
I hope you understand my problem and needs. Please ask if something is not clear.
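(For reference, the rolling-window multicurl pattern described above looks roughly like the sketch below; the URL list, window size, and the MySQL aggregation step are placeholders.)

<?php
// Rolling-window multicurl sketch.
$urls     = [/* ... the 40 partner API URLs ... */];
$window   = 10;     // how many requests to keep in flight at once
$queue    = $urls;
$mh       = curl_multi_init();
$inFlight = 0;

// Add one URL to the multi handle.
$add = function ($url) use ($mh, &$inFlight) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_multi_add_handle($mh, $ch);
    $inFlight++;
};

// Prime the window; the rest stays queued.
foreach (array_splice($queue, 0, $window) as $url) {
    $add($url);
}

while ($inFlight > 0) {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 1.0);   // wait for activity instead of busy-looping

    // Harvest finished transfers and top the window back up.
    while ($info = curl_multi_info_read($mh)) {
        $ch   = $info['handle'];
        $body = curl_multi_getcontent($ch);
        // ... aggregate $body into MySQL here ...
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $inFlight--;
        if ($queue) {
            $add(array_shift($queue));
        }
    }
}
curl_multi_close($mh);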
One possible solution would be to use a message queue system like beanstalkd, Apache ActiveMQ, memcacheQ etc.
A high level example would be:
User makes request to your service for information
Your service adds the requests to the queue (presumably one for each of the 40 services you want to query)
One or more job servers continuously poll the queue for work
A job server gets a message from the queue to do some work, adds the data to the DB and deletes the item from the queue.
In this model, since the one task of performing 40 requests is now distributed and no longer part of one "process", the next part of the puzzle is figuring out how to mark a set of work as completed. This part may not be difficult, or it may introduce a new challenge (it depends on the data and your application). Perhaps you could use another cache/DB row to set a counter to the number of jobs a particular request needs in order to complete, and as each queue worker finishes a request, it decrements the counter by 1. Once the counter is 0, you know the request has been completed. When you do that, you need to make sure the counter actually reaches 0 and doesn't get stuck for some reason. A sketch of this counter pattern follows.
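As one concrete shape of that counter idea, here is a sketch using Redis (via the phpredis extension) for both the queue and the counter; the queues named above (beanstalkd, ActiveMQ, memcacheQ) would fit the same pattern, Redis is only chosen here for brevity, and the key names are made up.

<?php
// Producer side: enqueue one job per partner and set the completion counter.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$requestId = uniqid('req_', true);
$partners  = [/* ... the 40 partner APIs ... */];
$redis->set("pending:$requestId", count($partners));
foreach ($partners as $partner) {
    $redis->lPush('jobs', json_encode(['request' => $requestId,
                                       'partner' => $partner]));
}

<?php
// Worker side: any number of these can run, on any number of servers.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

while (true) {
    $item = $redis->brPop(['jobs'], 30);    // block until a job arrives
    if (empty($item)) {
        continue;                           // timed out; keep waiting
    }
    $job = json_decode($item[1], true);
    // ... query the partner API and write the result to MySQL here ...
    if ($redis->decr("pending:{$job['request']}") === 0) {
        // Last job for this request: the aggregated result set is complete.
    }
}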
That's one way at least, hope that helps you a little or opens the door for more ideas.

Should I be using message queuing for this?

I have a PHP application that currently has 5k users and will keep increasing for the foreseeable future. Once a week I run a script that:
fetches all the users from the database
loops through the users, and performs some upkeep for each one (this includes adding new DB records)
The last time this script ran, it only processed 1400 users before dying due to a 30-second maximum execution time error. One solution I thought of was to have the main script still fetch all the users, but instead of performing the upkeep itself, it would make an asynchronous cURL call (one for each user) to a new script that would perform the upkeep for that particular user.
My concern here is that 5k+ cURL calls could bring down the server. Is this something that could be remedied by using a messaging queue instead of cURL calls? I have no experience using one, but from what I've read it seems like this might help. If so, which message queuing system would you recommend?
Some background info:
this is a Symfony project, using Doctrine as my ORM and MySQL as my DB
the server is a Windows machine, and I'm using Windows' task scheduler and wget to run this script automatically once per week.
Any advice and help is greatly appreciated.
If it's possible, I would make a scheduled task (cron job) that would run more often and use LIMIT 100 (or some other number) to process a limited number of users at a time.
A few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but more than 30 seconds would be a start.
Track Upkeep against Users
Maybe add a field for each user, last_check, and have that field set to the date/time of the last successful "Upkeep" action performed against that user.
Process Smaller Batches
Better to run smaller batches more often. Think of it as the PHP equivalent of putting your eggs in more than one basket. With the last_check field above, it would be easy to identify the users who have gone longest since their last update, and also to set a threshold for how often to process them.
Run More Often
Set up a cronjob and process, say, 100 records every 2 minutes or something like that.
Log and Review your Performance
Keep logfiles and record stats: how many records were processed, how long it had been since they were last processed, and how long the script took. These metrics will allow you to tweak the batch sizes, cronjob settings, time limits, etc. to ensure that the maximum number of checks is performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased user volumes and will form a strong foundation for any further maintenance tasks you might be looking at down the track.
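A minimal sketch of the batch approach above, shown with plain PDO rather than the poster's Doctrine setup for brevity; the last_check column and perform_upkeep() are assumptions. Run it from the task scheduler every couple of minutes.

<?php
// upkeep_batch.php - process the 100 users that have waited longest.
set_time_limit(120);    // generous but bounded, per the advice above

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$ids = $pdo->query('SELECT id FROM users ORDER BY last_check ASC LIMIT 100')
           ->fetchAll(PDO::FETCH_COLUMN);

$stamp = $pdo->prepare('UPDATE users SET last_check = NOW() WHERE id = ?');
foreach ($ids as $id) {
    perform_upkeep($id);     // the per-user upkeep work
    $stamp->execute([$id]);  // record progress per user, not in one big run
}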
Why don't you keep the cURL idea, but instead of processing only one user per call, send a bunch of users to each one by splitting them into groups of 1000 or so?
Have you considered changing your logic to commit changes as you process each user? It sounds like you may be running a single transaction to process all users, which may not be necessary.
How about just increasing the execution time limit of PHP?
Also, look into whether you can make your upkeep procedure itself faster. Depending on what exactly you are doing, you could also spread it out a bit: do a few users at a time rather than everyone at once. But that depends on what exactly you're doing, of course.

Can 1330316 AJAX requests crash my server?

I'm building a small PHP/Javascript app which will do some processing for all cities in all US states. This rounds up to a total of (52 x 25583) = 1330316 or fewer items that will need to be processed.
The processing of each item will take about 2-3 seconds, so it's possible that the user could have to stare at this page for 1-2 hours (or at least keep it minimized while doing other stuff).
In order to give the user the maximum feedback, I was thinking of controlling the processing of the page via javascript, basically something like this:
var current = 1;
var max = userItems.length; // 1330316 or less

process();

function process()
{
    // Stop once every item has been posted.
    if (current >= max)
    {
        alert('done');
        return;
    }
    // Process one item; the next request only starts after this one
    // returns, so the requests run sequentially, not simultaneously.
    $.post("http://example.com/process", {id: current}, function()
    {
        $("#current").html(current);
        current++;
        process();
    });
}
In the HTML I will have the following status message, which will be updated whenever the process() function is called:
<div id="progress">
Please wait while items are processed.
<span id="current">0</span> / <span id="max">1330316</span> items have been processed.
</div>
Hopefully you can all see how I want this to work.
My only concern is that, if those 1330316 requests are made simultaneously to the server, is there a possibility that this crashes/brings down the server? If so, would putting an extra wait per request, using sleep(3); in the server-side PHP code, make things better?
Or is there a different mechanism for showing the user the rapid feedback such as polling which doesn't require me to mess with apache or the server?
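(For completeness, the server side that the loop above posts to could be as small as the sketch below; process.php and process_item() are hypothetical names.)

<?php
// process.php - handles one item per request; no sleep() is needed, because
// the client starts the next request only after this one returns.
// process_item() stands in for the 2-3 seconds of per-item work.
$id = isset($_POST['id']) ? (int) $_POST['id'] : 0;
if ($id <= 0) {
    http_response_code(400);
    exit('bad id');
}
process_item($id);
echo 'ok';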
If you can place a cronjob on the server, I believe it'd work much better. What about using a cronjob to do the actual processing and using Javascript to periodically update the status (say, every 10 seconds)?
Then, the first step would be to trigger some flag that the cronjob PHP script will check. If it's active, the task must be performed (you could use a temporary file to tell the script which records must be processed).
The cronjob would do the task and then, when its iteration is complete, turn off the flag.
This way, the user can even close your application and check it back later, and the server will handle all the processing, uninterrupted by client activity.
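A rough sketch of that flag-plus-status arrangement; all file names are hypothetical, and the JSON progress file stands in for whatever bookkeeping the real job would use.

<?php
// cron_process.php - run by cron; works only while the flag file exists.
// Real code would want locking and error checks on top of this.
if (!file_exists('/tmp/processing.flag')) {
    exit;   // nothing has been requested
}
$state = json_decode((string) @file_get_contents('/tmp/progress.json'), true)
       ?: ['current' => 0, 'max' => 1330316];

// ... process the next chunk of records here, then record the progress ...
$state['current'] += 100;
file_put_contents('/tmp/progress.json', json_encode($state));

if ($state['current'] >= $state['max']) {
    unlink('/tmp/processing.flag');   // iteration complete: turn the flag off
}

<?php
// status.php - the page polls this (say every 10 seconds) for the numbers.
header('Content-Type: application/json');
echo (string) @file_get_contents('/tmp/progress.json');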
Putting a sleep inside your server-side PHP script can only make things worse. It leads to more processes sticking around, which increases the number of parallel working/sleeping processes, which adds up to increased memory usage.
Don't worry that so many processes might run in parallel. An Apache server is usually configured to process no more than 150 requests in parallel. A well-configured server does not process more requests in parallel than it has resources for (good administrators do some calculations beforehand). The other requests have to wait, and given your request count it's likely they would time out before ever being processed.
Your concerns should instead be about client-side resources, but it looks like your script only starts a new request once the previous one has returned. BTW: well-behaved HTTP clients (which your browser should be) open no more than 6 parallel requests to the same IP.
Update: Besides the above, you should seriously consider redesigning your approach to mass-processing (similar to what @Joel suggested), but that should go in another question.

Need advice on cron job'ing a very large process

I have a PHP script that grabs data from an external service and saves it to my database. I need this script to run once every minute for every user in the system (of which I expect thousands). My question is: what's the most efficient way to run this per user, per minute? At first I thought I would have a function that grabs all the user IDs from my database, iterates over them, and performs the task for each one, but I think that as the number of users grows, this will take longer and no longer fit within 1-minute intervals. Perhaps I should queue the user IDs and perform the task individually for each one? In which case, I'm actually unsure of how to proceed.
Thanks in advance for any advice.
Edit
To answer Oddthinking's question:
I would like to start the processes for each user at the same time. When the process for each user completes, I want to wait 1 minute and then begin the process again. So I suppose each process for each user should be asynchronous: the process for user 1 shouldn't care about the process for user 2.
To answer sims' question:
I have no control over the external service, and the users of the external service are not the same as the users in my database. I'm afraid I don't know any other scripting languages, so I need to use PHP to do this.
Am I summarising correctly?
You want to do thousands of tasks per minute, but you are not sure if you can finish them all in time?
You need to decide what to do when you start running over your schedule.
Do you keep going until you finish, and then immediately start over?
Do you keep going until you finish, then wait one minute, and then start over?
Do you abort the process, wherever it got to, and then start over?
Do you slow down the frequency (e.g. from now on, just every 2 minutes)?
Do you have two processes running at the same time and hope that the next run will be faster? (This might work if you are clearing up a backlog the first time, so the second run will run quickly.)
The answers to these questions depend on the application. Cron might not be the right tool for you, depending on the answers. You might be better off having a permanently running process that schedules itself; a minimal sketch of that idea follows.
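For the last option, a self-scheduling process could be as simple as the sketch below; process_all_users() is a stand-in for the real per-minute batch.

<?php
// self_scheduling_worker.php - a sketch; process_all_users() is hypothetical.
while (true) {
    $start = microtime(true);
    process_all_users();                     // the per-minute batch
    $elapsed = microtime(true) - $start;
    if ($elapsed < 60) {
        // Finished early: wait out the rest of the minute.
        usleep((int) ((60 - $elapsed) * 1000000));
    }
    // If we overran, start again immediately (one of the choices above).
}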
So, let me get this straight: you are querying an external service (what? SOAP? MySQL?) every minute for every user in the database and storing the results in the same database. Is that correct?
It seems like a design problem.
If the users on the external service are the same as the users in your database, perhaps the two should be more closely integrated. I don't know if PHP is the way to go for syncing this data. If you give more detail, we could think about another solution. If you are in control of the external service, you may want to have that service dump its data or even write directly to the database. Some other syncing mechanism might be better.
EDIT
It seems that you are making an application that stores data for a user that can then be viewed chronologically. Otherwise you may as well just fetch the data when the user requests it.
Fetch all the user IDs in one go.
Iterate over them one by one (assuming that the data being fetched is unique to each user) and (you'll have to be creative here, as PHP threads do not exist AFAIK) launch a process for each request, since you want them all to execute at the same time rather than be delayed if one user does not return data.
Said process should insert the data returned into the db as soon as it is returned.
As for cron being right for the job: As long as you have a powerful enough server that can handle thousands of the above cron jobs running simultaneously, you should be fine.
You could get creative with several PHP scripts. I'm not sure, but if every CLI call to PHP starts a new PHP process, then you could do it like this:
foreach ($users as $user)
{
    // Append "&" and discard output so each fetch runs in the background;
    // without it, shell_exec() blocks and the loop becomes sequential.
    shell_exec("php fetchdata.php " . escapeshellarg($user) . " > /dev/null 2>&1 &");
}
This is all very heavy, and you should not expect to get it done snappily with PHP. Do some tests; don't take my word for it.
Databases are made to process BULKS of records at once. If you're processing them one by one, you're looking for trouble. You need to find a way to batch up your "every minute" task so that, by executing a SINGLE (complicated) query, all of the affected users' info is retrieved; then you do the PHP processing on the result; then, in another single query, you PUSH the results back into the DB. A sketch of this pattern follows.
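A sketch of that bulk pattern, with made-up table and column names: one SELECT for the whole batch, PHP processing in the middle, and one multi-row write back.

<?php
// Sketch only: users, results, and transform() are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// One query fetches everything this run needs.
$rows = $pdo->query('SELECT id, data FROM users')->fetchAll(PDO::FETCH_ASSOC);

if ($rows) {
    // PHP-side processing over the whole result set.
    $placeholders = [];
    $params = [];
    foreach ($rows as $row) {
        $placeholders[] = '(?, ?)';
        $params[] = $row['id'];
        $params[] = transform($row['data']);   // hypothetical per-row work
    }

    // One multi-row statement pushes all results back at once.
    $sql = 'INSERT INTO results (user_id, value) VALUES '
         . implode(',', $placeholders)
         . ' ON DUPLICATE KEY UPDATE value = VALUES(value)';
    $pdo->prepare($sql)->execute($params);
}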
Based on your big-picture description it sounds like you have a dead-end design. If you are able to get it working right now, it'll most likely be very fragile and it won't scale at all.
I'm guessing that if you have no control over the external service, then that external service might not be happy about getting hammered by your script like this. Have you approached them with your general plan?
Do you really need to do all users every time? Is there any sort of timestamp you can use to be more selective about which users need "updates"? Perhaps if you could describe the goal a little better we might be able to give more specific advice.
Given your clarification of wanting to run the processing of users simultaneously...
The simplest solution that jumps to mind is to have one thread per user. On Windows, threads are significantly cheaper than processes.
However, whether you use threads or processes, having thousands running at the same time is almost certainly unworkable.
Instead, have a pool of threads. The size of the pool is determined by how many threads your machine can comfortably handle at a time. I would expect numbers like 30-150 to be about as far as you might want to go, but it depends very much on the hardware's capacity, and I might be off by an order of magnitude.
Each thread would grab the next user due to be processed from a shared queue, process it, and put it back at the end of the queue, perhaps with a date before which it shouldn't be processed.
(Depending on the amount and type of processing, this might be done on a separate box to the database, to ensure the database isn't overloaded by non-database-related processing.)
This solution ensures that you are always processing as many users as you can without overloading the machine. As the number of users increases, each one is processed less frequently, but always as quickly as the hardware will allow. A process-based sketch of the pool follows.
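PHP has no portable threads, but the same pool can be approximated with N copies of a worker process, each claiming the next due user from a shared table; the schema, the 1-minute reschedule, and process_user() below are assumptions.

<?php
// worker.php - run 30-150 copies of this, per the sizing advice above.
// Assumes a users table with a next_run DATETIME column.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

while (true) {
    $pdo->beginTransaction();
    // Claim the most overdue user; FOR UPDATE stops two workers from
    // grabbing the same row (they queue on the lock instead).
    $userId = $pdo->query(
        'SELECT id FROM users WHERE next_run <= NOW()
         ORDER BY next_run ASC LIMIT 1 FOR UPDATE'
    )->fetchColumn();

    if ($userId === false) {
        $pdo->commit();
        sleep(1);          // nothing due: back off briefly
        continue;
    }

    // Push the user to the back of the queue before doing the work.
    $pdo->prepare('UPDATE users SET next_run = NOW() + INTERVAL 1 MINUTE WHERE id = ?')
        ->execute([$userId]);
    $pdo->commit();

    process_user($userId);  // hypothetical fetch-and-save for one user
}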
