Offloading script function to post-response: methods and best-practices? - php

First,
the set up:
I have a script that executes several tasks after a user hits the "upload" button that sends the script the data it need. Now, this part is currently mandatory, we don't have the option at this point to cut out the upload and draw from a live source.
This section intentionally long-winded to make a point. Skip ahead if you hate that
Right now the data is parsed from a really funky source using regex, then broken down into an array. It then checks the DB for any data already in the uploaded data's date range. If the data date ranges don't already exist in the DB, it inserts the data and outputs success to the user (there is also some security checks, data source validation, and basic upload validation)... If the data does exist, the script then gets the data already in the DB, finds the differences between the two sets, deletes the old data that doesn't match, adds the new data, and then sends an email to each person affected by these changes (one email per person with all relevant changes in said email, which is a whole other step). The email addresses are pulled by means of an LDAP search as our DB has their work email but the LDAP has their personal email which ensures they get the email before they come in the next day and get caught unaware. Finally, the data-uploader is told "Changes have been made, emails have been sent." which is really all they care about.
Now I may be adding a Google Calendar API that posts the data (when it's scheduling data) to the user's Google Calendar. I would do it via their work calendar, but I thought I'd get my toes wet with Google's API before dealing with setting up a WebDav system for Exchange.
</backstory>
Now!
The practical question
At this point, pre-Google integration, the script takes at most a second and a half to run. It's pretty impressive, at least I think so (the server, not my coding). But the Google bit, in tests, is SLOOOOW. We can probably fix that, but it raises the bigger question...
What is the best way to off-load some of the work after the user has gotten confirmation that the DB has been updated? This is the part he's most concerned with and the part most critical. Email notifications and Google Calendar updates are only there for the benefit of those affected by the upload, and if there is a problem with these notifications, he'll hear about it (and then I'll hear about it) regardless of the script telling him first.
So is there a way, for example, to run a cronjob that's triggered by a script's last execution? Can PHP create cronjobs with exec() ability? Is there some normalized way of handling post-execution work that needs getting done?
Any advice on this is really appreciated. I feel like the scripts bloated-ness reflects my stage of development and the need for me to finally know how to do division-of-labor in web apps.
But I also get worried that this is not done, as user's need to know when all tasks are completed, etc. So this brings up:
The best-practices/more-subjective question
Basically, is there an idea that progress bars, real-time offloading, and other ways of keeping the user tethered to the script are --when combined with optimization of the code, of course-- the better, more-preferred method then simply saying "We're done with your part, if you need us, we'll be notifying users" etc etc.
Are there any BIG things to avoid (other than obviously not giving the user any feedback at all)?
Thanks for reading. The coding part is crucial, so don't feel obliged to cover the second part or forget to cover the coding part!

A cron job is good for this. If all you want to do when a user uploads data is say "Hey user, thanks for the data!" then this will be fine.
If you prefer a more immediate approach, then you can use exec() to start a background process. In a Linux environment it would look something like this:
exec("php /path/to/your/worker/script.php >/dev/null &");
The & part says "run me in the backgound." The >/dev/null part redirects output to a black hole. As far as handling all errors and notifying appropriate parties--this is all down to the design of your worker script.
For a more flexible cross-platform approach, check out this PHP Manual post

There are a number of ways to go about this. You could exec(), like the above says, but you could potentially run into a DoS situation if there are too many submit clicks. the pcntl extension is arguably better at managing processes like this. Check out this post to see a discussion (there are 3 parts).
You could use Javascript to send a second, ajax post that runs the appropriate worker script afterwards. By using ignore_user_abort() and sending a Content-Length, the browser can disconnect early, but your apache process will continue to run and process your data. Upside is no forkbomb potential, Downside is it will open more apache processes.
Yet another option is to use a cron in the background that looks at a process-queue table for things to do 'later' - you stick items into this table on the front end, remove them on the backend while processing (see Zend_Queue).
Yet another is to use a more distributed job framework like gearmand - which can process items on other machines.
It all depends on your overall capabilities and requirements.

Related

Send message to another PHP file

Is it possible to have one PHP file send a "message" to specific users on another PHP file?
Note that the "message" needs to be able to received by a PHP file that is already running, so simply calling the other files doesn't make a difference. The image below demonstrates what I want to do:
In this example, user 1 calls "send.php", which subsequently sends a message to the "receive.php" instances of users 1,2, and 4. Is this possible to accomplish?
Additional Information
I cannot log the messages in a central location like a file or database because I would end up querying for messages about once every 100 ms, which would probably overload the database/filesystem. I need it to be instantaneous.
Additionally, I cannot use sessions or cookies because as mentioned the message needs to be sent to several users. Finally, the receiving PHP file doesn't terminate until the user leaves the page (it's really an HTML5 eventsource file).
You can persist data across pages by using sessions or a database.
Additional Information: I cannot log the messages in a central location
like a file or database because I would end up querying for messages
about once every 100 ms, which would probably overload the
database/filesystem. I need it to be instantaneous.
This can be done, but it's not in best practices. You could go about creating a separate file for each of the messages. This is very bad practice.
In terms of your problem with a centralized database, you never know. This may be able to cope with your requests, just try it and find out.
No programmers have got everything right on their first attempt. They have tried something new, and made a mistake... Everyone makes mistakes in terms of finding the best approach to an application.
Okay, there may be ways that are a lot more lightweight, but try to use a message broker with pub-sub capabilities (called topics in JMS world). They're made especially to transfer high volumes of messages from servers to clients quickly and reliably.
An easy way to create a test system for such a setup could be to use a STOMP capable message broker (see here for a list of available solutions) and combine that with a PHP STOMP client (list available on the same page.
Be sure to read this remark concerning durable subscriptions: as your PHP scripts will most likely not be running at the exact time a message is sent, you'll need the feature that emulates a durable subscription. I don't know all the brokers described on the page mentioned above, but ActiveMQ does, and it has been around for quite some time, and has a very good reputation.
PHP doesn't work that way. It runs the script and terminates. You need to look up session, or use a database. If you want to get fancy, use jQuery and Ajax.

How to process massive data-sets and provide a live user experience

I am a programmer at an internet marketing company that primaraly makes tools. These tools have certian requirements:
They run in a browser and must work in all of them.
The user either uploads something (.csv) to process or they provide a URL and API calls are made to retrieve information about it.
They are moving around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually over night.
The user must be able to watch live as their information is processed and is presented to them.
Currently we are writing in PHP, MySQL and Ajax.
My question is how do I process LARGE quantities of data and provide a user experience as the tool is running. Currently I use a custom queue system that sends ajax calls and inserts rows into tables or data into divs.
This method is a huge pain in the ass and couldnt possibly be the correct method. Should I be using a templating system or is there a better way to refresh chunks of the page with A LOT of data. And I really mean a lot of data because we come close to maxing out PHP memory and is something we are always on the look for.
Also I would love to make it so these tools could run on the server by themselves. I mean upload a .csv and close the browser window and then have an email sent to the user when the tool is done.
Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.
I wanted to update with some notes incase anyone has the same question. I am looking into the following to see which is the best solution:
SlickGrid / DataTables
GearMan
Web Socket
Ratchet
Node.js
These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.
First of all, you cannot handle big data via Ajax. To make users able to watch the processes live you can do this using web sockets. As you are experienced in PHP, I can suggest you Ratchet which is quite new.
On the other hand, to make calculations and store big data I would use NoSQL instead of MySQL
Since you're kind of pinched for time already, migrating to Node.js may not be time sensitive. It'll also help with the question of notifying users of when the results are ready as it can do browser notification push without polling. As it makes use of Javascript you might find some of your client-side code is reusable.
I think you can run what you need in the background with some kind of Queue manager. I use something similar with CakePHP and it lets me run time intensive processes in the background asynchronously, so the browser does not need to be open.
Another plus side for this is that it's scalable, as it's easy to increase the number of queue workers running.
Basically with PHP, you just need a cron job that runs every once in a while that starts a worker that checks a Queue database for pending tasks. If none are found it keeps running in a loop until one shows up.

How do I make a php script wait for user input

Here is my problem, for those of you who looked at the title and thought "PHP does not wait on user input because it is a server side language and therefore your problem is a client-side one," just hear me out.
I'm making a game. It is mulitplayer game, therefore multiple users. At the start of every game round the users involved in the game are prompted with a chose of things they want to do that round. Course, the round isn't to commence until everyone has made a selection on what they want to do that round.
See, that is my problem. How do I make a script wait for 'all' users to send a request (send input) before continuing execution? The server-side language is PHP. The answer wouldn't be with the client as the client is only responsible for one user and wouldn't know what the other users are doing.
Thanks.
Fundamentally, you have two choices:
Choice one, you have each script check if it has all the data it needs, and then do all the work to calculate the next move (or whatever). That is actually much harder to get right than it sounds, because you run into problems of concurrency.
Basically, that approach leads to more than one "process" - page load - trying to do the same work on the same data, and that opens the door to races where you either don't do the work at all, or where you do it twice.
Choice two, which sounds harder, is where you write another PHP script that checks to see if it has all the moves, calculates the outcome, and updates the database (or whatever) in the background.
Then, run that off a cron job, or something like that, so you only have one instance running at a time. That makes life easier: your "is everything done" script is only running once, and so you don't have to worry about races - but there might be a lag between the last move being submitted and calculating the outcome.
That approach is actually easier in the long run, because while it involves more code and more moving parts, it actually avoids the hard problems (concurrency) in return for a few more easy problems (a bit more code, using cron).
You can improve on both of those, of course, but those are the fundamental models. Locking and other coordination techniques can make "calculate in the last page" work better, but they involve you addressing the races.
Using various "background job" tools can improve the latency of the second approach, by letting you trigger the check instantly rather than just on a timer. You still have some latency, but the user doesn't see as much of it.
Really, though, you get to pick one of those two strategies and go with it.
(Also, I strongly advise that if you can, grab a framework or something where someone else already solved these problems, then use that.)
Since you can't push data out, I'd tackle this as follows:
collect the submitted information on the server in eg DB
run a client-side script to check periodically if all required information is available
on the client evaluate the response: if true, start game - if not "wait" = keep checking
so you need:
script on the server that collects the info
server script that checks if all info is available
client script that checks periodically and evaluates

How to trigger server-sent event in HTML5 ... OR : Can PHP script be aware if another script has been called?

Now that all of the browsers I like have almost full support for Server Sent Events, I wanted to try implementing it on a site I've been putting off because I hate polling. But I have initial hesitation that I was hoping I could get some help on.
Here is my use case:
User goes to a form, something time-based and competitive, in this case class registration. All things being equal, they have a list of about 30 - 40 classes they are eligible for, and in order to minimize instances of "she logged in first but he hit save first but he didn't mean to hit save but she already chose another class" etc, I want to make the form real-time, so that when someone selects an option, it goes straight into the db and anyone else viewing the form sees that it is filling up. (I'll deal with the stress of people changing their minds later).
So, in a polling scenario, I had to deal with the AJAX calls having to check on the status of 40 spots and update them and setting an interval that could potentially still create collisions.
But with Server Sent Events, I can have the listener get just the spots that need updating, which seems better, but here's where I get stuck:
Is there any risk of the listener getting overloaded? Let's say the script sends 15 messages, back-to-back, about a status change. I see vague mentions of how user agents should handle queued tasks, but it's not clear if that's for establishing a connection or handling server-sent messages
Is this basically just passing the burden of polling from the browser to the server? Does the script have to check the DB every second for changes? Is there any way for the script to be aware or notified when change has occurred? Let's assume that seat requests are sent to requests.php via ajax and that updates.php pushes events back to the browser. Is there a standard and/or clever way for updates to idle until requests has made a commit?
The only solution I can think of is for requests.php to write the committed changes to a flat file (commits.xml perhaps) and updates.php just polls the file size every half-second, thereby keeping the workload to a minimum.
Any better/smarter/more obvious solutions out there?
Polling your database for changes is not a good idea. Instead, you should do inter-process PUB/SUB on the server. To do that, you can use a message queue like RabbitMQ, ZeroMQ or Redis PUB/SUB.

PHP Background Process

I have a process users must go through on my site which can take quite a bit of time (upwards of an hour in certain cases).
I'd like to be able to have the user start the process, then be told that it is running in the background and they can leave the page and will be emailed when the process is complete. This would help avoid cases when the user gets impatient and closes the window before the process has finished.
An example of how it would ideally look is how Mailchimp handles importing contacts. You upload a CSV file of your contacts, and they then say that the contacts are currently uploading, but it can take a while so feel free to leave the page.
What would be the best way to accomplish this? I looked into Gearman, however it seems like that tool is more useful for scaling large amounts of tasks to happen quickly, not running processes in the background.
Thanks for your help.
Even it doesn't seem to be what you'd use at the first look, I think I would use Gearman, for that :
You can push tasks to it when the user does his action
It'll deal with both :
balancing tasks to several servers, if you have more than one
queuing, so no more than X tasks are executed in parallel.
No need to re-invent the wheel ;-)
You might want to take a look at creating a daemon. I'd suggestion writing the daemon in a language other than PHP (node.js maybe?), but if you already have a large(ish) code base in PHP this mightn't be desirable. Try taking a look at How to design a daemon with a MySQL DB connection.
I've been working on a library call LooPHP in PHP to allow event driven programming for PHP (often desirable for daemons). The library allows for timed events, multi-threaded listeners (when you want one event queue to be feed from >1 type of source).
If you could give us some more information on what exactly this background process does, it might be helpful.
Write out a file using the user's ID as the filename. Spawn a new process to perform whatever it is you want it to do (if what you want is to have it execute some more PHP, then you can just call PHP with the script you want to run). When that process is done, have it delete that file. If the user visits the page again, have the script check for existence of the file (since the filename is predictable based on user ID). If it exists, then you're still processing, so tell them to continue waiting. Maybe have some upper bound to wait, where if they come back and the file exists, but it's been, say, 5 hours, delete the file and let them try again.

Categories