I have a PHP script that pulls user-specific data from a third-party source and dumps it into a local table. I want it to execute every X minutes while a user is logged in, but it takes about 30 seconds to run, and I don't want the user to experience that delay. I figured the best way of doing this would be to timestamp each successful pull, then place some code in every page footer that checks the last pull and executes the PHP script in the background if it was more than X minutes ago.
I don't want to use a cron job to do this, as there are global and session variables specific to the user that I need when executing the pull script.
I can see that popen() would allow me to run the PHP script, but I can't find any information on whether it would run as the visitor within their session (and therefore with access to the user-specific global or session variables) or as a separate session altogether.
Will popen() solve my problem?
Am I going about this the right way or is there a better method to execute the script at intervals while the user is logged in?
Any help or advice is much appreciated, as always!
Cheers
Andy
One idea would be to put the session data in a table as well.
That way you can easily access it from your background process; you only have to pass session_id() or the user id as an argument so the process knows which user it is currently processing.
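A minimal sketch of that idea, assuming a hypothetical user_session_data table and pull.php script (names and paths are placeholders):

    <?php
    // Page-footer sketch: persist the session state the pull script needs,
    // then launch it in the background with the user id as an argument.
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    $userId = (int) $_SESSION['user_id'];

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $pdo->prepare(
        'REPLACE INTO user_session_data (user_id, session_id, data) VALUES (?, ?, ?)'
    )->execute([$userId, session_id(), json_encode($_SESSION)]);

    // Redirecting output and appending & lets this request return immediately
    // instead of waiting the ~30 seconds for the pull to finish.
    exec(sprintf('php /path/to/pull.php %d > /dev/null 2>&1 &', $userId));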
No. The process started by popen() is a separate process, outside the visitor's session, and PHP still has to wait on it (pclose() blocks until the process exits) before the request can finish.
Probably not, because the solution doesn't fit the architectural constraints of the system
Your reasoning for not using a cron job isn't actually sound. You may have given up on exploring that particular path too quickly and painted yourself into a corner, trying to fit a square peg into a round hole.
The real problem you're having is figuring out how to do inter-process communication between the web request and the PHP script running from your crontab. This can be solved in a number of ways, and more easily than trying to work against PHP's architectural constraints.
For example, you can store the session ids in your database or some other storage, and access the session files from the PHP script running in your crontab, independently of the web request.
Let's say you determine that a user is logged in based on the last request they made to your server, by storing a timestamp of that request, along with the current session id, in the user table.
The cron job can start up every X minutes, look at the database or persistence store for any records with an active timestamp within the given range, pull those session ids, and do what's needed.
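A rough sketch of that cron job, assuming hypothetical last_seen and last_pull columns on the users table:

    <?php
    // cron_pull.php: run from the crontab every few minutes, independent of
    // any web request. Table and column names here are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Users seen in the last 15 minutes whose data hasn't been pulled recently.
    $active = $pdo->query(
        'SELECT id, session_id FROM users
          WHERE last_seen > NOW() - INTERVAL 15 MINUTE
            AND last_pull < NOW() - INTERVAL 5 MINUTE'
    );

    foreach ($active as $user) {
        // ... run the existing 30-second pull for $user['id'] here, reading
        //     any per-user state from the database rather than $_SESSION ...
        $pdo->prepare('UPDATE users SET last_pull = NOW() WHERE id = ?')
            ->execute([$user['id']]);
    }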
Now the question of how do you actually get this to scale effectively if the processing time takes more than 30 seconds can be solved independently as well. See this answer for more details on how to deal with distributed job managers/queues from PHP.
I would use Javascript and AJAX requests to call the script from the frontend and handle the result.
You could then set the interval in JavaScript and send an AJAX request on each interval tick.
AJAX is what you need:
http://api.jquery.com/jquery.ajax/
Then use the "success" callback to do something once those 30 seconds are up.
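For the server side of that call, a minimal sketch of the endpoint the AJAX request might hit (the endpoint name and pull logic are placeholders):

    <?php
    // pull_endpoint.php: hypothetical target of the periodic AJAX call.
    session_start();
    if (!isset($_SESSION['user_id'])) {
        http_response_code(403);
        exit;
    }

    set_time_limit(120); // give the ~30 second pull room to finish

    // ... run the pull for $_SESSION['user_id'] and record the timestamp ...

    header('Content-Type: application/json');
    echo json_encode(['status' => 'ok', 'pulled_at' => date('c')]);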
I'm pretty new to PHP. This is my first time actually, so I apologize in advance if the question sounds dumb.
I have a php script which fetches data from an external API and updates my database regularly. I know that the best way to do this is to use a cron job. However, I am using an infinite loop which sleeps for a particular time between each update.
Now, I need to allow the user (admin) to start / stop the script and also allow them to change the time interval between each update. The admin does this through a UI. What is the best way to do this? How do I execute the script in the background when the user clicks start and how do I stop the script?
Thanks.
I think the ideal solution would be the following:
Have the background job run as a cron job every minute (instead of an infinite loop, which can cause memory leaks; PHP was not designed to be a daemon).
Set a DB flag for on/off; every time the cron job runs, it checks the flag: if it's off, it exits, and if it's on, it continues.
In the UI you turn that flag on or off depending on what the admin needs.
I think that is the best way to go (and easiest).
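A minimal sketch of that cron job, assuming a hypothetical settings row holding the flag and interval the admin sets in the UI:

    <?php
    // update_job.php: run from the crontab every minute.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $s = $pdo->query(
        'SELECT enabled, interval_minutes, last_run FROM settings WHERE id = 1'
    )->fetch(PDO::FETCH_ASSOC);

    // Off? Exit immediately. This is the flag the UI toggles.
    if (!$s['enabled']) {
        exit;
    }

    // Respect the admin-configured interval between updates.
    if ($s['last_run'] !== null
        && time() - strtotime($s['last_run']) < $s['interval_minutes'] * 60) {
        exit;
    }

    // ... fetch from the external API and update the database here ...
    $pdo->exec('UPDATE settings SET last_run = NOW() WHERE id = 1');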
You might want to take a look at Threading in PHP -> http://www.php.net/manual/en/class.thread.php
Another option is to either set a SESSION value (which is a little tricky, since you won't catch it in the same call) or generate a file which tells the script to stop/exit/fail/whatever if that file exists.
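The file-based variant could look something like this sketch (the path is a placeholder; the UI's "stop" button creates the file and "start" deletes it):

    <?php
    $stopFile = '/tmp/update-job.stop';

    while (true) {
        if (file_exists($stopFile)) {
            exit; // the admin asked the job to stop
        }
        // ... do one update cycle here ...
        sleep(60);
    }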
I'm creating a PHP script that will allow a user to log into a website and execute database queries and do other actions that could take some time to complete. If the PHP script runs these actions and they take too long, the browser page times out on the user end and the action never completes on the server end. If I redirect the user to another page and then attempt to run the action in the PHP script, will the server run it even though the user is not on the page? Could the action still time out?
In the event of long-running server-side actions in a web application like this, a good approach is to separate the queueing of the actions (which should be handled by the web application) from the running of the actions (which should be handled by a different server-side application).
In this case it could be as simple as the web application inserting a record into a database table which indicates that User X has requested Action Y to be processed at Time Z. A back-end process (always-running daemon, scheduled script, whatever you prefer) would be constantly polling that database table to look for new entries. ("New" might be denoted by something like an "IsComplete" column in that table.) It could poll every minute, every few minutes, every hour... whatever is a comfortable balance between server performance and the responsiveness of an action beginning when it's requested.
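A minimal sketch of both halves, assuming a hypothetical jobs table:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Web application side: record the request and return immediately.
    $userId = 123; // the logged-in user
    $pdo->prepare(
        'INSERT INTO jobs (user_id, action, requested_at, is_complete)
         VALUES (?, ?, NOW(), 0)'
    )->execute([$userId, 'rebuild_report']);

    // Back-end side (daemon or scheduled script): poll for unfinished work.
    $pending = $pdo->query(
        'SELECT id, user_id, action FROM jobs WHERE is_complete = 0'
    );
    foreach ($pending as $job) {
        // ... run the long action for $job here ...
        $pdo->prepare('UPDATE jobs SET is_complete = 1 WHERE id = ?')
            ->execute([$job['id']]);
    }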
Once the action is complete, the server-side application that ran the action would mark it as complete in the database and would store the results wherever you need them to be stored. (Another database table or set of tables? A file? etc.) The web application can check for these results whenever you need it to (such as on each page load, maybe there could be some sort of "current status" of queued actions on each page so the user can see when it's ready).
The reason for all of this is simply to keep the user-facing web application responsive. Even if you do things like increase timeouts, users' browsers may still give up. Or the users themselves may give up after staring at a blank page and a spinning cursor for too long. The user interface should always respond back to the user quickly.
You could look at using something like ignore_user_abort(), but that is still not ideal in my opinion. I would look at deferring these actions and running them through a message queue. PHP has a Gearman extension - that is one option. Using a message queue scales well and does a better job of ensuring the requested actions actually get completed.
Lots on SO on the subject... Asynchronous processing or message queues in PHP (CakePHP) ...but don't use Cake :)
set_time_limit() is your friend.
If it were me, I would put a loading icon animation in the user interface telling them to wait. Then I would execute the "long process" using an asynchronous AJAX call that would then return an answer, positive or negative, that you would pass to the user through JavaScript.
Just like when you upload pictures to Facebook, you can tell the user what is going on. Very clean!
I have a PHP function that I want to make available publicly on the web - but it uses a lot of server resources each time it is called.
What I'd like to happen is that a user who calls this function is forced to wait for some time, before the function is called (or, at the least, before they can call it a second time).
I'd greatly prefer this 'wait' to be enforced on the server-side, so that it can't be overridden by dubious clients.
I plan to insist that users log into an online account.
Is there an efficient way I can make the user wait, without using server resources?
Would 'sleep()' be an appropriate way to do this?
Are there any suggested problems with using sleep()?
Is there a better solution to this?
Excuse my ignorance, and thanks!
sleep() would be fine if you were using PHP as a command-line tool, for example. For a website, though, your sleep will hold the connection open. Your web server only has a finite number of concurrent connections, so this could be used to DoS your site.
A better - but more involved - way would be to use a job queue. Add the task to a queue which is processed by a scheduled script and update the web page using AJAX or a meta-refresh.
sleep() is a bad idea in almost all possible situations. In your case, it's bad because it keeps the connection to the client open, and most web servers have a limit on open connections.
sleep() will not help you at all. The user could just load the page twice at the same time, and the command would be executed twice right after each other.
Instead, you could save a timestamp in your database for when your function was last invoked. Then, before invoking it, you should check the database to see if a suitable amount of time has passed. If it has, invoke the function and update the timestamp in the database.
If you're planning on enforcing a user login, then the problem just got a whole lot simpler.
Have a record in the database listing users and the last time they used your resource-consuming service, and measure the time difference between then and now. If the time difference is too low, deny access and display an error message.
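A sketch of that check, assuming a hypothetical last_called column on the users table:

    <?php
    // Per-user throttle: refuse the call if this user ran the expensive
    // function less than $cooldown seconds ago.
    session_start();
    $cooldown = 300; // five minutes

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT last_called FROM users WHERE id = ?');
    $stmt->execute([$_SESSION['user_id']]);
    $lastCalled = $stmt->fetchColumn();

    if ($lastCalled && time() - strtotime($lastCalled) < $cooldown) {
        http_response_code(429);
        exit('Please wait before calling this again.');
    }

    $pdo->prepare('UPDATE users SET last_called = NOW() WHERE id = ?')
        ->execute([$_SESSION['user_id']]);
    // ... run the expensive function ...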
This is best handled at the server level. No reason to even invoke PHP for repeat requests.
Like many sites, I use Nginx, and you can use its rate-limiting to block repeat requests over a certain number. So like, three requests per IP, per hour.
I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated with what is going on and the current task being processed.
What is in your opinion the best way to do it? I've got some ideas but can't decide for the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Does any of those sound good? Any ideas?
Thanks!
Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task. Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In the script that makes the request you're waiting for (the script you want to check the progress of), you can store a progress variable for the process, either in a file or a database. (I use a database, as I have hundreds of processes running at any time which all need to track their progress, plus another script that allows me to monitor them.) When the process begins, set this variable to 1.
You can select an arbitrary number of 'checkpoints' the script will pass and calculate the percentage from the current checkpoint. For a large request, however, you might be more interested in the approximate percentage of the request that has completed. One possible solution is to know the size of the returned content and set your status variable according to the percentage received at any moment: if you receive the request data in a loop, you could update the status on each iteration, or if you are downloading to a flat file, you could poll the size of the file.
This could be done less accurately with time (rather than file size) if you know the approximate time the request should take to complete and simply compare that against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they give you some insight into your options.
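A minimal sketch of the checkpoint approach, assuming a hypothetical progress table keyed by a process id:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $processId = 42;

    function setProgress(PDO $pdo, int $processId, int $percent): void
    {
        $pdo->prepare('REPLACE INTO progress (process_id, percent) VALUES (?, ?)')
            ->execute([$processId, $percent]);
    }

    setProgress($pdo, $processId, 1);   // process has begun
    // ... first stage of the work ...
    setProgress($pdo, $processId, 40);
    // ... remaining stages ...
    setProgress($pdo, $processId, 100); // complete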
I suggest using the AJAX method, but not using a file or a database. You could probably use session values or something like that; that way you don't have to create a connection or open a file to do anything.
In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
Personally, I like your second option the best. Should be reliable and fairly simple to implement.
I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
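A sketch of that, with placeholder path and fields:

    <?php
    // Worker side: overwrite a small JSON status file as the job progresses.
    file_put_contents('/tmp/status.json', json_encode([
        'percent' => 65,
        'message' => 'Importing rows...',
    ]));

    // A separate status.php, polled by AJAX, would simply relay the file:
    //   header('Content-Type: application/json');
    //   readfile('/tmp/status.json');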
A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. So long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks the response, writes to the browser, then goes back for more at script.php?step=2, and so on?
If you can do away with IE, then use Server-Sent Events. It's the ideal solution.
I have a particularly long operation that is going to run when a user presses a button on an interface, and I'm wondering what would be the best way to indicate this back to the client.
The operation is populating a fact table for a number of years' worth of data, which will take roughly 20 minutes, so I'm not intending the interface to be synchronous. Even though it is generating large quantities of data server-side, I'd still like everything to remain responsive, since the data for the month the user is currently viewing will be updated fairly quickly, which isn't a problem.
I thought about setting a session variable after the operation has completed and polling for that session variable. Is this a feasible way to do such a thing? However, I'm particularly concerned about the user navigating away or closing their browser, at which point all status about the long-running job would be lost.
Would it be better to perhaps insert a record somewhere, recording the processing job when it has started and finished? Then create some other sort of interface so the user (or users) can monitor the jobs that are currently executing/finished/failed?
Has anyone any resources I could look at?
How'd you do it?
The server side portion of code should spawn or communicate with a process that lives outside the web server. Using web page code to run tasks that should be handled by a daemon is just sloppy work.
You can't expect them to hang around for 20 minutes. Even the most cooperative users in the world are bound to go off and do something else, forget, and close the window. Allowing such long connection times screws up any chance of a sensible HTTP timeout and leaves you open to trivial DOS too.
As Spencer suggests, use the first request to start a process which is independent of the http request, pass an id back in the AJAX response, store the id in the session or in a DB against that user, or whatever you want. The user can then do whatever they want and it won't interrupt the task. The id can be used to poll for status. If you save it to a DB, the user can log off, clear their cookies, and when they log back in you will still be able to retrieve the status of the task.
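A rough sketch of that flow, with hypothetical script and table names:

    <?php
    // start.php: handles the first AJAX request. Record a job row, launch
    // the 20-minute task detached from the web request, and return the id
    // the client can poll with.
    session_start();
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $pdo->prepare('INSERT INTO jobs (user_id, status) VALUES (?, ?)')
        ->execute([$_SESSION['user_id'], 'running']);
    $jobId = (int) $pdo->lastInsertId();

    exec(sprintf('php /path/to/populate_facts.php %d > /dev/null 2>&1 &', $jobId));

    header('Content-Type: application/json');
    echo json_encode(['job_id' => $jobId]);

    // A separate status.php would then answer polls for ?job_id=N:
    //   $stmt = $pdo->prepare('SELECT status FROM jobs WHERE id = ?');
    //   $stmt->execute([(int) $_GET['job_id']]);
    //   echo json_encode(['status' => $stmt->fetchColumn()]);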
Sessions are not that reliable; I would probably design some sort of task list, so I can keep records of tasks per user. With this design I will be able to show "done" tasks, to keep the user aware.
Also, I would move the long operation out of the web-server worker process. This is required because web servers can be restarted.
And, yes, I would request the status from the server every few dozen seconds with AJAX calls.
You can have a JS timer that periodically pings your server to see if any jobs are done. If the user goes away and comes back, you restart the timer. When a job is done, you indicate that to the user so they can click on the link and open the report (I would not recommend forcefully loading something, though it can be done).
From my experience, the best way to do this is to save on the server side which reports are running for each user, along with their statuses. The client would then poll this status periodically.
Basically, instead of checkStatusOf(int session), have the client ask the server for getRunningJobsFor(int userId), returning all running jobs and their statuses.
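A minimal sketch of such an endpoint, assuming a hypothetical jobs table:

    <?php
    // getRunningJobsFor: return every job the logged-in user owns, with
    // statuses, instead of the status of a single session.
    session_start();
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $stmt = $pdo->prepare(
        'SELECT id, name, status, started_at FROM jobs WHERE user_id = ?'
    );
    $stmt->execute([$_SESSION['user_id']]);

    header('Content-Type: application/json');
    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));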