I have a PHP app on Heroku, the app does lots of communication with external APIs, which in turn trigger jobs on the database, the results are then displayed in a Facebook app...
Currently I have 2 worker processes and a web process. The web process triggers the workers and monitors database flags to know when each worker job is complete...I know...this setup isn't great, ideally I'd like to get notified in my web process when each worker process is finished, but this doesn't seem to be possible...
Is there a better way to approach this in Heroku using PHP?
Maybe a PHP app on Heroku isn't the best solution, but I've written lots of PHP that I'd rather not re-write....
Thanks in advance...
I can think of two relatively straightforward things you can do without ditching PHP (though I have to mention that PHP doesn't have much to recommend it, and you would likely be better off with Python/Django, Python/Flask or Ruby/Rails):
One is that you can switch to Redis for managing your workers instead of using your database. The advantage to this is that Redis has a pub/sub system where you can subscribe to signals while you hold a connection open. This means that if a connection is open, for instance from a web process, you will be notified of a change immediately without having to poll.
Two is that you can switch to using ajax so that you don't block the loading of your page while you're waiting. Load your page immediately and then use javascript to hit a separate PHP page to periodically check for updates on the status of your job, and then use javascript to render the results on the page in place when the results are available.
Even better, use ajax long polling. Render your page immediately and then use javascript to send a request back. Then when your php page receives the second request, register a subscription with Redis and then also manually check for updates (if you're not using Redis, just check for updates). If there are no updates, then just wait until the subscription receives a message, or wait for 30 seconds, whichever. (To be honest, I've never done Redis subscriptions in PHP so I'm not sure how implement that -- if you can't do it easily then just poll every couple of seconds instead.) If the 30 second timer expires, return json that says there are no results and have the javascript retry immediately. If you do receive results within that time, return the results and have the javascript render them.
Related
Hey guys I’m working on a website for my small startup that needs to check a database continuously for new data. I'm a mechanical engineer and don't have experience with web design and web communication. Currently I’m using an AJAX request every second to check a MYSQL database (using PHP). The code compares the received data (in JSON format) and if it’s different than the previous one it triggers a function to process the new data and update the UI.
Just last night I learned about web workers, web sockets and long polling and kinda overwhelmed with all the new options I have now. I’m really confused about whether I need to change my current solution and which solution would be the best. I thought maybe I should create a dedicated web worker that handles the AJAX calls in order to avoid sacrificing UI smoothness (the website should run smoothly on an average tablet).
Anyone with experience can give me some tips and directions? I learned about Pusher API but I would like to avoid API’s for now. I feel like all the code that I have written in the past few months are inefficient after reading about web workers and web sockets…
Thanks in advance...
You should really use google and search SO for previous posts on similar issues.
Here are a few good starters:
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
Performance of AJAX vs Websocket REST over HTTP 2.0?
What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet?
Design/Architecture: web-socket one connection vs multiple connections
Or (outside SO):
http://dsheiko.com/weblog/websockets-vs-sse-vs-long-polling/
https://www.pubnub.com/blog/2015-01-05-websockets-vs-rest-api-understanding-the-difference/
As a quick summation:
I would probably opt for a web socket connection per client.
I would avoid polling the MySQL database (why do that?). There's really no need to waste resources. It's easier to add code to the update gateway, so that whenever the DB is updated, an event is scheduled for all listening sockets... I would consider Redis for Pub/Sub if I were using more than one process / machine for my server app.
An easier workflow would look like this:
Browser page load -> Websocket connection.
Websocket connection -> subscribe (listen to) Redis channel.
SQL update -> (triggers) Redis publish to a channel.
Redis channel publish -> notification to the (subscribed) websocket.
Notification on channel -> web socket message to client.
Good luck.
Here's a simple push idea that may work for you:
Create a trigger that writes to another table when inserts/updates are done and log any relevant data there (something useful to you)
On initial load of app, get the latest updates from the secondary "log" table, store the row/event ID for comparison later
Create a poller (server-sent events) that listens to a specific script that watches said "log" table
Create a CRON job to execute the script from step 3 every X amount of time
(caveat: #3 may not work in IE, so you'd need a fallback or different solution)
I have this scenario:
User submits a link to my PHP website and closes the browser. Now that the server has got the link it will analyse the submitted link (page) for the broken links and after it has completely analysed the posted link, it will send an email to the user. I have a complete understanding of the second part i.e. how to analyse the page for the broken links and send the mail to the user. Only problem that I have is how may I achieve this first part i.e. make the server keep running the actions on it's own even even if there is no request made by the client end?
I have learned that "Crontab" or a "fork" may work for me. What do you say about these? Is it possible to achieve what I want, using these? What are the alternatives?
crontab would be the way to go for something like this.
Essentially you have two applications:
A web site where users submit data to a database.
An offline script, scheduled to run via cron, which checks for records in the database and performs the analysis, sending notifications of the results when complete.
Both of these applications share the same database, but are otherwise oblivious to each other.
A website itself isn't suited well for this sort of offline work, it's mainly a request/response system. But a scheduled task works for this. Unless the user is expecting an immediate response, a small delay of waiting for the next scheduled run of the offline task is fine.
The server should run the script independently of the browser. Once the request is submitted, the php server runs the script and returns the result to the browser (if it has a result to return)
An alternative would be to add the request to a database and then use crontab run the php script at a given interval. The script would then check the database to see if there's anything that needs to be processed. You could limit the script to run one database entry every minute (or whatever works). This will help prevent performance problems if you have a lot of requests at once, but will be slower to send the email.
A typical approach would be to enter the link into a database when the user submits it. You would then use a cron job to execute a script periodically, which will process any pending links.
Exactly how to setup a cron job (or equivalent scheduled task) depends on your server. If you have a host which provides a web-based admin tool (such as CPanel), there will often be a way to do it in there.
PHP script will keep running after the client closes the broser (terminating the connection).
Only keep in mind PHP scripts maximum execution time is limited to "max_execution_time" directive value.
Of course here I suppose the link submission happens calling your script page... I don't understand if this is your use case...
For the sake of simplicity, a cronjob could do the wonders. User submits a link, the web handler simply saves the link into a DB (let me pretend here that the table is named "queued_links"). Then a cronjob scheduled to run each minute (for example), selects every link from queued_links, does the application logic (finds broken page links) and sends the email. It then also deletes the link from queued_links (or updates a flag to represent the fact that the link has already been processed.
In the sake of scale and speed, a cronjob wouldn't fit as well as a Message Queue (see rabbitmq, activemq, gearman, and beanstalkd (gearman and beanstalk are my favorite 2, simple and fit well with php)). In lieu of spawning a cronjob every minute, a queue processor listens for 'events' and asynchronously processes the 'events' (think 'onLinkSubmission($link)'), and processes the messages ASAP. The cronjob solution is just a simplified implementation of one of these MQ solutions, will result in better / more predictable results, but at the cost of adding new services to maintain, etc.
well, there are couple of ways, simplest of them would be:
When user submit a request, save this request some where, let's call it jobs table, and inform customer that his request has been received, they'll be updated site finish processing your request, or whatever suites you.
Now, create a (or multiple) scripts (depending upon requirement) and run this script from Cron, this script will pick requests from Job table, process it, do whatever required.
Alternatively, you can evaluate possibility of message_queue or may be using a Job server for this.
so, it all depends on your requirement.
I have a web application written in PHP using a Postgres database.
The next phase of development is for background batch processes to be built that will need to be executed once a day (or adhoc as requested) for each user of the app. The process will query, await response and process the response from third-party services to feed information into the user's account within the web application.
Are there good ways to do this?
How would batches be triggered every day at 3am for each user?
Given there could be a delay in the response, is this a good scenario to use something like node.js?
Is it best to have the output of the batch process directly update the web application's database
with the appropriate data?
Or, is there some other way to handle the output?
Update: The process doesn't have to run at 3am. The key is that a few batch processes may need to run for each user. The execution of batches could be spread throughout the day.. I want this to be a "background" process separate to the app.
You could write a PHP script that runs through any users that need to be processed, and set up a cron job to run your script at 3am. Running as a cron job means you don't need to worry so much about how slow the third party call is. Obviously you'd need to store any necessary data in the database.
Alternatively, if the process is triggered by the user doing something on the site, you could use exec() to trigger the PHP script to process just that user, right away, without the user having to wait. The risk with this is that you can't control how rapidly the process is triggered.
Third option is to just do the request live and make the user wait. But it sounds like this is not an option for you.
It really depends on what third party you're calling and why. How long does the third party take to respond, how reliable they are, what kind of rate limits they might enforce, etc...
First things first, I'm aware of this question:
Gearman: Sending data from a background worker to the client
What I want to know, is it still the case with Gearman? I'm planning on sending a batch of image URLs from a PHP web application to the gearman worker (also written in PHP; let's call it "The Main Worker") for processing asynchronously. This worker will then submit a separate task for each image to lower-tier workers (via addTask()), call runTasks() and wait for the tasks to finish, while listening to exceptions, accumulating error messages and updating the overall job status.
While I'm perfectly ok with retrieving the overall status from the Main Worker using jobStatus() calls, then just say that all of the images were processed when [false, false, 0, 0] is returned, I definitely need to be able to inform the users that some of the images couldn't be retrieved from their respective URLs or stored on the server.
I suppose I could always just store the custom data in memcache, then retrieve it from the web app, but it just seems "dirtier" to me...
I'm not trying to get any result, because from what I've seen in the manual on php.net, even the exception handling can only be done when the task is submitted synchronously, not mentioning the custom data retrieval. I just hoped that there could be something I'm missing.
I'm I remember correctly, we're using Ubuntu Server 12.04 with libgearman6 (v 0.27) and PHP 5.3.10. The version of the gearman extension is 1.0.2. I think the database is irrelevant here, as I will not be using it in either of the workers. And I think we're not using persistent queues right now.
Since gearman won't keep any task information in memory after a task has finished (just report it back for a synchronous task), you won't be able to retrieve it in your web application without storing it in a 3rd party location. We usually use a simple web service in the application for this, letting the worker call back to the application when a task has completed or an error has occured. This allows us to keep the business logic about what we'd like to do when such an error happens in the application where it belongs, and let our workers be more general (we might need image resizing in many apps, but some apps might want to start several sub tasks that depend on the image resizing being done first).
As you write, you may also let the worker write directly to the database with the state of the task or to memcached, but I've found that letting the application itself handle the logic instead of having to change and special case the workers work better. It's also well suited for a worker framework letting you keep the same standardized way of handling callback across actual worker code.
I've got a small php web app I put together to automate some manual processes that were tedious and time consuming. The app is pretty much a GUI that ssh's out and "installs" software to target machines based off of atomic change #'s from source control (perforce if it matters). The app currently kicks off each installation in a new popup window. So, say I'm installing software to 10 different machines, I get 10 different pop ups. This is getting to be too much. What are my options for kicking these processes off and displaying the results back on one page?
I was thinking I could have one popup that dynamically created divs for every installation I was kicking off, and do an ajax call for each one then display the output for each install in the corresponding div. The only problem is, I don't know how I can kick these processes off in parallel. It'll take way too long if I have to wait for each one to go out, do it's thing, and spit the results back. I'm using jQuery if it helps, but I'm looking mainly for high level architecture ideas atm. Code examples are welcome, but psuedo code is just fine.
I don't know how advanced you are or even if you have root access to your server which would be required, but this is one possible way.. it uses several different technologies, and would probably be suited for a large scale application rather than a small. But I'll advise you on it anyway.
Following technologies/stacks are used (in addition to PHP as you mentioned):
WebSockets (on top of node.js)
JSON-RPC Server (within node.js)
Gearman
What you would do, is from your client (so via JavaScript), when the page loads, a connection is made to node.js via WebSockets ) you can use something like socket.io for this).
Then when you decide that you want to do a task, (which might take a long time...) you send a request to your server, this might be some JSON encoded raw body, or it might just be a simple GET /do/something. What is important is what happens next.
On your server, when the job is received, you kick off a new job to Gearman, by adding a Task to your server. This then processes your task, and it will be a non blocking request, so you can respond immediately back to the client who made the request saying "hey we are processing your job".
Then, your server with all of your Gearman workers, receives the job, and starts processing it. This might take 5 minutes lets say for arguments sake. Once it has finished, the worker then makes a JSON encoded message which it sends to your node.js server which receives it via JSON-RPC.
After it grabs the message, it can then emit the event to any connections which need to know about it via websockets.
I needed something like this for a project once and managed to learn the basics of node.js in a day (having already a strong JS background). The second day I was complete with a full push/pull messaging job notification platform.