I am currently working on a fairly large, purely dynamic site that contains various pieces of information that need to be expired. Previously I did not have to worry about expiration because it was handled on user login (various checks would run to expire the logged-in user's data if needed), but with our growing member base and the inactivity of many users, the data in the DB is getting stale. Normally this would not be a problem, but the old data affects the rest of the site's features/functionality (the point-based system, team-building features, etc.). All data stored in the database has an expiration timer, so all I have to do is soft-delete the data using a PHP script, but I don't want to trigger this on page load (I want to avoid slowing down the user's page load).
What alternatives are available aside from cron jobs? I want to be able to set up and manage the background services through PHP so I don't have to edit/create crons every time something new needs to be added.
Ideally I am looking for, or trying to implement, a system that will let me insert a DB row with specific instructions (queue a specific update) and have it handled on the back end. I want/need the data to be updated as soon as possible to avoid the issues we are running into now. This background processor will eventually handle larger, more complex tasks, such as auto-scheduling an on-site event (tournaments) or auto-generating brackets for those tournaments. All help is appreciated!
You could try the MySQL event scheduler, so you could define a SQL statement which would re-run every X days:
-- The event scheduler must be enabled first: SET GLOBAL event_scheduler = ON;
CREATE EVENT `delete_old_data`
ON SCHEDULE EVERY 1 DAY
ON COMPLETION NOT PRESERVE
ENABLE
DO
  -- Table/column names are placeholders; delete (or soft-delete) rows older than 5 days
  DELETE FROM user_data WHERE created_at < NOW() - INTERVAL 5 DAY;
To be honest, for more complex things like generating 'site events' it looks like you should be using a cron job. Is there any reason you cannot use one? I am sure that if you explain to your hosting provider that you need a cron job and show them the code you will be using it for, they will enable the option on your account (most hosts have it enabled already).
There are several approaches, but none of them is perfect.
CRONJOBS
You've pointed out that you don't want to use them. But if your main concern about cron jobs is crontab management, you can just set up a single every-minute PHP cron which then triggers different tasks at different intervals. That way you can fully manage your background services via PHP. The main disadvantage is that you cannot run your background tasks in parallel.
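A minimal sketch of that single-dispatcher idea, assuming a hypothetical tasks table with name, interval_seconds and last_run columns (all names are invented for illustration):
<?php
// dispatcher.php - run from one crontab entry: * * * * * php /path/to/dispatcher.php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Fetch every task whose interval has elapsed since it last ran
$due = $pdo->query(
    "SELECT id, name FROM tasks
     WHERE last_run IS NULL OR last_run < NOW() - INTERVAL interval_seconds SECOND"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($due as $task) {
    // Each task name maps to a PHP script, so adding a DB row adds a new background job
    $script = __DIR__ . '/tasks/' . basename($task['name']) . '.php';
    if (is_file($script)) {
        require $script; // tasks run sequentially, not in parallel
    }
    $pdo->prepare("UPDATE tasks SET last_run = NOW() WHERE id = ?")->execute([$task['id']]);
}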
GEARMAN WORKERS
You may also use third-party software like Gearman. Gearman allows you to write so-called workers (your background services) in any language you like (PHP is supported, as well as C++, Java and many others). These workers behave like asynchronous functions which may be called from anywhere in your code using the Gearman API. You don't have to care about the result: simply call the function and forget about it, and it will be done in the background. Task scheduling is also a built-in Gearman feature.
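A minimal fire-and-forget sketch using the PHP gearman extension (the function name expire_user_data and its payload are invented for illustration):
<?php
// client side - called from your web code; returns immediately, the work happens in the background
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('expire_user_data', json_encode(['user_id' => 42]));

// worker side - a separate long-running CLI script
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('expire_user_data', function (GearmanJob $job) {
    $params = json_decode($job->workload(), true);
    // ... soft-delete or update the expired rows for $params['user_id'] ...
});
while ($worker->work());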
Another piece of software you could use is RabbitMQ, or any other message-bus system.
NATIVE PHP ASYNC FUNCTIONS
The upcoming PHP 7 is expected to bring native PHP asynchronous programming. You would be able to separate your PHP script from HTTP request handling, a bit like in Node.js: your script could keep running all the time, doing background work however you like, while HTTP requests are handled as events in another process. This could be the best option for you, if you can wait until the release date.
My answer covers only the solutions I have used personally. There are probably other ways to reach your goal as well, so keep searching!
Related
I'm building a website in PHP and need an API to be checked on a regular basis for EACH USER individually.
In a nutshell it's a SaaS to steer a user account on another website with additional/automated options. A web based bot if you want.
So basically, I need to dynamically create a new cron job with its own interval for each user, and these should be executed in parallel (it would take too long to put all the queries into one cron job if there are a lot of users).
Also, it might become necessary for each request to be made from a different IP. The reason is that the API provider may get annoyed with us and want to block us. Since the API key is public, they would most likely do this by simply blocking our IP, so changing it frequently should help a lot.
Is something like that possible? What would this require? Any option that doesn't get too expensive?
I thought of RabbitMQ for example, but that wouldn't quite tackle all issues and I'm wondering if there's some better/smarter solution.
Thanks!
Look at the temporal.io open source project. It supports a practically unlimited number of such periodic jobs, and it also provides a PHP SDK.
I have created a script that uses PDO database functions to pull in data from an external feed and insert it into a database, which on some days can amount to hundreds of entries. The page hangs until it is done and there is no real control over it; if there is an error I don't know about it until the page has finished loading.
Is there a way to have a controlled insert, so that it will insert X amount, then pause a few seconds and then continue on until it is complete?
During its insert it also executes other queries so it can get quite heavy.
I'm not quite sure what I am looking for, so I have struggled to find help on Google.
I would recommend using background tasks for this. Pausing your PHP script will not speed up the page load: Apache (or nginx, or any other web server) sends the full HTTP response back to the browser only once the PHP script has completed.
You can use output-stream functions, and if the web server supports chunked transfer encoding you can show progress while the page is loading. But for this purpose many developers use AJAX queries instead: one query per chunk of data, with the position of the current chunk stored in the session.
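A rough sketch of that chunked approach, assuming a hypothetical import script that the browser calls repeatedly via AJAX until it reports completion (table and field names are invented):
<?php
// import_chunk.php - each AJAX call processes one batch and reports how far it got
session_start();
$offset = $_SESSION['import_offset'] ?? 0;
$batch  = 50;

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$feed = json_decode(file_get_contents('/tmp/feed_cache.json'), true); // pre-fetched feed data

$rows = array_slice($feed, $offset, $batch);
$stmt = $pdo->prepare('INSERT INTO entries (title, body) VALUES (?, ?)');
foreach ($rows as $row) {
    $stmt->execute([$row['title'], $row['body']]);
}

$_SESSION['import_offset'] = $offset + count($rows);
header('Content-Type: application/json');
echo json_encode([
    'done'      => count($rows) < $batch,      // no more rows left to insert
    'processed' => $_SESSION['import_offset'], // lets the page show progress
]);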
But as I wrote at first, the better way would be to use background tasks and workers. There are many ways of implementing this approach: you can use specialized services like RabbitMQ or Gearman, or you can just write your own console application that you start and monitor from a cron task.
I am a programmer at an internet marketing company that primarily makes tools. These tools have certain requirements:
They run in a browser and must work in all of them.
The user either uploads something (.csv) to process or they provide a URL and API calls are made to retrieve information about it.
They move around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually overnight.
The user must be able to watch live as their information is processed and is presented to them.
Currently we are writing in PHP, MySQL and Ajax.
My question is: how do I process LARGE quantities of data and provide a good user experience while the tool is running? Currently I use a custom queue system that sends AJAX calls and inserts rows into tables or data into divs.
This method is a huge pain in the ass and can't possibly be the correct approach. Should I be using a templating system, or is there a better way to refresh chunks of the page with A LOT of data? And I really mean a lot of data, because we come close to maxing out PHP's memory limit, which is something we are always on the lookout for.
Also, I would love to make it so these tools could run on the server by themselves: upload a .csv, close the browser window, and then have an email sent to the user when the tool is done.
Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.
I wanted to update with some notes in case anyone has the same question. I am looking into the following to see which is the best solution:
SlickGrid / DataTables
GearMan
Web Socket
Ratchet
Node.js
These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.
First of all, you cannot handle big data via AJAX alone. To let users watch the process live you can use WebSockets. Since you are experienced in PHP, I can suggest Ratchet, which is quite new.
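For illustration, here is a minimal Ratchet server sketch that simply broadcasts every message it receives to all connected clients (the class name and port are arbitrary, and it assumes Ratchet is installed via Composer); a background job could connect to it and push progress updates for the browser to display:
<?php
// push_server.php - run from the CLI, not through the web server
require __DIR__ . '/vendor/autoload.php';

use Ratchet\ConnectionInterface;
use Ratchet\MessageComponentInterface;
use Ratchet\Http\HttpServer;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

class ProgressPusher implements MessageComponentInterface {
    private $clients;
    public function __construct() { $this->clients = new \SplObjectStorage(); }
    public function onOpen(ConnectionInterface $conn)  { $this->clients->attach($conn); }
    public function onClose(ConnectionInterface $conn) { $this->clients->detach($conn); }
    public function onError(ConnectionInterface $conn, \Exception $e) { $conn->close(); }
    public function onMessage(ConnectionInterface $from, $msg) {
        // Relay every progress update to all connected browsers
        foreach ($this->clients as $client) {
            $client->send($msg);
        }
    }
}

IoServer::factory(new HttpServer(new WsServer(new ProgressPusher())), 8080)->run();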
On the other hand, to do the heavy calculations and store big data, I would use NoSQL instead of MySQL.
Since you're already pinched for time, migrating to Node.js may not be practical right now. It would, however, help with notifying users when their results are ready, as it can push browser notifications without polling. And since it uses JavaScript, you might find some of your client-side code is reusable.
I think you can run what you need in the background with some kind of queue manager. I use something similar with CakePHP, and it lets me run time-intensive processes asynchronously in the background, so the browser does not need to stay open.
Another plus is that it's scalable, since it's easy to increase the number of queue workers running.
Basically, with PHP you just need a cron job that runs every once in a while and starts a worker that checks a queue table in the database for pending tasks. If none are found, it keeps running in a loop until one shows up.
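A bare-bones sketch of such a worker, assuming a hypothetical jobs table with status and payload columns (names invented for illustration; a single worker is assumed, several workers would need an atomic claim):
<?php
// worker.php - started by cron; keeps claiming and running pending jobs
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    // Grab one pending job at a time
    $job = $pdo->query("SELECT id, payload FROM jobs WHERE status = 'pending' LIMIT 1")
               ->fetch(PDO::FETCH_ASSOC);

    if ($job === false) {
        sleep(5);      // nothing to do yet, wait and check again
        continue;
    }

    $pdo->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute([$job['id']]);

    // ... do the actual work described by $job['payload'] ...

    $pdo->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")->execute([$job['id']]);
}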
I have a PHP-based web application that captures certain events in a database table. It also features a visualization of those captured events: an HTML table listing the events, controlled by AJAX.
I would like to add an optional 'live' feature: after pressing a button ('switch on'), all events captured from that moment on will be inserted into the already visible table. Three things have to happen: noticing the event, fetching the event's data, and inserting it into the table. To keep the server load within sane limits I do not want to poll for new events with AJAX requests; instead I would prefer the long-polling strategy.
The problem with this is obviously that when doing a long-polling AJAX call, the server-side counterpart has to watch for an event. Since the events are registered by PHP scripts, there is no easy way to notice the event without again polling the database for changes, because the capturing action runs in a different process than the observing long-polling request. I looked around for a usable mechanism for this kind of inter-process communication, as I know it from rich clients under Linux. Indeed there are PHP extensions for semaphores, shared memory and even POSIX, but they only exist on Linux (or Unix-like) systems. Though not typical, the application might be used on MS Windows systems in rare cases.
So my simple question is: is there any mechanism, typically available on all (or most) systems, that can push such events to the PHP script servicing the long-polling AJAX request? Something that does not involve constantly polling a file or the database, since I already have the event elsewhere?
So, the initial caveat: without doing something "special", trying to do long polling with vanilla PHP will eat up resources until you kill your server.
Here is a good basic guide to basic PHP based long polling and some of the challenges associated with going the "simple" road:
How do I implement basic "Long Polling"?
As far as doing this truly cross-platform (and simply enough to start), you may need to fall back to some sort of internal polling, but the goal should be to ensure that this polling is much cheaper than having the client poll.
One route would be to treat it essentially as caching database calls (which it is at this point) and go with standard caching approaches. Everything from APC, to memcached, to polling a file will likely put less load on the server than having the client set up and tear down a connection every second. Have one process place data under the right keys, and then poll those keys in your script on a regular basis.
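As a rough illustration, the long-poll endpoint could watch a shared cache key instead of the database. This sketch assumes the APCu extension and an invented latest_event_id key that the capturing script updates with apcu_store():
<?php
// long_poll.php - held open by the browser until something new appears or we time out
$lastSeen = (int) ($_GET['last_id'] ?? 0);
$deadline = time() + 25;                     // stay under typical web-server timeouts

while (time() < $deadline) {
    $latest = apcu_fetch('latest_event_id'); // cheap in-memory read, no DB hit
    if ($latest !== false && $latest > $lastSeen) {
        header('Content-Type: application/json');
        echo json_encode(['event_id' => $latest]);
        exit;                                // client processes it and reconnects
    }
    usleep(250000);                          // check the cache four times a second
}

header('Content-Type: application/json');
echo json_encode(['event_id' => $lastSeen]); // timed out with nothing new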
Here is a pretty good overview of a variety of caching options that might be cross-platform enough for you:
http://simas.posterous.com/php-data-caching-techniques
Once you reach the limits of this approach, you'll probably have to move onto a different server architecture anyhow.
I have a process users must go through on my site which can take quite a bit of time (upwards of an hour in certain cases).
I'd like to be able to have the user start the process, then be told that it is running in the background and they can leave the page and will be emailed when the process is complete. This would help avoid cases when the user gets impatient and closes the window before the process has finished.
An example of how it would ideally look is how Mailchimp handles importing contacts. You upload a CSV file of your contacts, and they then say that the contacts are currently uploading, but it can take a while so feel free to leave the page.
What would be the best way to accomplish this? I looked into Gearman; however, it seems like that tool is more useful for scaling large numbers of tasks so they happen quickly, not for running processes in the background.
Thanks for your help.
Even if it doesn't seem to be what you'd use at first glance, I think I would use Gearman for that:
You can push tasks to it when the user does his action
It'll deal with both :
balancing tasks to several servers, if you have more than one
queuing, so no more than X tasks are executed in parallel.
No need to re-invent the wheel ;-)
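To connect this to the contact-import example, here is a sketch of the worker side, assuming the gearman PHP extension and an invented import_contacts function name; the upload script would submit the job with GearmanClient::doBackground() and return to the user immediately:
<?php
// import_worker.php - a long-running CLI process; start a few of them to cap how many imports run in parallel
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

$worker->addFunction('import_contacts', function (GearmanJob $job) {
    $params = json_decode($job->workload(), true); // e.g. ['csv' => ..., 'email' => ...]

    // ... parse the CSV and import the contacts; this can safely take an hour ...

    // Let the user know we are done, MailChimp-style
    mail($params['email'], 'Your import has finished', 'All of your contacts have been processed.');
});
while ($worker->work());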
You might want to take a look at creating a daemon. I'd suggest writing the daemon in a language other than PHP (Node.js maybe?), but if you already have a large(ish) code base in PHP this might not be desirable. Try taking a look at How to design a daemon with a MySQL DB connection.
I've been working on a PHP library called LooPHP that allows event-driven programming in PHP (often desirable for daemons). The library allows for timed events and multi-threaded listeners (for when you want one event queue to be fed from more than one type of source).
If you could give us some more information on what exactly this background process does, it might be helpful.
Write out a file using the user's ID as the filename. Spawn a new process to perform whatever it is you want done (if you want it to execute more PHP, you can just call PHP with the script you want to run). When that process is done, have it delete the file. If the user visits the page again, have the script check for the existence of the file (the filename is predictable from the user ID). If it exists, you're still processing, so tell them to keep waiting. You may also want an upper bound: if they come back and the file exists but it has been, say, 5 hours, delete the file and let them try again.
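A quick sketch of that lock-file pattern (paths, script names and the 5-hour limit are just placeholders):
<?php
// start_job.php - kicks off the long-running work for the current user
session_start();
$userId   = (int) $_SESSION['user_id'];
$lockFile = sys_get_temp_dir() . '/job_' . $userId . '.lock';

if (file_exists($lockFile)) {
    if (time() - filemtime($lockFile) > 5 * 3600) {
        unlink($lockFile); // stale lock, let them try again
    } else {
        exit('Still processing - we will let you know when it is done.');
    }
}

touch($lockFile);
// Spawn the worker in the background; worker.php deletes the lock file (and emails the user) when done
exec(sprintf('php /path/to/worker.php %d > /dev/null 2>&1 &', $userId));
echo 'Your job has started. Feel free to close this window.';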