I'm new to PHP, so I need some guidance on what the simplest and/or most elegant solution to the following problem would be:
I'm working on a project that has a table with as many as 500,000 records. At user-specified times, a background task must be started that invokes a command-line application on the server to do the actual work. The problem is that roughly every minute I need to check all 500,000 records (and counting) to see whether something needs to be done.
As the title says, it is time-critical: at most a 1-minute delay is acceptable between the time the user expects the task to run and the time it actually executes; of course, the less delay, the better.
Thus far, I can only think of a rather dirty option: a simple utility app that runs on the server and, every minute, makes multiple requests to it, for example:
check records between 1 and 100,000;
check records between 100,000 and 200,000;
etc. you get the point;
The server then basically starts a task for each batch of 100,000 records or fewer, but it seems to me that there must be a faster approach, something similar to Facebook's notifications.
Additional info:
server is Windows 2008
using apache + php
EDIT 1
users have an average of 3 tasks per day, at roughly 6-8 hour intervals
more than half of the tasks are executed at the same time at least once per day [!]
Any suggestion is highly appreciated!
The easiest approach would be a persistent task that runs the whole time and receives notifications about records that need to be processed. It could then process them immediately or, if a record needs to be processed at a certain time, sleep until either that time is reached or another notification arrives.
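A minimal sketch of such a persistent worker, assuming a hypothetical jobs table with id and run_at columns (the command line it invokes is also just a placeholder):

<?php
// Minimal sketch of a persistent worker. The `jobs` table, its columns and
// the invoked command are placeholders; adjust them to your own schema.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    // Fetch the next record that is due now or overdue.
    $stmt = $pdo->query(
        "SELECT id FROM jobs WHERE run_at <= NOW() ORDER BY run_at ASC LIMIT 1"
    );
    $job = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($job) {
        // Invoke the command-line application that does the actual work.
        exec('myapp.exe ' . escapeshellarg($job['id']));
        $pdo->prepare("DELETE FROM jobs WHERE id = ?")->execute([$job['id']]);
    } else {
        sleep(1); // nothing due yet; check again shortly
    }
}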
I think I gave this question more than enough time. I will stick with a utility application (sitting on the server) that makes requests to a URL accessible only from the server's IP, which will spawn a new thread for each task if multiple tasks need to be executed at the same time. It's not really scalable, but it will have to do for now.
Related
I'm currently working on a browser game with a PHP backend that needs to perform certain checks at specific, changing points in the future. Cron jobs don't really cut it for me as I need precision at the level of seconds. Here's some background information:
The game is multiplayer and turn-based
On creation of a game room the game creator can specify the maximum amount of time taken per action (30 seconds - 24 hours)
Once a player performs an action, they should only have the specified amount of time to perform the next, or the turn goes to the player next in line.
For obvious reasons I can't just keep track of time through Javascript, as this would be far too easy to manipulate. I also can't schedule a cron job every minute as it may be up to 30 seconds late.
What would be the most efficient way to tackle this problem? I can't imagine querying a database every second would be very server-friendly, but it is the direction I am currently leaning towards[1].
Any help or feedback would be much appreciated!
[1]:
A user makes a move
A PHP function is called that sets 'switchTurnTime' in the MySQL table's game row to 'TIMESTAMP'
A PHP script that is always running in the background queries the table for any games where the 'switchTurnTime' has passed, switches the turn and resets the time.
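A rough sketch of that background script (the games table, its columns and the switchTurn() helper are placeholders for your own schema and logic):

<?php
// Rough sketch of the always-running background script.
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

while (true) {
    // Find games whose turn deadline has passed.
    $stmt = $pdo->query("SELECT id FROM games WHERE switchTurnTime <= NOW()");
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $game) {
        // Switch the turn and reset switchTurnTime for the next player.
        switchTurn($pdo, $game['id']);
    }
    sleep(1); // one-second resolution
}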
You can always use a queue or daemon. This only works if you have shell access to the server.
https://stackoverflow.com/a/858924/890975
Every time you need an action to occur at a specific time, add it to a queue with a delay. I've used beanstalkd with varying levels of success.
You have lots of options this way. Here are two examples with 6-second intervals (a sketch of the queueing call follows the list):
Use a cron job every minute to add 10 jobs, each with a delay of 6 seconds
Write a simple PHP script that runs in the background (a daemon) and adds a new job to the queue every 6 seconds
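As an illustration only, here is roughly what putting a delayed job on the queue looks like with the pheanstalk client (the tube name and payload are made up, and the exact API differs a little between pheanstalk versions):

<?php
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

// Connect to the local beanstalkd daemon.
$pheanstalk = Pheanstalk::create('127.0.0.1');

// Example payload; in practice this would identify the record/action to run.
$payload = json_encode(['game_id' => 42, 'action' => 'switch_turn']);

$pheanstalk->useTube('turns')->put(
    $payload,
    1024, // priority (default)
    6,    // delay in seconds: the job becomes reservable after this long
    60    // time-to-run before an unfinished job is released back to the queue
);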
I'm going with the following approach for now, since it seems to be the easiest to implement and test, as well as to deploy on different kinds of servers/hosting, while still behaving reliably.
Set up a cron job to run a PHP script every minute.
Within that script, first do a query to find candidates that will have their endtime within this minute.
Start a while loop that runs until 59 seconds have passed.
Inside this loop, check the remaining time for each candidate.
If the time limit has passed, run another query on that specific candidate to make sure the endtime hasn't changed.
If it has, re-add it to the candidate queue as necessary. If not, act accordingly (in my case: switch the turn to the next player).
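A stripped-down sketch of that cron script (the games table, endtime column and switchTurn() helper are placeholders):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');
$start = time();

// 1. Candidates whose endtime falls within this minute.
$stmt = $pdo->query(
    "SELECT id, endtime FROM games WHERE endtime < NOW() + INTERVAL 1 MINUTE"
);
$candidates = $stmt->fetchAll(PDO::FETCH_ASSOC);

// 2. Keep looping until 59 seconds have passed.
while (time() - $start < 59 && $candidates) {
    foreach ($candidates as $i => $candidate) {
        if (strtotime($candidate['endtime']) > time()) {
            continue; // not due yet
        }

        // 3. Re-check the endtime in case it changed since the first query.
        $check = $pdo->prepare("SELECT endtime FROM games WHERE id = ?");
        $check->execute([$candidate['id']]);
        $current = $check->fetchColumn();

        if (strtotime($current) <= time()) {
            switchTurn($pdo, $candidate['id']); // act accordingly
            unset($candidates[$i]);
        } else {
            $candidates[$i]['endtime'] = $current; // re-add with the new time
        }
    }
    usleep(250000); // don't hammer the database
}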
Hope this will help somebody in the future, cheers!
We have a web app that uses IMAP to conditionally insert messages into users' mailboxes at user-defined times.
Each of these 'jobs' is stored in a MySQL DB with a timestamp for when the job should be run (which may be months in the future). Jobs can be cancelled at any time by the user.
The problem is that making IMAP connections is a slow process, and before we insert the message we often have to conditionally check whether there is a reply from someone in the inbox (or similar), which adds considerable processing overhead to each job.
We currently have a cron script running every minute or so that gets all the jobs from the DB that need delivering in the next X minutes. It then splits them into batches of Z jobs and, for each batch, performs an asynchronous POST request back to the same server with all the data for those Z jobs (in order to achieve 'fake' multithreading). The server then processes each batch of Z jobs that comes in via HTTP.
The reason we use an async HTTP POST for multithreading, rather than something like pcntl_fork, is so that we can add other servers, have the data POSTed to those instead, and have them run the jobs rather than the current server.
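This isn't the poster's actual code, but for illustration, dispatching such batches as parallel POSTs can be done with curl_multi (the endpoint and payload shape here are placeholders):

<?php
// Illustrative sketch: fire a POST per batch and let them run in parallel.
function dispatchBatches(array $batches, string $url): void
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($batches as $i => $batch) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => json_encode($batch),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$i] = $ch;
    }

    // Standard curl_multi loop: run all handles until they finish.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi);
        }
    } while ($active && $status === CURLM_OK);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
}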
So my question is - is there a better way to do this?
I appreciate work queues like beanstalkd are available to use, but do they fit with the model of having to run jobs at specific times?
Also, because we need to keep the jobs in the DB anyway (because we need to provide the users with a UI for managing the jobs), would adding a work queue in there somewhere actually be adding more overhead rather than reducing it?
I'm sure there are better ways to achieve what we need - any suggestions would be much appreciated!
We're using PHP for all this so a PHP-based/compatible solution is really what we are looking for.
Beanstalkd would be a reasonable way to do this. It has the concept of put-with-delay, so you can regularly fill the queue from your primary store with a message that will become reservable, and run, in X seconds (the time you want it to run minus the time now).
The workers would then run as normal, connecting to the beanstalkd daemon and waiting for a new job to be reserved. It would also be a lot more efficient without the overhead of an HTTP connection. As an example, I used to post messages to Amazon SQS (over HTTP). That could barely do 20 QPS at the very most, but Beanstalkd accepted over a thousand per second with barely any effort.
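A worker along those lines might look roughly like this with the pheanstalk client (the tube name and processImapJob() are placeholders, and the exact API differs a little between pheanstalk versions):

<?php
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$pheanstalk = Pheanstalk::create('127.0.0.1');
$pheanstalk->watch('email-jobs');

while (true) {
    $job  = $pheanstalk->reserve();           // blocks until a job is ready
    $data = json_decode($job->getData(), true);

    try {
        processImapJob($data);                // your IMAP check/insert logic
        $pheanstalk->delete($job);            // done: remove it from the queue
    } catch (Exception $e) {
        $pheanstalk->release($job, 1024, 60); // failed: retry in 60 seconds
    }
}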
Edited to add: You can't delete a job without knowing its ID, though you could store that outside. OTOH, do users have to be able to delete jobs at any time up to the last minute? You don't have to put a job into the queue weeks or months in advance; you could still have a single DB reader that runs every, say, 1 to 5 minutes to put the next few jobs into the queue, and still have as many workers as you need, with the efficiencies they can bring.
Ultimately, it depends on the number of DB read/writes that you are doing, and how the database server is able to handle them.
If what you are doing is not a problem now, and won't become so with additional load, then carry on.
I have a few ideas about this but here is what I need to do and just wanted some second opinions really.
I am writing a small auction site in PHP/SQL, but I have come up against a hurdle.
When an item finishes, much like on eBay, I need to be able to tell that it has finished and send out the emails to whoever has won it and whoever has sold it.
The only way I can think of is to schedule a piece of code to keep checking what auctions have ended but surely there is a better way?
The solution can be in multiple parts:
A script that is launched via cron (every 5 minutes could be good, even less...). It detects the finished auctions and puts them in a queue.
A script that runs pretty much continuously and processes the items in the queue.
Note that :
You have to ensure that an auction is still open before displaying the page! (a simple test) That way people can't join in after it closes.
For each script, you can use PHP, or any other language
Advantages :
The cron job is very fast and light on resources, and even if there are a lot of auctions to process, there is no risk of two runs overlapping (and causing conflicts)
The queue system ensures that your system won't crash because there is too much going on... It will process the queue as fast as possible, but if it is not fast enough, the website will continue to run. You can, however, end up with emails being sent hours or days after the auction has closed. But that "limit" is far more predictable, and it won't crash your system.
You can extend it in the future with multithreaded processing of the queue, distributed processing... This is a scalable architecture.
This architecture is fun.
Additional information:
Regarding the daemon script: it doesn't have to run continuously. What you can do is: at the end of the cron job, if there are items in the queue, check whether the other (processing) script is running. If it is, exit. If it is not running, launch it...
The daemon script takes an item out of the queue and processes it. When done, if there are still items in the queue, it processes the next one; otherwise, it exits.
With this system, everything is optimal and everyone loves each other !
To check whether the other script is running, you can use a file and write "1" or "0" in it (= running / not running). The first script reads it, the second writes it... You can also use the database for this, or perhaps system tools or shell commands. A file-lock variant is sketched below.
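For instance (just a sketch, not required), an exclusive lock on a lock file avoids a stale "1"/"0" flag if the processing script crashes; processQueue() stands in for your own processing loop:

<?php
// Sketch: only one queue processor can hold this lock at a time.
$lock = fopen(__DIR__ . '/queue-processor.lock', 'c');

if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit; // another instance is already running
}

processQueue(); // your queue-processing loop

flock($lock, LOCK_UN);
fclose($lock);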
Please be kind enough to share the SQL script that queries the highest bidder based on the bidding end date (i.e. how to know the bidding is over) and awards the product to the highest bidder.
I would setup a cron job to run every 10-20-30-60 minutes etc to send out emails and update the auction details.
If your script is fast, running it every minute or so may be alright.
Be aware that many shared hosting will only allow you to send out a certain number of emails per hour.
Do these emails need to be sent out instantly?
I can see 2 possible problems and goals you are trying to achieve:
Visual: You want that, when a user browses your website, the page keeps updating without being refreshed, so that if an auction ends, something like "Auction ended, the item goes to..." appears.
Solution: You should use JavaScript and AJAX. (I assume you are already using them for countdowns or something.) Make an AJAX call every 5 seconds (that could be enough) and update the content.
Practical: You want that, if an auction has ended, a user cannot join it. Solution: You can do this with just PHP and MySQL. You could create a field where you store the auction's start timestamp and then use a simple if (time() >= ($timestamp + $duration)) {} (where $timestamp is the start of the auction and $duration is the duration of the auction) to block possible bad users trying to do it.
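Fleshed out slightly, that check might look like the following (the auctions table, its columns and $auctionId are only examples):

<?php
// Example only: reject actions on auctions that have already ended.
$stmt = $pdo->prepare("SELECT start_time, duration FROM auctions WHERE id = ?");
$stmt->execute([$auctionId]);
$auction = $stmt->fetch(PDO::FETCH_ASSOC);

if (time() >= (strtotime($auction['start_time']) + (int) $auction['duration'])) {
    exit('This auction has ended.'); // block the bid
}
// ...otherwise record the bid as usual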
I have a PHP application that currently has 5k users and will keep increasing for the forseeable future. Once a week I run a script that:
fetches all the users from the database
loops through the users, and performs some upkeep for each one (this includes adding new DB records)
The last time this script ran, it only processed 1,400 users before dying due to a 30-second maximum execution time error. One solution I thought of was to have the main script still fetch all the users but, instead of performing the upkeep process itself, make an asynchronous cURL call (one for each user) to a new script that performs the upkeep for that particular user.
My concern here is that 5k+ cURL calls could bring down the server. Is this something that could be remedied by using a messaging queue instead of cURL calls? I have no experience using one, but from what I've read it seems like this might help. If so, which message queuing system would you recommend?
Some background info:
this is a Symfony project, using Doctrine as my ORM and MySQL as my DB
the server is a Windows machine, and I'm using Windows' task scheduler and wget to run this script automatically once per week.
Any advice and help is greatly appreciated.
If it's possible, I would make a scheduled task (cron job) that would run more often and use LIMIT 100 (or some other number) to process a limited number of users at a time.
A few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but more than 30 seconds would be a start.
Track Upkeep against Users
Maybe add a field for each user, last_check and have that field set to the date/time of the last successful "Upkeep" action performed against that user.
Process Smaller Batches
Better to run smaller batches more often. Think of it as being the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it would be easy to identify those with the longest period since the last update, and also set a threshold for how often to process them.
Run More Often
Set a cronjob and process, say 100 records every 2 minutes or something like that.
Log and Review your Performance
Have logfiles and record stats. How many records were processed, how long was it since they were last processed, how long did the script take. These metrics will allow you to tweak the batch sizes, cronjob settings, time-limits, etc. to ensure that the maximum checks are performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased user volumes and will form a strong foundation for any further maintenance tasks you might be looking at down the track.
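Pulling the ideas above together, a minimal sketch of one batched run might look like this (the users table, last_check column, batch size and performUpkeep() helper are all assumptions to adapt):

<?php
// Minimal sketch of a batched upkeep run, meant to be triggered by cron.
set_time_limit(300); // give the batch more room than the default 30 seconds

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Oldest-checked users first, 100 per run.
$stmt = $pdo->query("SELECT id FROM users ORDER BY last_check ASC LIMIT 100");

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $user) {
    performUpkeep($user['id']); // your existing upkeep logic

    // Record the successful check so the next run picks other users.
    $pdo->prepare("UPDATE users SET last_check = NOW() WHERE id = ?")
        ->execute([$user['id']]);
}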
Why not still use the cURL idea, but instead of processing only one user per call, send a bunch of users to each one by splitting them into groups of 1,000 or something?
Have you considered changing your logic to commit changes as you process each user? It sounds like you may be running a single transaction to process all users, which may not be necessary.
How about just increasing the execution time limit of PHP?
Also, looking into whether you can make your upkeep procedure faster can help too. Depending on what exactly you are doing, you could also look into spreading it out a bit: do a few once in a while rather than everyone at once. But that depends on what exactly you're doing, of course.
I've got a PHP script on a shared webhost that selects from ~300 'feeds' the 40 that haven't been updated in the last half hour, makes a cURL request and then delivers it to the user.
SELECT * FROM table WHERE latest_scan < NOW() - INTERVAL 30 MINUTE ORDER BY latest_scan ASC LIMIT 0, 40;
// Make cURL request and process it
I want to be able to deliver updates as fast as possible, but don't want to bog down my server or the servers I'm fetching from (it's only a handful).
How often should I run the cron job, and should I limit the number of fetches per run? To how many?
It would be a good idea to "rate" how often each feed actually changes, so that if something changes on average every 24 hours, you just fetch it every 12 hours.
Just store #changes and #tries and pick the ones you need to check... you can run the script every minute and let some statistics do the rest!
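As a rough sketch of that idea (the counter names, the 30-minute baseline and the clamping bounds are all just illustrative assumptions):

<?php
// Derive a per-feed check interval from how often it has actually changed.
// $checks = how many times we fetched it, $changes = how many of those fetches
// found new content. Assumes checks were made roughly every 30 minutes.
function nextCheckInterval(int $checks, int $changes, int $min = 1800, int $max = 86400): int
{
    if ($checks === 0 || $changes === 0) {
        return $max; // no data yet, or it never changes: check rarely
    }

    $avgSecondsPerChange = ($checks / $changes) * 1800;
    $interval = (int) ($avgSecondsPerChange / 2); // fetch twice as often as it changes

    return max($min, min($max, $interval));
}

// Example: 48 checks with 1 change is roughly one change per day, so check every ~12 hours.
echo nextCheckInterval(48, 1); // 43200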
On a shared host you might also run into script run-time issues. For instance, if your script runs longer than 30 seconds, the server may terminate it. If this is the case for your host, you might want to do some tests/logging of how long it takes to process each feed and take that into consideration when you figure out how many feeds to process at the same time.
Another thing I had to do to help with this was mark the "last scan" as updated before processing each individual request, so that a problem feed would not keep failing and being picked up on every cron run. If desired, you can update the entry again on failure and record the reason (if known) why the failure occurred.
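In code, that "claim before processing" step might look roughly like this (using feeds as a stand-in for the table in the question, and fetchAndProcessFeed() as a placeholder for the cURL work):

<?php
// Claim each feed (bump latest_scan) before fetching it, so a feed that
// crashes the script is not retried on every single cron run.
$stmt = $pdo->query(
    "SELECT * FROM feeds WHERE latest_scan < NOW() - INTERVAL 30 MINUTE
     ORDER BY latest_scan ASC LIMIT 0, 40"
);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $feed) {
    // Mark as scanned first.
    $pdo->prepare("UPDATE feeds SET latest_scan = NOW() WHERE id = ?")
        ->execute([$feed['id']]);

    try {
        fetchAndProcessFeed($feed); // the cURL request + processing
    } catch (Exception $e) {
        // Optionally record the failure reason on the row here.
    }
}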