I currently have a scheduled console command that runs every 5 minutes without overlap like this:
$schedule->command('crawler')
->everyFiveMinutes()
->withoutOverlapping()
->sendOutputTo('../_laravel/storage/logs/scheduler-log.txt');
It works great, but I currently have about 220 pages and a full pass takes about 3 hours, because I only crawl 10 pages per 5-minute interval (each page takes roughly 20-30 seconds to crawl due to various factors). Each page is a record in the database. If I end up having 10,000 pages to crawl, this approach would not work, because a full pass would take more than 24 hours and each page is supposed to be re-crawled once a day.
My vendor allows up to 10 concurrent requests (or more on higher plans), so what's the best way to run this concurrently? If I just duplicate the scheduler code, does the same command run twice (or 10 times if I duplicate it 10 times)? Would that cause any issues?
I would also need to pass a parameter to the console command, such as 1, 2, 3, etc., which I could use to determine which pages to crawl; i.e. 1 would be records 1-10, 2 would be the next 11-20, and so on.
Using this StackOverflow answer, I think I know how to pass it along, like this:
$schedule->command('crawler --sequence=1')
But how do I read that parameter within my Command class? Does it just become a regular PHP variable, i.e. $sequence?
It's better to use a queue for job processing:
On cron, add all the jobs to the queue.
Run multiple queue workers, which will process the jobs in parallel.
Tip: this happened to us.
It can happen that a previously added job has not finished yet when cron adds the same task to the queue again, since queues are worked through sequentially. To protect yourself from that situation, record in the database when each task last completed, so you know when to execute the job again (or whether it was seriously delayed).
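A minimal sketch of that idea, assuming a recent Laravel version with the database queue driver and a hypothetical CrawlPage job class (the Page model and job class names are placeholders, not your existing code):

// In app/Console/Kernel.php: once a day, push one queued job per page
// instead of crawling inside the scheduled command itself.
$schedule->call(function () {
    \App\Page::orderBy('id')->chunk(100, function ($pages) {
        foreach ($pages as $page) {
            dispatch(new \App\Jobs\CrawlPage($page->id)); // each job crawls a single page
        }
    });
})->daily();

Then run several workers in parallel (for example php artisan queue:work under Supervisor), one worker process per concurrent request your vendor allows.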
I found this in the documentation; I hope it's what you're looking for:
Retrieving Input
While your command is executing, you will obviously need to access the values for the arguments and options accepted by your application. To do so, you may use the argument and option methods:
Retrieving The Value Of A Command Argument
$value = $this->argument('name');
Retrieving All Arguments
$arguments = $this->argument();
Retrieving The Value Of A Command Option
$value = $this->option('name');
Retrieving All Options
$options = $this->option();
source
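To connect that back to the question: the option does not become a plain $sequence variable automatically; you declare it on the command and read it with $this->option(). A minimal sketch, assuming a Laravel version that supports the $signature property (the chunk size of 10 just mirrors the question):

// app/Console/Commands/Crawler.php
namespace App\Console\Commands;

use Illuminate\Console\Command;

class Crawler extends Command
{
    protected $signature = 'crawler {--sequence=1}';

    public function handle()
    {
        $sequence = (int) $this->option('sequence'); // 1, 2, 3, ...
        $perBatch = 10;
        $offset   = ($sequence - 1) * $perBatch;     // sequence 1 => records 1-10, 2 => 11-20, ...

        // fetch $perBatch records starting at $offset and crawl them here
    }
}

The scheduler entries can then be registered as crawler --sequence=1 up to crawler --sequence=10, each one crawling its own slice of records.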
Related
PHP/MySQL/Ubuntu 14.04:
tl;dr and simple question: is there a way to increase the limit of internal redirects to more than 10? Say, 15.
I have a cronjob that checks for open jobs to run every minute. In essence, these jobs create a final PDF using FPDF/mPDF, update records, and send email alerts. The PDFs are combined from multiple sources, as well as created from dynamic HTML output on Lynx. If it were just a long logic sequence with no HTML output, I can put it all in one page. But it requires HTML output to draw the PDF.
That being said, the jobs run in sequence. For example:
Check for jobs for specific ItemID
Found a job to run (job #1), redirect to a page using Lynx and finish the job. Once finished, write the next sequential job for that same ItemID (job #2). Redirect Lynx to run the next job
And so on. This loops several times until an item is complete. There's a finite number of jobs an item needs to run. Looks like 12 redirects of the above.
I can simply stop the jobs at each run and wait for the clock to hit 1 minute. But when an item requires 6 jobs, that's 6 minutes. Running them one after the other made it in 15 seconds.
This question might be too localized (I think that's the term). I hope my simple question of "Is there a way to increase the limit past 10" is general enough and helpful if answered.
Thanks
Use LimitInternalRecursion to raise the number of possible redirects.
E.g.:
LimitInternalRecursion 20
It can be used in the server config or any virtual host, so you have to edit one of the main config files (not .htaccess). In Ubuntu it's probably /etc/apache2/apache2.conf.
I'm currently working on a browser game with a PHP backend that needs to perform certain checks at specific, changing points in the future. Cron jobs don't really cut it for me as I need precision at the level of seconds. Here's some background information:
The game is multiplayer and turn-based
On creation of a game room the game creator can specify the maximum amount of time taken per action (30 seconds - 24 hours)
Once a player performs an action, they should only have the specified amount of time to perform the next, or the turn goes to the player next in line.
For obvious reasons I can't just keep track of time through Javascript, as this would be far too easy to manipulate. I also can't schedule a cron job every minute as it may be up to 30 seconds late.
What would be the most efficient way to tackle this problem? I can't imagine querying a database every second would be very server-friendly, but it is the direction I am currently leaning towards[1].
Any help or feedback would be much appreciated!
[1]:
A user makes a move
A PHP function is called that sets 'switchTurnTime' in the MySQL table's game row to 'TIMESTAMP'
A PHP script that is always running in the background queries the table for any games where the 'switchTurnTime' has passed, switches the turn and resets the time.
You can always use a queue or daemon. This only works if you have shell access to the server.
https://stackoverflow.com/a/858924/890975
Every time you need an action to occur at a specific time, add it to a queue with a delay. I've used beanstalkd with varying levels of success.
You have lots of options this way. Here are two examples with 6-second intervals:
Use a cron job every minute to add 10 jobs, each with a delay of 6 seconds
Write a simple PHP script that runs in the background (as a daemon) and adds a new job to the queue every 6 seconds
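A rough sketch of the delayed-job idea, assuming the Pheanstalk client for beanstalkd (the tube name and payload are made up for illustration):

use Pheanstalk\Pheanstalk;

$pheanstalk = Pheanstalk::create('127.0.0.1'); // older Pheanstalk versions use new Pheanstalk('127.0.0.1')

// When a player moves, schedule the turn-timeout check so that it only becomes
// available to a worker after the allowed time for the next move has passed.
$pheanstalk->useTube('turn-timeouts')->put(
    json_encode(['game_id' => $gameId, 'turn' => $turnNumber]),
    Pheanstalk::DEFAULT_PRIORITY,
    30 // delay in seconds before a worker can reserve the job
);

// In a long-running worker:
$job = $pheanstalk->watch('turn-timeouts')->reserve(); // blocks until a job is ready
// check in the database that the turn was not already taken, switch it if needed, then:
$pheanstalk->delete($job);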
I'm going with the following approach for now, since it seems to be the easiest to implement, test, and deploy on different kinds of servers/hosting, while still behaving reliably.
Set up a cron job to run a PHP script every minute.
Within that script, first do a query to find candidates that will have their endtime within this minute.
Start a while loop that runs until 59 seconds have passed.
Inside this loop, check the remaining time for each candidate.
If the time limit has passed, do another query on that specific candidate to ensure the endtime hasn't changed.
If it has, re-add it to the candidates queue as necessary. If not, act accordingly (in my case: switch the turn to the next player).
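A bare-bones sketch of that loop, assuming a games table with integer id and endtime (Unix timestamp) columns and an existing PDO connection (all names and credentials are placeholders):

$pdo   = new PDO('mysql:host=localhost;dbname=game', 'user', 'secret');
$start = time();

// Candidates whose deadline falls inside this minute.
$stmt = $pdo->prepare('SELECT id, endtime FROM games WHERE endtime < ?');
$stmt->execute([$start + 60]);
$candidates = $stmt->fetchAll(PDO::FETCH_ASSOC);

while (time() - $start < 59) {
    foreach ($candidates as $i => $game) {
        if ($game['endtime'] > time()) {
            continue; // not due yet
        }

        // Re-check this candidate: the player may have moved in the meantime.
        $check = $pdo->prepare('SELECT endtime FROM games WHERE id = ?');
        $check->execute([$game['id']]);
        $endtime = (int) $check->fetchColumn();

        if ($endtime <= time()) {
            // Deadline really passed: switch the turn to the next player here.
            unset($candidates[$i]);
        } else {
            $candidates[$i]['endtime'] = $endtime; // deadline moved; keep watching it
        }
    }
    usleep(250000); // poll roughly four times per second
}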
Hope this will help somebody in the future, cheers!
I have 5 cron jobs running a PHP file. The PHP file checks the MySQL database for items that require processing. Since cron launches the scripts all at the same time, it seems that some of the items are processed twice, or even sometimes up to five times.
When one of the scripts SELECTs an item, it immediately sends an UPDATE query so that the other jobs shouldn't run it again. But it looks like items are still being processed twice.
What can I do to prevent the other scripts from processing an item that was already selected by another cron job?
This issue is called a "race condition". In this case it happens because SELECT and UPDATE, even though they are called one after another, are not a single atomic operation. There is therefore a chance that two workers SELECT the same job, then the first does its UPDATE, then the second does its UPDATE, and both proceed to run the job simultaneously.
There is a workaround, however.
You could add a field to your table containing the ID of the current cron worker (if you run everything on one machine, it can be the PID). In the worker you do the UPDATE first, trying to reserve a job for it:
UPDATE jobs
SET worker = $PID, status = 'processing'
WHERE worker IS NULL AND status = 'awaiting' LIMIT 1
Then you verify you successfully reserved a job for this worker:
SELECT * FROM jobs WHERE worker = $PID
If it does not return a row, it means another worker was first to reserve it; you can try again from step 1 to acquire another job. If it does return a row, do all your processing and then a final UPDATE at the end:
UPDATE jobs
SET status = 'done', worker = NULL
WHERE id = $JOB_ID
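In PHP the same reserve-then-verify flow can be expressed by checking how many rows the UPDATE actually changed; a sketch using PDO, assuming an existing $pdo connection and the table layout from the answer above:

$pid = getmypid();

// Step 1: try to reserve one awaiting job for this worker.
$reserve = $pdo->prepare(
    "UPDATE jobs SET worker = ?, status = 'processing'
     WHERE worker IS NULL AND status = 'awaiting' LIMIT 1"
);
$reserve->execute([$pid]);

if ($reserve->rowCount() === 0) {
    exit; // another worker reserved the last awaiting job first
}

// Step 2: fetch the job this worker just reserved.
$stmt = $pdo->prepare("SELECT * FROM jobs WHERE worker = ? AND status = 'processing'");
$stmt->execute([$pid]);
$job = $stmt->fetch(PDO::FETCH_ASSOC);

// ... do the actual processing with $job ...

// Step 3: mark it done and release the worker column.
$done = $pdo->prepare("UPDATE jobs SET status = 'done', worker = NULL WHERE id = ?");
$done->execute([$job['id']]);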
I think this is a typical problem for semaphores. Take a look at this article:
http://www.re-cycledair.com/php-dark-arts-semaphores
The idea would be that, at the start of each script, you ask for the same semaphore and wait until it is free. Then SELECT and UPDATE the DB as you do now, release the semaphore, and start processing. This way you can be sure that no script is reading the DB while another one is about to write to it.
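A small sketch of that idea, assuming PHP's SysV semaphore functions (sem_get / sem_acquire, Unix only, sysvsem extension) are available:

// All five cron scripts derive the same key, so only one of them at a time
// can be inside the SELECT-then-UPDATE critical section.
$key = ftok(__FILE__, 'j'); // any file shared by all the scripts works as the key source
$sem = sem_get($key, 1);    // at most one process may hold the semaphore

if (sem_acquire($sem)) {    // blocks until the semaphore is free
    // SELECT the next unprocessed item and immediately UPDATE it as taken ...
    sem_release($sem);
}

// ... process the claimed item outside the critical section ...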
I would start again. This train of thought:
It takes time to process one item, about 30 seconds; if I have five cron jobs, five items are processed in 30 seconds.
This is just plain wrong and you should not write your code with this in mind.
By that logic why not make 100 cron jobs and do 100 per 30 seconds? Answer, because your server is not RoadRunner and it will fall over and fail.
You should:
Rethink your problem; this is the most important point, as it will help with the other two.
Optimise your code so that it does not take 30 seconds.
Segment your code so that each job is only doing one task at a time, which will make it quicker and also ensure that you do not get this 'double processing' effect.
EDIT
Even with the new knowledge that this runs on a third-party server, my logic still stands: do not start multiple calls that you are not in control of. In fact, this is now even more important.
If you do not know what they are doing with the calls, then you cannot be sure they are processed in the right order, when, or even whether they are processed at all. So just make one call at a time to ensure you do not get double processing.
A technical solution would be for them to improve the processing time or for you to cache the responses - but that may not be relevant to your situation.
Here's what I'm trying to accomplish in high-level pseudocode:
query db for a list of names (~100)
for each name (using php) {
query a 3rd party site for xml based on the name
parse/trim the data received
update my db with this data
Wait 15 seconds (the 3rd party site has restrictions and I can only make 4 queries / minute)
}
So this was running fine. The whole script took ~25 minutes (99% of the time was spent waiting 15 seconds after every iteration). My web host then made a change so that scripts time out after 70 seconds (understandable). This completely breaks my script.
I assume I need to use cron jobs or the command line to accomplish this. I only understand the basic use of cron jobs. Any high-level advice on how to split up this work in a cron job? I am not sure how a cron job could work through a dynamic list.
Cron itself has no idea of your list or what has already been done, but you can use two kinds of cron jobs.
The first cron job - run for example once a day - could add your 100 items to a job queue.
The second cron job - run for example once every minute during a certain period - can check whether there are items in the queue, execute one (or a few) and remove them from the queue.
Note that both cron jobs are just triggers that start a PHP script; you have two different scripts, one that fills the queue and one that processes part of it, so almost everything is still done in PHP.
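A plain sketch of those two scripts, assuming a simple name_queue table acts as the queue (table layout and credentials are made up for illustration):

// fill_queue.php - cron, once a day: copy the ~100 names into the work queue.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$pdo->exec('INSERT INTO name_queue (name, done) SELECT name, 0 FROM names');

// process_queue.php - cron, once a minute: handle up to 4 items (the rate limit).
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$stmt = $pdo->query('SELECT id, name FROM name_queue WHERE done = 0 LIMIT 4');

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // fetch the third-party XML for $row['name'], parse it, update your own table ...
    $pdo->prepare('UPDATE name_queue SET done = 1 WHERE id = ?')->execute([$row['id']]);
    sleep(15); // stay within the 4-requests-per-minute restriction
}

With 4 items and a 15-second pause the processing run takes about a minute, so in practice you might handle 3 per run, or add a lock, to avoid overlapping with the next cron invocation.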
In short, there is not much that is different. Instead of executing the script via mod_php or FastCGI, you execute it from the command line: php /path/to/script.php.
Because this is a different environment than HTTP, some things obviously don't work: sessions, cookies, GET and POST variables. Output gets sent to stdout instead of the browser.
You can pass arguments to your script by using $argv.
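For example (the arguments here are purely illustrative):

// called as: php /path/to/script.php 2 50
$sequence  = isset($argv[1]) ? (int) $argv[1] : 1;  // $argv[0] is the script path itself
$batchSize = isset($argv[2]) ? (int) $argv[2] : 10;

echo "Processing batch {$sequence} with {$batchSize} records\n";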
I'm new to PHP, so I need some guidance as to which would be the simplest and/or elegant solution to the following problem:
I'm working on a project that has a table with as many as 500,000 records. At user-specified times, a background task must be started that invokes a command-line application on the server to do the magic. The problem is that every minute or so I need to check all 500,000 records (and counting) to see whether something needs to be done.
As the title says, this is time-critical: a maximum delay of 1 minute is allowed between the time expected by the user and the time the task is executed; of course, the less delay, the better.
So far I can only think of a very dirty option: have a simple utility app that runs on the server and, every minute, makes multiple requests to the server, for example:
check records between 1 and 100,000;
check records between 100,000 and 200,000;
etc. you get the point;
and the server basically starts a task for each batch of 100,000 records or fewer; but it seems to me that there must be a faster approach, something similar to Facebook's notifications.
Additional info:
server is Windows 2008
using apache + php
EDIT 1
users have an average of 3 tasks per day, at roughly 6-8 hour intervals
more than half of the tasks are executed at the same time at least once per day[!]
Any suggestion is highly appreciated!
The easiest approach would be a persistent task that runs the whole time and receives notifications about records that need to be processed. It could then process them immediately or, if a record needs to be processed at a certain time, sleep until either that time is reached or another notification arrives.
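A rough sketch of such a persistent worker, assuming a tasks table with run_at (datetime) and done columns; all names, credentials, and the launched command are invented for illustration:

// Long-running process, started once (e.g. as a Windows service or via a process manager).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

while (true) {
    // The next unfinished task, soonest deadline first.
    $task = $pdo->query('SELECT id, run_at FROM tasks WHERE done = 0 ORDER BY run_at LIMIT 1')
                ->fetch(PDO::FETCH_ASSOC);

    if ($task === false) {
        sleep(1); // nothing scheduled; check again shortly
        continue;
    }

    if (strtotime($task['run_at']) > time()) {
        sleep(1); // not due yet; sleep briefly so newly added earlier tasks are noticed
        continue;
    }

    // The task is due: launch the command-line application for it, then mark it done.
    // exec('myapp.exe ' . escapeshellarg($task['id'])); // hypothetical invocation
    $pdo->prepare('UPDATE tasks SET done = 1 WHERE id = ?')->execute([$task['id']]);
}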
I think I gave this question more than enough time. I will stick with a utility application (that sits on the server) making requests to a URL accessible only from the server's IP, which will spawn a new thread for each task if multiple tasks need to be executed at the same time. It's not really scalable, but it will have to do for now.