Release Queued Laravel Job without increasing Attempts count

Sometimes I need to release a Laravel job and have it rejoin the queue. However, when doing this, the attempts count is increased: it becomes 2, and if your queue worker is limited to 1 try, the job will never be run again.
How can I release without increasing attempts?
To release I am using:
$this->release(30);
Prior to this line I have tried the following code:
$payload = json_decode($this->payload, true);
if (isset($payload['attempts'])) {
    $payload['attempts'] = 0;
}
$this->payload = json_encode($payload);
This does not work: the payload property is not available on my job class. It seems to live on the underlying Job class instead.
The code the Laravel framework uses to reset the attempts count is in the RetryCommand class. It is as follows:
protected function resetAttempts($payload)
{
    $payload = json_decode($payload, true);

    if (isset($payload['attempts'])) {
        $payload['attempts'] = 0;
    }

    return json_encode($payload);
}
But I cannot work out how to access that $payload from my job class.
Is there a better way to release a job without increasing the attempt count?
I am using Laravel 5.4 and Redis queue driver.

So I just ended up deleting the job and queuing a new one. Maybe not clean, but it does work.
$this->delete();
$job = (new ProcessPage($this->pdf))->onQueue('converting');
dispatch($job);
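If you also want to keep the 30-second back-off from the original release(30) call, the replacement job can be delayed when it is re-dispatched. A minimal sketch, reusing ProcessPage from above (the delay value is just an assumption to mirror release(30)):

// Delete the current job so its attempt count is discarded,
// then queue a fresh copy of the same job with a 30-second delay.
$this->delete();

$job = (new ProcessPage($this->pdf))
    ->onQueue('converting')
    ->delay(30); // roughly equivalent to the back-off of release(30)

dispatch($job);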

Related

Where to put a Crawler script in Laravel project?

I have created a really simple PHP crawler, which I want to implement in a Laravel project. I don't know where to put it, though. I want to start the script and just have it run while the application is up.
I know that it should not go in the controllers or in the cron schedule, so any suggestions on where to set it up?
$homepage = 'https://example.com';
$already_crawled = [];
$crawling = [];

function follow_links($url) {
    global $already_crawled;
    global $crawling;

    $doc = new DOMDocument();
    $doc->loadHTML(file_get_contents($url));

    $linklist = $doc->getElementsByTagName('a');

    foreach ($linklist as $link) {
        $l = $link->getAttribute("href");
        $full_link = 'https://example.com' . $l;

        if (!in_array($full_link, $already_crawled)) {
            $already_crawled[] = $full_link;
            $crawling[] = $full_link;
            echo $full_link . PHP_EOL;
            // Insert data in the DB
        }
    }

    array_shift($crawling);

    foreach ($crawling as $link) {
        follow_links($link);
    }
}

follow_links($homepage);
I would recommend a combination of a Service class, Command, and possibly Jobs — and then running them from worker processes.
Your Service would be a class which contains all of the logic for crawling a page. The crawler service is then used either by an artisan command, a queued job, or a combination of both.
You are right that you don't want to run the crawler directly from the built-in Laravel scheduler (because it might run for a long time and prevent other scheduled tasks from running). However, one option is to use your Laravel schedule to run a task which checks for urls that need to be re-crawled and dispatches queued jobs to your worker processes, which are very easy to implement in Laravel.
Each newly discovered URL can be thought of as a separate task and queued individually for crawling, rather than running one process "continually" while the application is online.
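A rough sketch of that structure, assuming a hypothetical CrawlPageJob and CrawlerService (those names, and the crawl() method that fetches a page and returns newly found links, are made up for illustration):

<?php
// app/Jobs/CrawlPageJob.php -- hypothetical job wrapping the crawl logic

namespace App\Jobs;

use App\Services\CrawlerService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

class CrawlPageJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    // Laravel resolves CrawlerService out of the container when calling handle().
    public function handle(CrawlerService $crawler)
    {
        // The service fetches the page, stores it, and returns the links it found;
        // each new link becomes its own queued job instead of a recursive call.
        foreach ($crawler->crawl($this->url) as $newUrl) {
            dispatch(new self($newUrl));
        }
    }
}

The whole crawl is then kicked off once, from a console command or a scheduled task, with dispatch(new \App\Jobs\CrawlPageJob('https://example.com')), and the queue workers take it from there.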

Laravel 5.0 delayed execution

I need to delay execution of one method in Laravel 5.0, or to be more specific, I need it to be executed at a specific given time. The method sends a notification through GCM to a mobile app, and I need to do this repeatedly, each time at a different time. As far as I found out, there is no way to intentionally delay a notification in GCM. I know the basics of working with cron and scheduling in Laravel, but I can't find an answer to my problem.
The method I need to execute with delay is this:
public function pushAndroid($receiver, $message, $data)
{
    $pushManager = new PushManager(PushManager::ENVIRONMENT_DEV);

    $gcmAdapter = new GcmAdapter(array(
        'apiKey' => self::GCM_API_KEY
    ));

    $androidDevicesArray = array(new Device($receiver));
    $devices = new DeviceCollection($androidDevicesArray);

    $msg = new GcmMessage($message, $data);

    $push = new Push($gcmAdapter, $devices, $msg);
    $pushManager->add($push);
    $pushManager->push();
}
The information about when (date + time) it should be executed is stored in a database table. And each notification needs to be sent only once, not repeatedly.
If you take a look at https://laravel.com/docs/5.6/scheduling you can set up something that fits your needs.
Make something along the lines of:
$schedule->call(function () {
    // Here you get the collection for the current date and time
    $notifications = YourModel::whereDate('datecolumn', \Carbon\Carbon::now());
    ...
})->everyMinute();
You can also use queues with delayed dispatching, if that makes more sense, since you hinted you only need to send each notification once.
ProcessJobClassName::dispatch($podcast)->delay(now()->addMinutes(10));
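Since the send time lives in a database column, the delay can also be computed from that value instead of a fixed offset. A minimal sketch (SendPushNotification and the scheduled_at column are assumed names, using the same dispatch/delay API as above):

// Dispatch once, delayed until the datetime stored on the record.
$notification = YourModel::find($id); // row holding the receiver, message and scheduled_at

SendPushNotification::dispatch($notification)
    ->delay(\Carbon\Carbon::parse($notification->scheduled_at));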

Rate limiting PHP function

I have a PHP function which gets called when someone sends a POST request to www.example.com/webhook. However, the external service, which I cannot control, sometimes calls this URL twice in rapid succession, messing with my logic, since the webhook persists data in the database, which takes a few ms to complete.
In other words, when the second request comes in (which cannot be ignored), the first request has likely not completed yet; however, I need the requests to be processed in the order they came in.
So I've created a little hack in Laravel which should "throttle" the execution with 5 seconds in between. It seems to work most of the time. However, due to an error in my code or some other oversight, this solution does not work every time.
function myWebhook() {
    // Check the cached value (defaults to 0) and compare it with the current time.
    while (Cache::get('g2a_webhook_timestamp', 0) + 5 > Carbon::now()->timestamp) {
        // Postpone execution.
        sleep(1);
    }

    // Create a cache value (file storage) that stores the current timestamp.
    Cache::put('g2a_webhook_timestamp', Carbon::now()->timestamp, 1);

    // Execute rest of code ...
}
Anyone perhaps got a watertight solution for this issue?
You have essentially designed your own simplified queue system, which is the right approach, but you can make use of the native Laravel queue to get a more robust solution to your problem.
Define a job, e.g: ProcessWebhook
When a POST request is received to /webhook, queue the job
The Laravel queue worker will process one job at a time[1] in the order they're received, ensuring that no matter how many requests are received, they'll be processed one by one and in order.
The implementation of this would look something like this:
Create a new Job, e.g: php artisan make:job ProcessWebhook
Move your webhook processing code into the handle method of the job, e.g:
public function __construct($data)
{
    $this->data = $data;
}

public function handle()
{
    Model::where('x', 'y')->update([
        'field' => $this->data->newValue
    ]);
}
Modify your Webhook controller to dispatch a new job when a POST request is received, e.g:
public function webhook(Request $request)
{
    $data = $request->getContent();
    ProcessWebhook::dispatch($data);
}
Start your queue worker, php artisan queue:work, which will run in the background processing jobs in the order they arrive, one at a time.
That's it, a maintainable solution to processing webhooks in order, one-by-one. You can read the Queue documentation to find out more about the functionality available, including retrying failed jobs which can be very useful.
[1] Laravel will process one job at a time per worker. You can add more workers to improve queue throughput for other use cases but in this situation you'd just want to use one worker.
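Pulled together, that job class would look roughly like this (the namespace, the property declaration and the use statements are filled in as assumptions; the handle() body is the placeholder from the answer above):

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

class ProcessWebhook implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    private $data;

    public function __construct($data)
    {
        // Raw request body handed over by the controller.
        $this->data = $data;
    }

    public function handle()
    {
        // Placeholder persistence logic; Model, 'x', 'y' and newValue
        // stand in for your own processing code.
        Model::where('x', 'y')->update([
            'field' => $this->data->newValue,
        ]);
    }
}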

Doctrine Paginator fills up memory

I have a Symfony command that uses the Doctrine Paginator on PHP 7.0.22. The command must process data from a large table, so I do it in chunks of 100 items. The issue is that after a few hundred loops it fills up 256M of RAM. As measures against OOM (out of memory) I use:
$em->getConnection()->getConfiguration()->setSQLLogger(null); - disables the SQL logger, which fills memory with logged queries in scripts that run many SQL commands
$em->clear(); - detaches all objects from Doctrine at the end of every loop
I've put in some dumps with memory_get_usage() to check what's going on, and it seems that the garbage collector doesn't free as much memory as the command allocates at every $paginator->getIterator()->getArrayCopy(); call.
I've even tried to manually collect the garbage every loop with gc_collect_cycles(), but still no difference: the command starts at 18M and grows by ~2M every few hundred items. I also tried to manually unset the results and the query builder... nothing. I removed all the data processing and kept only the select query and the paginator, and got the same behaviour.
Does anyone have any idea where I should look next?
Note: 256M should be more than enough for this kind of operation, so please don't recommend solutions that suggest increasing the allowed memory.
The stripped-down execute() method looks something like this:
protected function execute(InputInterface $input, OutputInterface $output)
{
    // Remove SQL logger to avoid out of memory errors
    $em = $this->getEntityManager(); // method defined in base class
    $em->getConnection()->getConfiguration()->setSQLLogger(null);

    $firstResult = 0;

    // Get latest ID
    $maxId = $this->getMaxIdInTable('AppBundle:MyEntity'); // method defined in base class
    $this->getLogger()->info('Working for max media id: ' . $maxId);

    do {
        // Get data
        $dbItemsQuery = $em->createQueryBuilder()
            ->select('m')
            ->from('AppBundle:MyEntity', 'm')
            ->where('m.id <= :maxId')
            ->setParameter('maxId', $maxId)
            ->setFirstResult($firstResult)
            ->setMaxResults(self::PAGE_SIZE)
        ;

        $paginator = new Paginator($dbItemsQuery);
        $dbItems = $paginator->getIterator()->getArrayCopy();

        $totalCount = count($paginator);
        $currentPageCount = count($dbItems);

        // Clear Doctrine objects from memory
        $em->clear();

        // Update first result
        $firstResult += $currentPageCount;
        $output->writeln($firstResult);
    } while ($currentPageCount == self::PAGE_SIZE);

    // Finish message
    $output->writeln("\n\n<info>Done running <comment>" . $this->getName() . "</comment></info>\n");
}
The memory leak was generated by the Doctrine Paginator. I replaced it with a native query using Doctrine prepared statements, and that fixed it.
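A rough sketch of that kind of replacement, using the DBAL connection with a prepared statement and paging by ID instead of the Paginator (the my_entity table and its columns are assumptions; DBAL 2.x statement API):

$conn = $em->getConnection();
$lastId = 0;

do {
    // Prepared statement against the table itself; no entities are hydrated,
    // so nothing accumulates in Doctrine's unit of work.
    $stmt = $conn->prepare(
        'SELECT id, name FROM my_entity
         WHERE id > :lastId AND id <= :maxId
         ORDER BY id ASC LIMIT ' . self::PAGE_SIZE
    );
    $stmt->bindValue('lastId', $lastId);
    $stmt->bindValue('maxId', $maxId);
    $stmt->execute();

    $rows = $stmt->fetchAll();

    foreach ($rows as $row) {
        // ... process the plain array row ...
        $lastId = $row['id'];
    }
} while (count($rows) == self::PAGE_SIZE);

Paging by the last seen ID (keyset pagination) also avoids the growing OFFSET cost that setFirstResult() incurs on large tables.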
Other things that you should take into consideration:
If you are replacing the Doctrine Paginator, you should rebuild the pagination functionality by adding a limit to your query.
Run your command with the --no-debug flag or --env=prod, or maybe both. The thing is that commands run in the dev environment by default, which enables some data collectors that are not used in the prod environment. See more on this topic in the Symfony documentation - How to Use the Console.
Edit: In my particular case I was also using the eightpoints/guzzle-bundle bundle, which wraps the Guzzle HTTP library (I had some API calls in my command). This bundle was also leaking, apparently through some middleware. To fix this, I had to instantiate the Guzzle client independently, without the EightPoints bundle.
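That last fix amounts to constructing the client yourself instead of pulling it from the bundle's service; a minimal sketch (the URL and options are placeholders):

// Plain Guzzle client, created directly rather than injected via the
// EightPoints bundle, so no bundle middleware can hold on to request objects.
$client = new \GuzzleHttp\Client(['timeout' => 10]);
$response = $client->request('GET', 'https://api.example.com/resource');
$body = (string) $response->getBody();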

Memory considerations for long-running php scripts

I want to write a worker for beanstalkd in PHP, using a Zend Framework 2 controller. It starts via the CLI and will run forever, asking beanstalkd for jobs, like this example.
In simple pseudo-code:
while (true) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $params = $data->params;

    $job = new $class($params);
    $job();
}
The $job here has an __invoke() method, of course. However, some of these jobs might run for a long time. Some might use a considerable amount of memory. Some might have the $beanstalk object injected, to start new jobs themselves, or have a Zend\Di\Locator instance to pull objects from the DIC.
I am worried about this setup in production environments over the long term, as circular references might occur and (at this moment) I do not explicitly "do" any garbage collection, while this action might run for weeks/months/years*.
*) In beanstalk, reserve is a blocking call, and if no job is available this worker will wait until it gets any response back from beanstalk.
My question: how will PHP handle this in the long term, and should I take any special precautions to keep this from blocking?
These are the things I did consider and that might be helpful (but please correct me if I am wrong and add more if possible):
Use gc_enable() before starting the loop
Use gc_collect_cycles() in every iteration
Unset $job in every iteration
Explicitly unset references in __destruct() from a $job
(NB: Update from here)
I ran some tests with arbitrary jobs. The jobs I included were: "simple", which just sets a value; "longarray", which creates an array of 1,000 values; "producer", where the loop injects $pheanstalk and the job adds three simple jobs to the queue (so there is now a reference from job to beanstalk); and "locatoraware", where a Zend\Di\Locator is given and all job types are instantiated (though not invoked). I added 10,000 jobs to the queue, then reserved all jobs in the queue.
Results for "simplejob" (memory consumption per 1,000 jobs, with memory_get_usage())
0: 56392
1000: 548832
2000: 1074464
3000: 1538656
4000: 2125728
5000: 2598112
6000: 3054112
7000: 3510112
8000: 4228256
9000: 4717024
10000: 5173024
Picking a random job, measuring the same as above. Distribution:
["Producer"] => int(2431)
["LongArray"] => int(2588)
["LocatorAware"] => int(2526)
["Simple"] => int(2456)
Memory:
0: 66164
1000: 810056
2000: 1569452
3000: 2258036
4000: 3083032
5000: 3791256
6000: 4480028
7000: 5163884
8000: 6107812
9000: 6824320
10000: 7518020
The execution code from above is updated to this:
$baseMemory = memory_get_usage();
gc_enable();

for ($i = 0; $i <= 10000; $i++) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $params = $data->params;

    $job = new $class($params);
    $job();

    $job = null;
    unset($job);

    if ($i % 1000 === 0) {
        gc_collect_cycles();
        echo sprintf('%8d: ', $i), memory_get_usage() - $baseMemory, "<br>";
    }
}
As everybody can see, the memory consumption in PHP is not kept level or at a minimum, but increases over time.
I've usually restarted the script regularly - though you don't have to do it after every job is run (unless you want to, and it's useful to clear memory). You could, for example, run up to 100 jobs or more at a time, or until the script has used, say, 20MB of RAM, and then exit the script, to be instantly re-run.
My blog post at http://www.phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/ has some example shell scripts for re-running the scripts.
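A minimal sketch of that idea, exiting once either a job count or a memory threshold is hit (the 100-job and 20MB limits are just the examples from above; an external supervisor or shell loop restarts the process):

$jobsProcessed = 0;

while (true) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $job = new $class($data->params);
    $job();

    $jobsProcessed++;

    // Exit once a job count or memory threshold is reached; the process
    // is then restarted externally, handing all memory back to the OS.
    if ($jobsProcessed >= 100 || memory_get_usage(true) > 20 * 1024 * 1024) {
        exit(0);
    }
}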
I ended up benchmarking my current code base line by line, after which I came to this:
$job = $this->getLocator()->get($data->name, $params);
It uses the Zend\Di dependency injection, whose instance manager tracks instances throughout the whole process. So after a job had been invoked and could have been freed, the instance manager still kept it in memory. Not using Zend\Di to instantiate the jobs immediately resulted in flat memory consumption instead of linearly growing consumption.
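In other words, something along these lines (assuming, as in the loop above, that the job payload carries the class name and constructor parameters):

// Instantiate the job class directly instead of going through the
// Zend\Di locator, so nothing else keeps a reference to it after unset().
$class = $data->name;
$job = new $class($params);
$job();
unset($job);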
For memory safety, don't do the looping over jobs inside PHP itself. Instead, just create a simple bash script to do the looping:
while [ true ] ; do
    php do_jobs.php
done
Here, do_jobs.php contains something like:
// ...
$data = $beanstalk->reserve();
$class = $data->class;
$params = $data->params;
$job = new $class($params);
$job();
// ...
Simple, right? ;)
