Laravel Queues - Passing Data to the Queue

I have an array containing about ~8,000 stock tickers that I'm trying to queue up; the queue is meant to receive the array of stock tickers ($symbols[]), and then pass each one to a worker / consumer (whichever jargon you prefer).
Here's what my QueueController currently looks like:
class QueueController extends \BaseController {

    public function stocks()
    {
        $symbols = $this->select_symbols();
        Queue::push('StockQueue', array('symbols' => $symbols));
    }

    ...
}
From my QueueController, I'm calling a method to retrieve the list of stock symbols and passing it to the StockQueue class as $data.
public function fire($job, $data)
{
    $symbols = $data; // print_r shows all symbols...

    // Get Quote Data for Symbol
    $quote = $this->yql_get_quote($symbol);

    // Get Key Stats for Symbol
    $keystats = $this->yql_get_keystats($symbol);

    // Merge Quote and Keystats into an Array
    $array[] = $quote;
    $array[] = $keystats;

    // Save Data to DB
    $this->yql_save_results($array, $symbol);

    $job->delete();
}
This is not what I'm trying to achieve, though; what I need to do is pass each symbol, one by one, to the StockQueue class and have it process each one as a task.
If I were to wrap the push in my stocks() method in a while loop, it would (from what I understand) try to pass all ~8,000 to the queue immediately. Would this be detrimental, or is this the best way to do it? I haven't been able to find many examples of PHP-based RPC message queuing online, so I'm just as curious about the best practices as I am about the correct process.
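For illustration, a minimal sketch of the one-job-per-symbol approach described above (assuming the same Laravel 4-style Queue::push; select_symbols() is the method already shown):

public function stocks()
{
    $symbols = $this->select_symbols();

    // One small job per symbol: each worker pops exactly one
    // symbol at a time instead of the whole ~8,000-element array.
    foreach ($symbols as $symbol) {
        Queue::push('StockQueue', array('symbol' => $symbol));
    }
}

// StockQueue::fire() then receives a single symbol:
public function fire($job, $data)
{
    $symbol = $data['symbol'];
    // ... fetch quote/keystats for $symbol, save, then:
    $job->delete();
}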
With that being said, how can I fire up multiple workers for this queue? Say, I want 5 workers (depending on how many resources each one takes; I'll figure that out) to process these tasks in order to reduce the processing time by ~4/5ths. How would I do that?
Would I just launch php artisan queue:listen five times?
And, for clarity, I'm using beanstalkd and supervisord to do the message queue / monitoring.
I look forward to your advice and insight.

Yep, just run more workers. Beanstalkd can hold a number of connections open from lots of workers and make sure they all get different jobs. Just make sure that each job completes successfully (if not, deal with it appropriately - or at least bury it to look at later) and give it enough time to complete, with some to spare, in the TTR (Time To Run) setting.
As for how to run more workers - yes, just increase the number of processes available in Supervisord (numprocs=5 in the [program:NAME] section) and have them start. I tended to have another (larger) pool of the same workers that don't start automatically, so I could start a couple more manually through the Supervisord control as required.
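A minimal Supervisord program block along those lines (the program name and paths are placeholders):

[program:stockqueue-worker]
command=php artisan queue:listen
directory=/path/to/your/app
numprocs=5
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true

Note that with numprocs greater than 1, Supervisord requires process_name to include %(process_num)s so each of the five workers gets a distinct name.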

Related

Laravel wait for a specific queued listener to finish then return

I have an Event with a bunch of queued listeners. I can't run them sync because I am calling external APIs, etc.
Events\Invoice\InvoiceEvent::class => [
    Listeners\Invoice\Listener1::class, // should queue
    Listeners\Invoice\Listener2::class, // should queue
    Listeners\Invoice\Listener3::class, // should NOT queue......
    Listeners\Invoice\Listener4::class, // should queue
    Listeners\Invoice\Listener5::class, // should queue
],
Calling this event from a controller method.
public function store(Request $request)
{
    $invoice = Invoice::findOrFail($request->id);

    InvoiceEvent::dispatch($invoice); // Async event, it cannot be sync

    return $invoice; // need to return only when Listener3 finishes execution
}
return $invoice depends on Listener3; otherwise, it will return incomplete data.
How can I return only when Listener3 has finished executing?
I came up with sleep(10); but it's not an ideal solution.
Listener3 saves data from a third-party API to the invoices table, and that data needs to be returned, which is why I cannot return incomplete invoice data. Right now the required data does get added to the invoice, but only after the response has been returned.
PHP is natively synchronous. Unless you're pushing those events or listeners to a queue (i.e. class Listener3 implements ShouldQueue), they should run in order. However, you might want to rethink the structure of your code.
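To illustrate that distinction, a sketch (handler bodies omitted):

use Illuminate\Contracts\Queue\ShouldQueue;

// Runs inline, in order, during the dispatching request:
class Listener3
{
    public function handle(InvoiceEvent $event) { /* ... */ }
}

// Pushed to the queue and run later by a worker:
class Listener4 implements ShouldQueue
{
    public function handle(InvoiceEvent $event) { /* ... */ }
}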
Listeners are best as reactions to an event, i.e. side effects, running independently from the rest of your application. Jobs, Events and Listeners should not generally return a value (except to halt a series of listeners). In your case, the Invoice is going through multiple steps, including calling a 3rd party API. Ideas:
Create a service class that performs the tasks on the Invoice and returns the invoice to the controller when completed (which then will return the $invoice data to the front end)
If you want the process to be async, consider using a push notification: dispatch a job which performs the tasks on the Invoice, then alert the front end (e.g. via Pusher) when the invoice is ready to fetch.
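A rough sketch of the first idea (InvoiceService and fetchThirdPartyData are made-up names; Listener3 would be removed from the listener array, since the service now does its work):

class InvoiceService
{
    // Runs the one step the response depends on (what Listener3
    // was doing) synchronously, then hands the invoice back.
    public function complete(Invoice $invoice): Invoice
    {
        $data = $this->fetchThirdPartyData($invoice); // hypothetical API call

        $invoice->fill($data);
        $invoice->save();

        return $invoice->fresh();
    }
}

// In the controller - the queued listeners stay async; only the
// blocking step runs inline:
public function store(Request $request, InvoiceService $service)
{
    $invoice = Invoice::findOrFail($request->id);

    InvoiceEvent::dispatch($invoice);

    return $service->complete($invoice);
}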
There is a way to broadcast your event without queuing it.
Into your event class, add:
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;
so that your class declaration implements ShouldBroadcastNow:
class NotQueueEvent implements ShouldBroadcastNow { ... }
This broadcasts the event without enqueueing it.
If you want to wait for this method to return, just don't put it on the queue: run the event and await the return. I don't know if I understood the problem correctly.

Laravel - Queue Jobs - How to get a job based on the argument passed to it when dispatching

I have multiple records in my accounts table, and I have one job, let's call it MasterJob, triggered every 5 minutes. This job's handle function loops through my accounts and dispatches JobA for each of those accounts, e.g.: dispatch JobA($accountId1), dispatch JobA($accountId2), etc.
here is a code sample:
class MasterJob implements ShouldQueue
{
    public function handle()
    {
        $accounts = Account::all();

        foreach ($accounts as $account) {
            // $account has all the details of the record,
            // including id and other details
            JobA::dispatch($account);
        }
    }
}
The problem comes when JobA(1) takes more than 5 minutes to process. In this case, the MasterJob is triggered again and therefore dispatches another JobA($accountId1), even though the previous one hasn't finished processing.
What I want to achieve is: skip dispatching only those accounts that are still being processed. For example: don't dispatch JobA(1) again, but JobA(2) and JobA(3) can be dispatched without a problem.
The only thing I found was the withoutOverlapping() function, but it's not helpful in my scenario. Is there any way I can check that a job for that account isn't running before I dispatch it again?
BTW, I am using database as the queue driver.
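One possible direction (an assumption, not from the thread; it requires Laravel 8+ and a cache driver that supports atomic locks): mark JobA as unique per account via the ShouldBeUnique contract, so a re-dispatch for an account whose job is still pending or running is silently dropped:

use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;

class JobA implements ShouldQueue, ShouldBeUnique
{
    public $account;

    public function __construct(Account $account)
    {
        $this->account = $account;
    }

    // Two JobA instances with the same uniqueId cannot coexist on
    // the queue; the duplicate dispatch is simply ignored.
    public function uniqueId()
    {
        return (string) $this->account->id;
    }
}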

Check Laravel Jobs inside Queue::before to delete before they are processed

I was given the idea to look in the AppServiceProvider with Queue::before as a way to add a check for Jobs I no longer want to run and delete them without having to add checks to every Job I write.
Background: I am working on a SaaS that does audits, so an audit can run for hours and consist of 1000s of jobs. If I can look for an audit id inside the jobs as they come through and compare it with a cached array of any audit ids that have been cancelled, I can save time.
So where I've got to is: how do I unwrap the job inside Queue::before to get an id to check? (Standard Laravel queue code, using RabbitMQ.)
The jobs are wrapped in a layer or two of event classes, and I cannot dump the data to the screen to inspect it, only to log files, since it runs in the queue.
In app/Providers/AppServiceProvider.php:

use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\Queue;

Queue::before(function (JobProcessing $event) {
    // $event->connectionName
    // $event->job
    $job = $event->job->payload();
    $obj = unserialize($job['data']['data']);
});
As far as I can tell, for the events I'm interested in, the payload has a data key, which itself has a data key, and that is the serialised object I'm after. This does not seem like the best way to do it; I'd like to see a better way to interact with it.
Thanks!
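As an aside, in recent stock Laravel the serialized job object normally sits under data.command rather than data.data (the extra nesting above presumably comes from the event wrapper), so the exact structure is worth checking per driver. A sketch assuming the stock layout:

Queue::before(function (JobProcessing $event) {
    $payload = $event->job->payload();

    // Stock payload shape: ['displayName' => ..., 'data' => [
    //     'commandName' => FQCN, 'command' => serialized job object]]
    $command = unserialize($payload['data']['command']);

    // $command is the original job instance; its public properties
    // (e.g. an audit id) can be read directly for the cancel check.
});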
I am in the middle of a similar problem involving webhook delivery. Through a developer portal, we are allowing users to re-queue a webhook (to short-cut the wait on backed-off delivery attempts). Since this could create a second job for the same webhook, we sought a way to identify the original as out of date.
app/Jobs/DeliverWebhook.php constructor:
public function __construct(Webhook $webhook)
{
    $this->webhook = $webhook;
    $this->queued_at = Carbon::now();

    Cache::put(
        'DeliverWebhook.'. $this->webhook->id .'.QueuedAt',
        $this->queued_at,
        Carbon::now()->addDays(3)
    );
}
Here, you can see we've attached a queued_at attribute to this instance of the job. (We can probably also make this more unique with use of something like uniqid() or random_bytes() to avoid potential double-click issues or similar hiccups when queuing.)
The second part is that we set the semi-unique cache key to match this queued_at time. I set it to expire in 3 days, past the end of our backed-off retry attempts.
Now, when a job is picked up for processing, I can check the job instance's queued_at attribute against the cached value, and delete the job if it is old.
In my AppServiceProvider boot method:
Queue::before(function ($event) {
    if ($event->job->queue == 'webhooks' && $event->job->getName() == 'DeliverWebhook') {
        // note the '.' before 'QueuedAt', matching the constructor's key
        $cache_key = 'DeliverWebhook.'. $event->job->instance->webhook->id .'.QueuedAt';

        if ($event->job->instance->queued_at < Cache::get($cache_key)) {
            $event->job->delete();
            throw new JobRequeuedException;
        }
    }
});
An exception is thrown at the end because the queue worker, by default, does not check if the job is deleted before calling $job->fire(). Throwing the exception forces the worker to skip fire() and jump into the handleJobException() method.
NOTE: I still need to test this appropriately.
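(JobRequeuedException above is the answerer's own class; presumably a bare marker exception along these lines:)

class JobRequeuedException extends \Exception
{
    // Intentionally empty: thrown solely to make the worker skip
    // fire() and route through handleJobException().
}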

How to retrieve newly created entity by asynchronous command (CQRS) in same php process?

I am implementing a PHP application with CQRS.
Let's say I have CreateOrderCommand and when I do
$command = new CreateOrderCommand(/** some data**/);
$this->commandBus->handle($command);
The CommandBus now just passes the command to the proper CreateOrderCommandHandler class, as simply as:
abstract class SimpleCommandBus implements CommandBus
{
    /** @var ICommandHandlerLocator */
    protected $locator;

    /**
     * Executes command
     *
     * @param Command $command
     */
    public function handle(Command $command)
    {
        $handler = $this->locator->getCommandHandler($command);
        $handler->handle($command);
    }
}
Everything ok.
But handle() is a void method, so I don't know anything about the progress or the result. What can I do to be able to, for example, fire CreateOrderCommand and then, in the same process, acquire the newly created entity's id (probably with some passive waiting for its creation)?
public function createNewOrder(/** some data**/){
    $command = new CreateOrderCommand(/** some data**/);
    $this->commandBus->handle($command);

    // something that will wait until the command is done
    $createdOrder = // some magic that retrieves some address to the result data

    return $createdOrder;
}
And to get closer to what CQRS can provide, the command bus should be able to have a RabbitMqCommandBus implementation that just serializes the command and sends it to a rabbit queue.
So the process that finally handles the command might be some running consumer, and some kind of communication between processes is needed here - to be able to somehow inform the original user process, from the consumer, that it is done (with some information, for example the id of the new entity).
I know that there is a solution with a GUID - I could mark the command with a GUID. But then what?
public function createNewOrder(/** some data**/){
    $command = new CreateOrderCommand(/** some data**/);
    $this->commandBus->handle($command);

    $guid = $command->getGuid();

    // SOME IMPLEMENTATION

    return $createdOrder;
}
SOME IMPLEMENTATION would have to do some checking of events (so I need to implement some event system too) for the command with that specific GUID, to be able, for example, to echo progress, or, on OrderCreatedEvent, just return the id I would get from that event. The consumer process that asynchronously handles the command might, for example, feed events to rabbit, and the user-facing client would consume them and make the proper response (echo progress, return the newly created entity, and so on).
But how do I do that? And is the solution with a GUID the only one? What are acceptable implementations and solutions? Or, what point am I missing? :)
The easiest solution to get the id of the created aggregate/entity is to add it to the command. So the frontend generates the id and passes it with the data. But to make this solution work, you need to use UUIDs instead of normal database integers; otherwise you may find yourself duplicating identifiers on the db side.
If the command is async and performs time-consuming actions, you can certainly publish events from the consumer, so the client receives the information in real time, e.g. via websockets.
Or ask the backend from time to time about the existence of the order with the id from the command, and when the resource exists, redirect the user to the right page.
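A sketch of that first suggestion (the ramsey/uuid package and the CreateOrderCommand signature are assumptions here):

use Ramsey\Uuid\Uuid;

public function createNewOrder(/** some data **/)
{
    // The caller generates the identity up front, so nothing has
    // to come back from the (possibly asynchronous) handler.
    $orderId = Uuid::uuid4()->toString();

    $command = new CreateOrderCommand($orderId, /** some data **/);
    $this->commandBus->handle($command);

    // The id is already known; the read side can be polled (or a
    // websocket event awaited) until the order materializes.
    return $orderId;
}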

PHP/MySQL Job queueing system doing jobs more than once

I came up with a very simple job queueing system using PHP, MySQL and cron.
1. Cron will call a website, which has a function that calls function A() every 2 seconds. A() searches for and retrieves a row from table A.
2. Upon retrieving a row, A() will update that row with value 1 in the working column.
3. A() then does something to the data in the retrieved row.
4. A() then inserts a row into table B with the value obtained during the processing in step 3.
Problem: I notice that there are sometimes duplicate values in table B, due to function A() retrieving the same row from table A multiple times.
Which part of the design above is allowing the duplicate processing, and how should it be fixed?
Please don't suggest something like RabbitMQ without at least showing how it can be implemented in more detail. I read some of their docs and did not understand how to implement it. Thanks!
Update: I have a cron job that calls a page (which calls function c()) every minute. Function c() then loops 30 times, calling function A() on each pass and using sleep() to delay.
The supplied answer is good, file locks work well, but, since you're using MySQL, I thought I'd answer as well. With MySQL you can implement cooperative asynchronous locking using GET_LOCK and RELEASE_LOCK.
*DISCLAIMER: The examples below are untested. I have successfully implemented something very close to this before, and the below was the general idea.
Let's say you've wrapped this GET_LOCK function in a PHP class called Mutex:
class Mutex {
    private $_db = null;
    private $_resource = '';

    public function __construct($resource, Zend_Db_Adapter $db) {
        $this->_resource = $resource;
        $this->_db = $db;
    }

    // gets a lock for $this->_resource; MySQL's GET_LOCK requires a
    // timeout (in seconds) as its 2nd parameter - 0 means "don't wait"
    public function getLock($timeout = 0) {
        return (bool)$this->_db->fetchOne(
            'SELECT GET_LOCK(:resource, :timeout)',
            array(
                ':resource' => $this->_resource,
                ':timeout'  => $timeout,
            ));
    }

    public function releaseLock() {
        // using DO because I really don't care if this succeeds;
        // when the PHP process terminates, the lock is released
        // so there is no worry about deadlock
        $this->_db->query(
            'DO RELEASE_LOCK(:resource)',
            array(
                ':resource' => $this->_resource
            ));
    }
}
Before A() fetches rows from the table, have it ask for a lock. You can use any string as the resource name.
class JobA {
    public function __construct(Zend_Db_Adapter $db) {
        $this->_db = $db;
    }

    public function A() {
        // I'm assuming A() is a class method and that the class somehow
        // acquired access to a MySQL database - pretend $this->_db is a
        // Zend_Db instance. The resource name can be an arbitrary
        // string - I chose the class name in this case but it could be
        // 'barglefarglenarg' or something.
        $mutex = new Mutex(get_class($this), $this->_db);

        // I choose to throw an exception but you could just as easily
        // die silently and get out of the way for the next process,
        // which often works better depending on the job
        if (!$mutex->getLock())
            throw new Exception('Unable to obtain lock.');

        // Got a lock, now select the rows you need without fear of
        // any other process running A() getting the same rows as this
        // process - presumably you would update/flag the row so that the
        // next A() process will not select the same row when it finally
        // gets a lock. Once we have our data we release the lock
        $mutex->releaseLock();

        // Now we do whatever we need to do with the rows we selected
        // while we had the lock
    }
}
When you engineer a scenario in which multiple processes are selecting and modifying the same data, this kind of thing comes in very handy. When using MySQL, I prefer this database approach to the file locking mechanism, for portability - it's easier to host your app on different platforms if the locking mechanism is external to the filesystem. Sure it can be done, and it works fine, but in my personal experience I found this easier to use.
If you plan on your app being portable across database engines, then this approach will probably not work for you.
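Another MySQL-only pattern worth a mention (a sketch, not part of the answer above; claimed_by is a hypothetical column you would add to table A, and an id column is assumed): claim a row atomically with a single UPDATE, so two overlapping runs can never select the same row:

// $pdo is an existing PDO connection to the same MySQL database.
// Atomically flag one unclaimed row for this connection only.
$pdo->exec(
    "UPDATE table_a
        SET working = 1, claimed_by = CONNECTION_ID()
      WHERE working = 0
      ORDER BY id
      LIMIT 1"
);

// Read back whichever row this connection just claimed.
$row = $pdo->query(
    "SELECT * FROM table_a
      WHERE working = 1 AND claimed_by = CONNECTION_ID()"
)->fetch(PDO::FETCH_ASSOC);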
One problem could be the processing at the first step:
Cron will call a function A() that searches and retrieves a row from table A every 2 seconds.
The processing of this part of the script could take longer than two seconds on a table without indexes; as a result, overlapping runs could pick up the same row multiple times.
You could remedy this with an exclusive file lock.
I have a feeling there is more to it than just the workflow; if you can attach some basic code, there might be a problem in the code as well.
Edit: I think it is a timing problem, judging by your last update:
Update: I have a cron job that calls a page (which calls function c())
every minute. Function c() then loops 30 times, calling function A() on
each pass and using sleep() to delay.
That's a lot of jumping through hoops, and I think you might have a concurrency problem where cron runs are overlapping.
