Symfony/GuzzleHttp: Limiting API calls across multiple consumers/instances - php

I've been working on a project for a while that fetches data from an API and processes that data locally for various uses. Currently, a consumer picks up JSON objects from the message queue that it uses to trigger a matching Symfony command. The rate limiting is built in to this one consumer, is fairly simple and adjusts itself automatically to status responses from the API. The problem is, the way it is set up, it cannot run in parallel and if there is a major update to the versioned static data on the API, all processing halts while it caches the new static data.
I looked at using the rabbitmq-bundle Symfony bundle and converting the commands into separate consumers with their own channels, so that they can run in parallel and no longer block each other. However, this raises a couple of issues that I'm not sure how to handle.
The first is that I still need to limit the API calls across all the consumers. I have a wrapper for Guzzle that could, in theory, use a simple file to manage the number of calls across all instances of it. I looked at an existing token bucket library, but setting it up to work in Symfony looks problematic because each consumer could reset the number of tokens whenever it is restarted, so I'm not sure where to go with that.
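As a rough illustration of the "simple file" idea, here is a minimal token bucket whose state lives in a file guarded by flock(), so all consumers on one host share the same count and a restarted consumer does not refill it. The class name, the JSON layout, the capacity and the refill rate are all assumptions made for this sketch, not part of any existing library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical shared token bucket. The state is kept in a file protected by
 * flock(), so every consumer process sees the same token count.
 */
final class FileTokenBucket
{
    private string $path;
    private int $capacity;
    private float $refillPerSecond;

    public function __construct(string $path, int $capacity, float $refillPerSecond)
    {
        $this->path = $path;
        $this->capacity = $capacity;
        $this->refillPerSecond = $refillPerSecond;
    }

    /** Try to take one token; false means the caller should back off. */
    public function tryConsume(?float $now = null): bool
    {
        $now = $now ?? microtime(true);
        $handle = fopen($this->path, 'c+'); // create if missing, never truncate
        if ($handle === false) {
            throw new RuntimeException("Cannot open {$this->path}");
        }
        try {
            flock($handle, LOCK_EX); // blocks until other consumers release the lock
            $state = json_decode((string) stream_get_contents($handle), true);
            if (!is_array($state)) {
                // Only an empty or unreadable file starts with a full bucket.
                $state = ['tokens' => (float) $this->capacity, 'updated' => $now];
            }
            // Refill according to the time elapsed since the last call.
            $elapsed = max(0.0, $now - $state['updated']);
            $tokens = min((float) $this->capacity, $state['tokens'] + $elapsed * $this->refillPerSecond);
            $allowed = $tokens >= 1.0;
            if ($allowed) {
                $tokens -= 1.0;
            }
            // Persist the new state before releasing the lock.
            ftruncate($handle, 0);
            rewind($handle);
            fwrite($handle, json_encode(['tokens' => $tokens, 'updated' => $now]));
            fflush($handle);
            return $allowed;
        } finally {
            flock($handle, LOCK_UN);
            fclose($handle);
        }
    }
}
```

The Guzzle wrapper could call tryConsume() before each request and sleep briefly when it returns false. A file only works when all consumers run on the same host or share a filesystem with working locks; if they run on different machines, the same logic would need a shared store instead.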
The second is that some consumers may hit data from the main API for which we do not yet have the matching version of the static data. If this happens, it needs to trigger the related consumers, but only if there isn't already a trigger in each queue. One possible solution I can see is to record the latest requested version in a file at the time the update message is published, and have the consumer wait for the data to be available locally. Again, I'm kind of lost about how best to handle this.
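One way to sketch the "record the requested version in a file" idea is a marker file created with fopen() mode 'x', which fails if the file already exists. That makes the claim atomic, so only the first consumer that notices a missing version publishes the update message. The function name, the marker directory and the file naming below are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true only for the first caller that requests
 * a given static-data version, so only one update message gets published.
 */
function claimVersionUpdate(string $markerDir, string $version): bool
{
    // Keep the version string safe to use as part of a file name.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', $version);
    $marker = $markerDir . '/update-' . $safe . '.requested';

    // Mode 'x' creates the file and fails if it already exists.
    $handle = @fopen($marker, 'x');
    if ($handle === false) {
        return false; // another consumer already requested this version
    }
    fwrite($handle, (string) time()); // record when the update was requested
    fclose($handle);
    return true;
}
```

A consumer would publish the update message only when claimVersionUpdate() returns true, and otherwise wait (or requeue its own message) until the static data for that version is available locally. The update consumer would be responsible for removing stale markers if an update fails.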

Related

How can I make asynchronous request with Laravel?

In my Laravel 5.4 web app, a user can request report generation that takes a couple of minutes because of the large amount of data. While it runs, the user cannot work with the application at all until the report has been generated. To fix this, I read about queues in Laravel and moved my report generation code into a job class, but my app still blocks until the report is generated. How can I fix that?
To be absolutely clear I will sum up my problem:
The user makes a request for report generation (my app blocks completely at this moment).
My app receives the POST request in routes and calls a function from the controller class.
The controller's function dispatches a job that should generate the report and put it into the client web folder.
It sounds like you have already pretty much solved the problem by introducing a queue. Put the job in the queue, but don't keep track of its progress - allow your code to continue and return to the user. It should be possible to "fire-and-forget", and then either ask the user to check if the report is ready in a couple of minutes, or offer the ability to email it to them when it is completed.
By default, Laravel uses the sync queue driver. This driver executes the queued jobs in the same request as the one they are created in. So this won't make any difference.
You should take a look at the other drivers and use the Laravel queue worker background process to execute jobs, so that they don't hold up the web request.

Multiple PHP requests waiting for one

I'm currently developing a website. I want to cache results from an API, but the API is slow, so I must handle concurrent PHP requests for the same result.
While one PHP request is collecting API results for a certain ID, I want subsequent requests for the same ID to wait for the first to finish and then just read the cached value.
My current solution is to add an empty value to the cache (if the value is empty, new PHP requests sleep and recheck), but it sometimes doesn't work.
Is there another or better way?
Consider using the synchronization provided by the PECL Sync library and its SyncReaderWriter class, or simply use file locking.
If you want a working solution, I've created a PHP cache library that synchronizes the read/write process specifically for tasks like the one you described.
I think you may find it useful, check it here: https://github.com/tztztztz/php-no-slam-cache
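For reference, the plain file-locking option could look roughly like the sketch below. It is not taken from the library linked above; the function name, the cache directory layout and the $fetch callback are assumptions. The exclusive lock is held while the first request calls the slow API, so later requests for the same ID block on flock() and then read the stored value.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the cached value for $id and calls $fetch at
 * most once, even when several requests for the same ID arrive together.
 */
function cachedFetch(string $cacheDir, string $id, callable $fetch): string
{
    $file = $cacheDir . '/' . sha1($id) . '.cache';
    $handle = fopen($file, 'c+'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open cache file for {$id}");
    }
    try {
        flock($handle, LOCK_EX); // later requests wait here for the first one
        $cached = stream_get_contents($handle);
        if ($cached !== false && $cached !== '') {
            return $cached; // the first request already stored the result
        }
        $value = (string) $fetch($id); // only the first request hits the slow API
        rewind($handle);
        fwrite($handle, $value);
        fflush($handle);
        return $value;
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

This sketch treats an empty file as "not cached yet" and has no expiry, so a real cache would also need to store a timestamp and handle a failing $fetch.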

Should I use Laravel Queues to manage threads across my application

I am looking to hit multiple 3rd party APIs to gather information for a user's search query. I am planning to spin off a thread for each API I want to hit to minimize the response time on my end. I also want to limit the amount of threads my application can have running at any one time due to memory/cpu concerns.
Since I am using Laravel as my framework, I was trying to accomplish this using Laravel queues, but it seems that I might have trouble getting the response data from the Job.
Are Laravel queues the correct way to tackle this? If so, how do I listen for the job's status and retrieve the data once the job is complete? I see some things that point towards passing a closure to the job, but something just isn't clicking for me.
It depends. A job queue and worker pool might be appropriate if there are a really huge number of API calls to make, especially if those API calls can be very slow. But, I'd try to avoid all that architecture unless you're really sure you need it.
To start, I'd look at doing async requests to the external APIs and try to keep the whole thing in a single process. The Guzzle HTTP client library provides a very programmer-friendly API for this kind of asynchronous request.
If the external requests are really numerous or slow, you might consider using a queue. But in that case, you're looking at implementing a bunch of logic to queue all the jobs, then poll until they're done (giving feedback to your user along the way), and finally return the merged result. That may end up being necessary, but I'd start with the simpler implementation I describe above.

How can I use & set cookies whilst inside a Laravel queued Job, and why is my current solution failing?

I have a need for part of my application to make calls to Reddit asynchronously from my core application's workflow. I have implemented a semi-workable solution by using a Reddit API library I have built here. For those that are unaware, Reddit manages authentication via OAuth and returns a bearer and a token for a particular user that expires in 60 minutes after generation.
I have opted to use cookies to store this authorization information for the mentioned time period, as seen in the requestRedditToken() method here. If a cookie is not found (i.e. it has expired) when another request to Reddit needs to be made, another reddit token is generated. This seems like it would work just fine.
What I am having trouble with is conceptualizing how cookies are handled when integrated with a daemonized queue worker, furthermore, I need to understand why these calls are failing periodically.
The application I'm working with, as mentioned, makes calls to Reddit. These calls are created by a job class being handled: UpdateRedditLiveThreadJob, which you can see here.
These jobs are processed by a daemonized Artisan queue worker using Laravel Forge, you can see the details of the worker here. The queue driver in this case is Redis, and the workers are monitored by Supervisor.
Here is the intended workflow of my app:
1. An UpdateRedditLiveThreadJob is created and thrown into the queue to be handled.
2. The handle() method of the job is called.
3. A Reddit client is instantiated, and a reddit token is requested if a cookie doesn't exist.
4. My Reddit client successfully communicates with Reddit.
5. The Job is considered complete.
What is actually happening:
1. The job is created.
2. Handle is called.
3. Reddit client is instantiated, something odd happens here generally.
4. Reddit client tries to communicate, but gets a 401 response which produces an Exception. This is indicative of a failed authorization.
5. The task is considered 'failed' and loops back to step 2.
Here are my questions:
Why does this flow work for the first hour, and then collapse as described above, after presumably, the cookie has expired?
I've tried my best to understand how Laravel Queues work, but I'm fundamentally having a hard time conceptualizing the different types of queue management options available: queue:listen, queue:work, a daemonized queue:work running on Supervisor, etc. Is my current queue infrastructure compatible with using cookies to manage tokens?
What adjustments do I need to make to my codebase to make the app function as intended?
How will my workflow handle multiple users, who each potentially have multiple cookies?
Why does the workflow magically start working again if I restart my queue worker?
Please let me know if I'm incorrectly describing anything here or need clarification, I've tried my best to explain the problem succinctly.
Your logic is incorrect. A queue job is in fact a PHP script running on the CLI. It has no interaction with a browser. Cookies are set in a browser; see this related thread for reference.
Seeing you're interacting with an API it would make more sense to set the token as a simple variable in the Job (or better yet in that wrapper) and then re-use that within that job.
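A minimal sketch of keeping the token in a variable instead of a cookie could look like this. The class name, the shape of the array returned by the callback and the 60-second safety margin are assumptions for illustration; the callback stands in for whatever code actually requests a token from Reddit.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical token holder for a long-running worker: keeps the bearer token
 * in memory and requests a new one once the old one is about to expire.
 */
final class ExpiringToken
{
    /** @var callable returns ['token' => string, 'expires_in' => int seconds] */
    private $requestToken;
    private ?string $token = null;
    private int $expiresAt = 0;

    public function __construct(callable $requestToken)
    {
        $this->requestToken = $requestToken;
    }

    public function get(?int $now = null): string
    {
        $now = $now ?? time();
        // Refresh 60 seconds early so a token does not expire mid-request.
        if ($this->token === null || $now >= $this->expiresAt - 60) {
            $fresh = ($this->requestToken)();
            $this->token = (string) $fresh['token'];
            $this->expiresAt = $now + (int) $fresh['expires_in'];
        }
        return $this->token;
    }
}
```

Because the value lives in process memory, each worker process would hold its own token. If several workers or users need to share tokens, the same expiry check could sit in front of a shared cache keyed by user instead.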
TL;DR: your wrapper is not an API client.
I know this is not a complete answer to all your questions, but it's a push in the right direction. Answering every question one by one might not, in the end, have given you a solution to your underlying issue ;)

Gearman: is there still no way to retrieve custom data from a background worker?

First things first, I'm aware of this question:
Gearman: Sending data from a background worker to the client
What I want to know, is it still the case with Gearman? I'm planning on sending a batch of image URLs from a PHP web application to the gearman worker (also written in PHP; let's call it "The Main Worker") for processing asynchronously. This worker will then submit a separate task for each image to lower-tier workers (via addTask()), call runTasks() and wait for the tasks to finish, while listening to exceptions, accumulating error messages and updating the overall job status.
While I'm perfectly ok with retrieving the overall status from the Main Worker using jobStatus() calls, then just say that all of the images were processed when [false, false, 0, 0] is returned, I definitely need to be able to inform the users that some of the images couldn't be retrieved from their respective URLs or stored on the server.
I suppose I could always just store the custom data in memcache, then retrieve it from the web app, but it just seems "dirtier" to me...
I'm not trying to get any result, because from what I've seen in the manual on php.net, even the exception handling can only be done when the task is submitted synchronously, not mentioning the custom data retrieval. I just hoped that there could be something I'm missing.
If I remember correctly, we're using Ubuntu Server 12.04 with libgearman6 (v 0.27) and PHP 5.3.10. The version of the gearman extension is 1.0.2. I think the database is irrelevant here, as I will not be using it in either of the workers. And I think we're not using persistent queues right now.
Since gearman won't keep any task information in memory after a task has finished (it only reports it back for a synchronous task), you won't be able to retrieve it in your web application without storing it in a 3rd party location. We usually use a simple web service in the application for this, letting the worker call back to the application when a task has completed or an error has occurred. This allows us to keep the business logic about what we'd like to do when such an error happens in the application where it belongs, and lets our workers be more general (we might need image resizing in many apps, but some apps might want to start several sub tasks that depend on the image resizing being done first).
As you write, you may also let the worker write the state of the task directly to the database or to memcached, but I've found that letting the application itself handle the logic, instead of having to change and special-case the workers, works better. It's also well suited to a worker framework, letting you keep the same standardized way of handling callbacks across the actual worker code.
