I am looking to hit multiple 3rd party APIs to gather information for a user's search query. I am planning to spin off a thread for each API I want to hit to minimize the response time on my end. I also want to limit the number of threads my application can have running at any one time due to memory/CPU concerns.
Since I am using Laravel as my framework, I was trying to accomplish this using Laravel queues, but it seems that I might have trouble getting the response data from the Job.
Are Laravel queues the correct way to tackle this? If so, how do I listen for the job's status and retrieve the data once the job is complete? I see some things that point towards passing a closure to the job, but something just isn't clicking for me.
It depends. A job queue and worker pool might be appropriate if there are a really huge number of API calls to make, especially if those API calls can be very slow. But, I'd try to avoid all that architecture unless you're really sure you need it.
To start, I'd look at making async requests to the external APIs and try to keep the whole thing in a single process. The Guzzle HTTP client library provides a very programmer-friendly API for this kind of asynchronous request.
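For example, here's a minimal sketch of that approach (it assumes Guzzle 7; the endpoint URLs and array keys are placeholders for your real APIs):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['timeout' => 5]);

// Fire all requests concurrently; each getAsync() call returns a promise
// immediately instead of blocking.
$promises = [
    'weather' => $client->getAsync('https://api.example.com/weather?q=term'),
    'news'    => $client->getAsync('https://api.example.org/news?q=term'),
    'images'  => $client->getAsync('https://api.example.net/images?q=term'),
];

// settle() waits for every promise and never throws, so one slow or
// failing API doesn't abort the others.
$results = Utils::settle($promises)->wait();

$merged = [];
foreach ($results as $source => $result) {
    if ($result['state'] === 'fulfilled') {
        $merged[$source] = json_decode((string) $result['value']->getBody(), true);
    } else {
        $merged[$source] = ['error' => $result['reason']->getMessage()];
    }
}
```

The total wall-clock time is roughly that of the slowest API rather than the sum of all of them, which is usually exactly what you want for a search page.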
If the external requests are really numerous or slow, you might consider using a queue. But in that case, you're looking at implementing a bunch of logic to queue all the jobs, then poll until they're done (giving feedback to your user along the way), and finally return the merged result. That may end up being necessary, but I'd start with the simpler implementation I describe above.
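If you do end up needing the queue, one common pattern is to have each job write its result to the cache under a key the controller can poll. A hedged sketch (the class name, key scheme and TTL are all illustrative; it assumes Laravel's Http client, available since 7.x):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;

class FetchApiResult implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    private $searchId;
    private $source;
    private $url;

    public function __construct($searchId, $source, $url)
    {
        $this->searchId = $searchId;
        $this->source   = $source;
        $this->url      = $url;
    }

    public function handle()
    {
        $data = Http::get($this->url)->json();

        // The controller dispatches one job per API, then polls the cache
        // until a "search:{id}:{source}" entry exists for every source
        // (or a timeout passes). TTL is in seconds on recent Laravel.
        Cache::put("search:{$this->searchId}:{$this->source}", $data, 300);
    }
}
```

The number of queue workers you run then becomes your concurrency limit, which also addresses the memory/CPU concern.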
So I have an app that makes an API call every minute or so to update the local db on the app.
I don't know about scalability in this scenario, but let's say I have 100 users, so that makes it roughly 100 API calls a minute.
How bad is this?
Is there a way to mock hundreds of API calls?
The scenario you explained above won't be bad at all, even on a small server, assuming the call you are making is relatively short and does not use lots of memory on the server. This could be made more efficient by spinning up more worker processes on your server.
Laravel's documentation explains a little more.
Alternatively, you could use JMeter, a well-known Java-based load-testing tool, to test your API calls under any scenario you want.
For more, visit its documentation.
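If you'd rather stay in PHP than set up JMeter, Guzzle's request pool can also generate that kind of load; a rough sketch (the URL and concurrency level are placeholders):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['timeout' => 5]);

// Generate 100 identical requests, simulating 100 users hitting the API.
$requests = function (int $total) {
    for ($i = 0; $i < $total; $i++) {
        yield new Request('GET', 'https://api.example.com/update');
    }
};

$pool = new Pool($client, $requests(100), [
    'concurrency' => 20, // at most 20 requests in flight at once
    'fulfilled' => function ($response, $index) {
        printf("#%d: %d\n", $index, $response->getStatusCode());
    },
    'rejected' => function ($reason, $index) {
        printf("#%d failed: %s\n", $index, $reason->getMessage());
    },
]);

$pool->promise()->wait();
```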
I hope this helps!
I would like to know if someone could explain to me how to build a real-time application with Symfony.
I have looked at a lot of documentation with my best friend Google, but I have not found sufficiently detailed articles.
I would like something more PHP-oriented, and I saw that there are technologies like ReactPHP / Ratchet (but I cannot find a tutorial clear enough to integrate them into an existing Symfony project).
Do you have any advice on which technologies to use and why? (If you have tutorial links, I'll take them!)
Thank you in advance for your answers!
Every useful Symfony application does some form of I/O. In traditional applications this is most often blocking I/O. Even if it's non-blocking I/O, it doesn't integrate a global event loop that could schedule other things while waiting for I/O.
If you integrate Symfony into an existing event-loop-based WebSocket server, it will work with blocking I/O as a proof of concept, but you will quickly notice that it doesn't run well in production, because any blocking I/O blocks your whole event loop and thus blocks all other connected clients.
One solution is rewriting everything to non-blocking I/O, but then you'd no longer be using Symfony. You might be able to reuse some components, but only those not doing any I/O.
Another solution is to use RPC and push WebSocket requests into a queue. The intermediary can be written using non-blocking I/O only; it doesn't have much to do, as it basically just forwards WebSocket messages as RPC requests to a queue. Then you have a set of workers pulling from that queue, doing a normal Symfony kernel dispatch and sending the response into a response queue. Each worker can then continue with the next job.
With the second solution you can totally use blocking I/O and all existing Symfony components. You can spawn as many workers as you need and you can even keep them alive between requests. The difference with a queue in between is that one blocking worker doesn't block the responsiveness of the WebSocket endpoint.
If you want multiple WebSocket processes, you'll need separate response queues for them, so the responses are sent back to the right process where the client is connected.
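A stripped-down blocking worker along those lines might look like this; purely a sketch, assuming the pda/pheanstalk client (v4), a standard Symfony App\Kernel, and made-up tube names and payload shape:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use App\Kernel;
use Pheanstalk\Pheanstalk;
use Symfony\Component\HttpFoundation\Request;

$queue  = Pheanstalk::create('127.0.0.1');
$kernel = new Kernel('prod', false);

$queue->watch('rpc');

while (true) {
    // Blocking here is fine: only this worker waits, not the WebSocket process.
    $job     = $queue->reserve();
    $message = json_decode($job->getData(), true);

    // Turn the queued WebSocket message into a normal kernel dispatch,
    // so all existing Symfony components (and blocking I/O) just work.
    $request  = Request::create($message['path'], 'POST', [], [], [], [], $message['body']);
    $response = $kernel->handle($request);
    $kernel->terminate($request, $response);

    // Push the result to the response queue of the WebSocket process
    // that owns this client.
    $queue->useTube('responses')->put(json_encode([
        'client'  => $message['client'],
        'payload' => $response->getContent(),
    ]));

    $queue->delete($job);
}
```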
You can find a working implementation with BeanstalkD as the queue in kelunik/rpc-demo. src/Server.php is just for demo purposes and can be replaced with an HTTP server at any time. To keep the demo simple it uses a single WebSocket process, but that can be changed as outlined above. You can start php bin/server and php bin/worker, then use telnet localhost 2000 to connect and send messages. It will respond with the same message, base64-encoded by the workers.
The mentioned demo is built on Amp, but the same concepts apply to ReactPHP as well.
In this issue of the official Symfony repository you may find comments and ideas about this: https://github.com/symfony/symfony/issues/17051
This side of PHP is rather new to me.
I am interested in firing off a large number (25-50) separate processes from a parent script. I would like for the parent script to not wait for these other scripts to complete AND I would like for these other scripts to run in parallel.
Each script would run for a specified amount of time calling a webservice.
Can anyone give me some direction with this? I'm not asking for a coded answer specifically, but I just need some guidance.
Much thanks.
It really depends on what you want to achieve. @Julien's forking method could work, but it is not preferable if your web service calls are data-intensive. I am not saying that forking is bad; on the contrary, it works, but with the number of different web services you want to call, you should have a way to manage things better.
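For reference, a rough sketch of that forking approach (CLI only, requires the pcntl extension; the URL, timings and process count are placeholders):

```php
<?php
// Let the kernel reap finished children so the parent never waits on them.
pcntl_signal(SIGCHLD, SIG_IGN);

for ($i = 0; $i < 25; $i++) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        exit("fork failed\n");
    }

    if ($pid === 0) {
        // Child process: call the webservice for a fixed time span, then exit.
        $deadline = time() + 60;
        while (time() < $deadline) {
            file_get_contents('https://api.example.com/poll?worker=' . $i);
            sleep(5);
        }
        exit(0);
    }

    // Parent process ($pid > 0): continue the loop without waiting.
}

// The parent falls through here immediately, with all children running
// in parallel.
```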
Another thing you can do is base this on cron jobs. For example, if you're calling these web services for some users in your app, create a queue: a DB table to which you add records that need to be processed. If you are using Cake, use the Cake Shells. Then set up cron jobs that call the shells that process these records every now and then. Divide all services into separate queues, at least for those that are very different in logic. This way you also divide your risk, because a failure in one of the web service calls won't jeopardise the rest. Have separate logging abilities for each queue, which will enable you to quickly track down problems. When consuming web services, problems are very often external to your application.
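The cron-driven processing script itself can stay very small; a sketch with made-up table and column names:

```php
<?php
// Run from crontab, e.g.:  * * * * * php /path/to/process_queue.php
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Grab a batch of unprocessed records from this queue's table.
$rows = $db->query(
    "SELECT id, url FROM service_queue WHERE processed = 0 LIMIT 50"
)->fetchAll(PDO::FETCH_ASSOC);

$done = $db->prepare("UPDATE service_queue SET processed = 1 WHERE id = ?");
$fail = $db->prepare("UPDATE service_queue SET error = ? WHERE id = ?");

foreach ($rows as $row) {
    $body = @file_get_contents($row['url']);

    if ($body === false) {
        // Log the failure on the record itself; one bad call doesn't
        // jeopardise the rest of the queue.
        $fail->execute(['fetch failed', $row['id']]);
        continue;
    }

    $done->execute([$row['id']]);
}
```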
I'm currently working on an event-logging system that will form part of a real-time analytics system. Individual events are sent via RPC from the main application to another server, where a separate PHP script running under Apache handles the event data.
Currently, the receiving server's PHP script hands off the event data to an AMQP exchange/queue, from which a Java application pops events, batches them up and performs a batch db insert.
This will provide great scalability; however, I'm thinking the cost is complexity.
I'm now looking to simplify things a little so my questions are:
Would it be possible to remove the AMQP queue and perform the batching and inserting of events directly to the db from within the PHP script(s) on the receiving server?
And if so, would some kind of intermediary database be required to batch up the events, or could the batching be done from within PHP?
Thanks in advance
Edit:
Thanks for taking the time to respond. To be more specific: is it possible for a PHP script running under Apache to be configured to handle multiple HTTP requests?
So, as Apache spawns child processes, each of these processes would be configured to accept, say, 1000 HTTP requests, deal with them and then shut down?
I see three potential answers to your question:
Yes
No
Probably
If you share metrics of alternative implementations (because everything you ask about is technically possible, so please try it first and get hard results) we can give better suggestions. But as long as you don't provide some meat, put it on the grill and show us the results, there is not much more to tell.
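For what it's worth, the batching itself is certainly possible in plain PHP. A minimal sketch of a multi-row insert (PDO/MySQL assumed; the events table is illustrative, and batching across separate Apache requests would additionally need a shared buffer such as APCu or Redis):

```php
<?php
// Flush a buffered batch of events as one multi-row INSERT.
function flushEvents(PDO $db, array $events): void
{
    if ($events === []) {
        return;
    }

    // Build "(?, ?), (?, ?), ..." with one value pair per event.
    $placeholders = implode(', ', array_fill(0, count($events), '(?, ?)'));
    $stmt = $db->prepare("INSERT INTO events (type, payload) VALUES $placeholders");

    $params = [];
    foreach ($events as $event) {
        $params[] = $event['type'];
        $params[] = $event['payload'];
    }

    $stmt->execute($params);
}
```

Whether that actually beats the AMQP pipeline under your event volume is exactly the kind of hard result to measure first.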
First things first, I'm aware of this question:
Gearman: Sending data from a background worker to the client
What I want to know, is it still the case with Gearman? I'm planning on sending a batch of image URLs from a PHP web application to the gearman worker (also written in PHP; let's call it "The Main Worker") for processing asynchronously. This worker will then submit a separate task for each image to lower-tier workers (via addTask()), call runTasks() and wait for the tasks to finish, while listening to exceptions, accumulating error messages and updating the overall job status.
While I'm perfectly ok with retrieving the overall status from the Main Worker using jobStatus() calls, then just say that all of the images were processed when [false, false, 0, 0] is returned, I definitely need to be able to inform the users that some of the images couldn't be retrieved from their respective URLs or stored on the server.
I suppose I could always just store the custom data in memcache, then retrieve it from the web app, but it just seems "dirtier" to me...
I'm not trying to get any result back, because from what I've seen in the manual on php.net, even exception handling can only be done when the task is submitted synchronously, not to mention custom data retrieval. I just hoped there might be something I'm missing.
If I remember correctly, we're using Ubuntu Server 12.04 with libgearman6 (v 0.27) and PHP 5.3.10. The version of the gearman extension is 1.0.2. I think the database is irrelevant here, as I will not be using it in either of the workers. And I think we're not using persistent queues right now.
Since Gearman won't keep any task information in memory after a task has finished (it just reports it back for a synchronous task), you won't be able to retrieve it in your web application without storing it in a third-party location. We usually use a simple web service in the application for this, letting the worker call back to the application when a task has completed or an error has occurred. This allows us to keep the business logic about what to do when such an error happens in the application, where it belongs, and lets our workers be more general (we might need image resizing in many apps, but some apps might want to start several sub-tasks that depend on the image resizing being done first).
As you write, you could also let the worker write the state of the task directly to the database or to memcached, but I've found that letting the application itself handle the logic, instead of having to change and special-case the workers, works better. It's also well suited to a worker framework, letting you keep the same standardized way of handling callbacks across actual worker code.
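For illustration, a worker following that callback pattern might look roughly like this (assumes the pecl gearman extension; the function name, payload shape and callback URL are all made up):

```php
<?php
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

$worker->addFunction('resize_image', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);

    try {
        // ... fetch and resize the image here ...
        $status = ['url' => $payload['url'], 'ok' => true];
    } catch (Exception $e) {
        $status = ['url' => $payload['url'], 'ok' => false, 'error' => $e->getMessage()];
    }

    // Report back to the application, which owns the business logic for
    // handling failures; the worker itself stays generic.
    $ch = curl_init($payload['callback_url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($status),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    curl_exec($ch);
    curl_close($ch);
});

while ($worker->work());
```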