So I have an app that makes an API call every minute or so to update its local database.
I don't know much about scalability in this scenario, but let's say I have 100 users; that makes roughly 100 API calls a minute.
How bad is this?
Is there a way to mock hundreds of API calls?
The scenario you describe won't be bad at all, even on a small server, assuming the call you are making is relatively short and does not use much memory on the server. It could be made more efficient by spinning up more worker threads on your server.
Laravel's documentation explains a little more.
Alternatively, you could use JMeter, a well-known Java-based load-testing tool, to test your API calls under any scenario you want.
For more, visit their documentation.
I hope this helps!
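If you go the JMeter route, you typically build a test plan once in the GUI and then run it headless. A non-GUI run looks something like this (the test-plan and results file names here are just placeholders):

    jmeter -n -t api-load-test.jmx -l results.jtl

Here -n means non-GUI mode, -t points at the test plan, and -l writes a results log you can load back into the GUI for analysis.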
I am looking to hit multiple 3rd party APIs to gather information for a user's search query. I am planning to spin off a thread for each API I want to hit to minimize the response time on my end. I also want to limit the amount of threads my application can have running at any one time due to memory/cpu concerns.
Since I am using Laravel as my framework, I was trying to accomplish this using Laravel queues, but it seems that I might have trouble getting the response data from the Job.
Are Laravel queues the correct way to tackle this? If so, how do I listen for the job's status and retrieve the data once the job is complete? I see some things that point towards passing a closure to the job, but something just isn't clicking for me.
It depends. A job queue and worker pool might be appropriate if there are a really huge number of API calls to make, especially if those API calls can be very slow. But, I'd try to avoid all that architecture unless you're really sure you need it.
To start, I'd look at making async requests to the external APIs and try to keep the whole thing in a single process. The Guzzle HTTP client library provides a very programmer-friendly API for this kind of asynchronous request.
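As a rough illustration, here is a minimal sketch of concurrent requests with Guzzle promises (assuming Guzzle 7; the endpoint URLs are hypothetical):

    <?php
    require 'vendor/autoload.php';

    use GuzzleHttp\Client;
    use GuzzleHttp\Promise\Utils;

    $client = new Client(['timeout' => 5]);

    // Fire all requests at once; nothing blocks until wait() below.
    $promises = [
        'providerA' => $client->getAsync('https://api.example-a.com/search?q=term'),
        'providerB' => $client->getAsync('https://api.example-b.com/search?q=term'),
    ];

    // settle() resolves once every request has finished, whether it
    // succeeded or failed, so one slow or broken API can't sink the batch.
    $results = Utils::settle($promises)->wait();

    foreach ($results as $name => $result) {
        if ($result['state'] === 'fulfilled') {
            echo $name . ': ' . $result['value']->getBody() . "\n";
        } else {
            echo $name . ' failed: ' . $result['reason']->getMessage() . "\n";
        }
    }

The total wall-clock time is roughly that of the slowest API rather than the sum of all of them, which is usually all the parallelism you need for a handful of calls per search.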
If the external requests are really numerous or slow, you might consider using a queue. But in that case, you're looking at implementing a bunch of logic to queue all the jobs, then poll until they're done (giving feedback to your user along the way), and finally return the merged result. That may end up being necessary, but I'd start with the simpler implementation I describe above.
This side of PHP is rather new to me.
I am interested in firing off a large number (25-50) separate processes from a parent script. I would like for the parent script to not wait for these other scripts to complete AND I would like for these other scripts to run in parallel.
Each script would run for a specified amount of time calling a webservice.
Can anyone give me some direction with this? I'm not asking for a coded answer specifically, but I just need some guidance.
Much thanks.
It really depends on what you want to achieve. @Julien's forking method could work, but it is not preferable if your web service calls are data intensive. I am not saying that forking is bad, on the contrary, it works, but with the number of different web services you want to call you should have a way to manage things better.
Another thing you can do is base this on cron jobs. For example, if you're calling these web services for some users in your app, create a queue: a DB table to which you add records that need to be processed. If you are using Cake, use the Cake Shells. Then set up cron jobs that call the shells to process these records every now and then. Divide the services into separate queues, at least for those that are very different in logic. This way you also divide your risk, because a failure in one of the web service calls won't jeopardise all the others. Give each queue separate logging, which will enable you to quickly track down problems. When consuming web services, problems are very often external to your application.
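For the fire-and-forget part of the original question, a minimal sketch on a POSIX system could look like this (worker.php stands in for whatever script actually calls the web service):

    <?php
    // Launch N independent PHP processes in the background. Redirecting
    // output and appending & make exec() return immediately instead of
    // waiting for each child, so the children run in parallel.
    $numWorkers = 25;

    for ($i = 0; $i < $numWorkers; $i++) {
        exec(sprintf('php worker.php %d > /dev/null 2>&1 &', $i));
    }

    echo "Launched {$numWorkers} workers; the parent continues without waiting.\n";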
I'm trying to make an app that performs actions for all of the app user's friends. The problem is that I haven't yet found a platform I can develop such an app on.
At first I tried using PHP on Heroku. My code worked, but because I had many friends the loop took more than 30 seconds, the request timed out, and the operation stopped in the middle of the action.
I don't mind using any platform I just want it to work!
Python, C++, PHP. They all are fine for me.
Thanks in advance.
Let's start with the fact that you can change the timeout settings. Depending on where the restriction is set, it can be in PHP itself, as explained in the set_time_limit function documentation:

"Set the number of seconds a script is allowed to run. If this is reached, the script returns a fatal error. The default limit is 30 seconds or, if it exists, the max_execution_time value defined in the php.ini."
but it can also be set on the server itself.
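Raising PHP's own limit is a one-liner (a sketch; server-level and proxy timeouts, discussed next, still apply):

    <?php
    // Allow this script to run for up to 5 minutes; 0 would mean no limit.
    set_time_limit(300);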
Another issue is that routers and proxies along the way also have their own timeout limits, so from my experience ~60 seconds is about the maximum.
As for what you want to do: the problem is not which language/technology you use, but the fact that you're making a lot of HTTP requests to Facebook, each of which takes a bit of time. I believe this is your bottleneck, and if that's the case, there's not much you can improve by choosing something other than PHP (though you could go with non-blocking IO, which should improve the IO performance).
With that said, PHP is not always the best solution; it depends on the task at hand.
Java or any other compiled language should perform better than a scripted language (PHP, Python), and if you go with C++ you will top them all, but will you feel comfortable programming your app in C++?
Choose the language/technology you feel most "at home" with; if you have a selection to choose from, figure out what you need from your app and then research which will perform better for what you need.
Edit
Last time I checked, the maximum number of friends was limited to 5000.
If you need to run a graph request per user friend, then there's simply no way to do that without keeping the user waiting far too long, regardless of timeouts.
You have two options as I see it:

1. Make the client asynchronous: you can use web sockets, comet, or even issue an AJAX request every x seconds to get the computed data. That way you don't need to worry about timeouts, and the user starts getting content quickly.

2. Use the JavaScript API to make the graph requests. That way you completely avoid timing out, plus you move a huge amount of networking off your servers. This option might not be available to you if you need your servers for the computation, for example if you depend on data from your DB.
As for the "no Facebook SDK for C++" issue, though I don't think it's even relevant, it's not a problem.
All Facebook SDKs are simply wrappers around HTTPS requests, so implementing your own SDK is not that hard, though I hate thinking about doing it in C++; but then again, I hate thinking about doing anything in C++.
I am considering building a site using PHP, but there are several aspects of it that would perform far, far better if written in node.js. At the same time, large portions of the site need to remain in PHP. This is because a lot of functionality is already developed in PHP, and redeveloping, testing, and so forth would be too large an undertaking; quite frankly, those parts of the site run perfectly fine in PHP.
I am considering rebuilding in node.js the sections that would benefit most from it, then having PHP pass requests to node.js using Gearman. This way, I can scale out by launching more workers and have Gearman handle the load distribution.
Our site gets a lot of traffic, and I am concerned about whether Gearman can handle this load. I want to keep this question productive, so let's focus largely on the following addressable points:
Can Gearman handle all of our expected load, assuming we have the memory (potentially around 3000+ queued jobs at a time, with several thousand being processed per second)?
Would this run better if I just passed the requests to node.js using cURL, and if so, does node.js provide any way to distribute the load over multiple instances of a given script?
Can Gearman be configured in a way that there is no single point of failure?
What are some issues that you guys can see arising both in terms of development and scaling?
I am addressing this wide range of points so anyone viewing this post can collect a broad set of information in one place regarding matters that strongly affect each other.
Of course I will test all of this, but I want to collect as much information as possible before potentially undertaking something like this.
Edit: A large reason I am using Gearman is not its non-blocking structure, but its sheer speed.
I can only speak to your questions on Gearman:
Can Gearman handle all of our expected load, assuming we have the memory (potentially around 3000+ queued jobs at a time, with several thousand being processed per second)?
Short: Yes
Long: Everything has its limit. If your job payloads are inordinately large, you may run into issues. Gearman stores its queue in memory, so if your payloads exceed the amount of memory available to Gearman, you'll run into problems.
Can Gearman be configured in a way that there is no single point of failure?
Gearman has a plugin/extension/component available to use MySQL as a persistence store. That way, if Gearman or the machine itself goes down, you can bring it right back up where it left off. Multiple worker servers can help keep things going when other workers go down.
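As a rough sketch of the PHP side of that hand-off (assuming the pecl gearman extension and a gearmand server on localhost; resize_image is a placeholder for whatever function your node.js workers register):

    <?php
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);

    // doBackground() queues the job and returns a handle immediately;
    // use doNormal() instead if PHP must block and wait for the result.
    $handle = $client->doBackground('resize_image', json_encode(['id' => 42]));

    if ($client->returnCode() !== GEARMAN_SUCCESS) {
        error_log('Failed to queue Gearman job');
    }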
Node has a cluster module that can do basic load balancing against n processes. You might find it useful.
A common architecture here in nodejs-land is to have your nodes talk HTTP and then use some form of load balancing, such as an HTTP proxy or a service registry. I'm sure it's more or less the same elsewhere. I don't know enough about Gearman to say whether it'll be "good enough", but if this is the general idea then I'd imagine it would be fine. At the least, I'm sure other people would be interested in hearing how it went!
Edit: Remember, number-crunching will block node's event loop! This is somewhat obvious if you think about it, but definitely something to keep in mind.
No, I'm not trying to see how many buzzwords I can throw into a single question title.
I'm making REST requests through cURL in my PHP app to some web services. These requests need to be made fairly often, since much of the application depends on this API. However, there is severe latency with the requests (2-5 seconds), which just makes my app look painfully slow.
While I'm halfway to a solution with a recommendation to cache these requests in Memcached, I'm still not satisfied with that kind of latency ever appearing within the application.
So here was my thought: I can implement AJAX long-polling in the background so that the user never experiences the latency outright. The REST requests/Memcached lookups will all be done through AJAX at a set interval.
But this is all really new to me, and I'm not sure this is the best approach. And if I'm on the right track, I do know that PHP + Apache is not going to handle something like this well. But PHP is the only language I know. I'd ideally like to set up something like Tornado in Python, but I'm just not sure whether I'm over-engineering right now.
Any thoughts here would be helpful and much appreciated.
This was some pretty quick turnaround, but I went back through and profiled my app by echoing out microtime() throughout the relevant processes. It turns out that I'm not parallelizing my cURL requests, and that's where I take the real hit. Each request takes approximately 2 seconds, which means very long delays while the cURL requests run one after another.
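For reference, a minimal curl_multi sketch of running those requests in parallel (the URLs are placeholders):

    <?php
    $urls = [
        'https://api.example.com/a',
        'https://api.example.com/b',
    ];

    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers concurrently; total time is roughly that of
    // the slowest request rather than the sum of all of them.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi);
        }
    } while ($running && $status === CURLM_OK);

    $responses = [];
    foreach ($handles as $url => $ch) {
        $responses[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);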