No, I'm not trying to see how many buzzwords I can throw into a single question title.
I'm making REST requests through cURL in my PHP app to some webservices. These requests need to be made fairly often since much of the application depends on this API. However, there is severe latency with the requests (2-5 seconds) which just makes my app look painfully slow.
While I'm halfway to a solution with a recommendation to cache these requests in Memcached, I'm still not satisfied with that kind of latency ever appearing within the application.
So here was my thought: I can implement AJAX long-polling in the background so that the user never experiences the latency outright. The REST requests/Memcache lookups will be done all through AJAX at a set interval.
But this is all really new to me and I'm not sure if this is the best approach. And if I'm on the right track, I do know that PHP + Apache is not going to handle something like this well. But PHP is the only language I know. I'd ideally like to set up something like Tornado in Python, but I'm just not sure if I'm over-engineering right now or not.
Any thoughts here would be helpful and much appreciated.
This was some pretty quick turnaround, but I went back through and profiled my app by echoing out microtime() throughout the relevant processes. Turns out that I'm not parallelizing my cURL requests and that's where I take the real hit. It takes approximately 2 seconds to do that, which means very long delays while each cURL request is done in succession.
Related
I've been reading a lot on the subject of SSE and PHP, most of which seems to be advocating it as viable solutions for all sorts of things including chat apps. I have seen similar questions on this site but have not found a concise, definitive answer.
Is there something inherent in SSE which makes it way more server-friendly than AJAX short polling? Because the headers appear to be of very similar size. I am wondering if there is some kind of behind-the-scenes stuff beyond the headers that a noob like myself can't see e.g. some sort of connection recognition with each request/response? I know there are other factors involved where SSE prevails such as handling disconnections.
In terms of using it in a chat app scenario, ajax and sse appear to be doing the same thing. Neither of them seems to be able to perform long polling effectively with PHP. If I have User A and User B waiting on a PHP script that checks for new messages from the other user in the DB then sleeps for 3 seconds for say 10 loops, User A's new message cannot be inserted until User B has looped through the entire checking script, thereby rendering it absolutely useless (at least based on everything I've tried in the last 2 weeks!). I can get it working smoothly if I chat to myself and no one else is waiting on the checking script, but I've run out of things to talk about with myself and would really enjoy someone else being able to use it too.
So in a nutshell, given an Apache and PHP environment with WebSockets as not an option (due to shared hosting), is the only effective way to write a chat app, based on server burden alone, by short polling with one's choice of either AJAX or SSE, or is SSE definitely the superior option?
I would pursue WebSockets if the eventual traffic called for it and justified the web hosting upgrade.
(ALSO, as a side, is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert? Got me confused as to why that should be the case).
Kind of a long-winded, meandering question but hoping someone in the same situation can find this question and save themselves a lot of time.
Many Thanks!
Yes, SSE is a better option than AJAX, as AJAX polling is done on the main servers, like where most of the normal user traffic is to be hit. Whereas SSE polling is done on another instance which is made for it, so there will be no extra traffic on the main server. Please check Mercure (https://mercure.rocks/)
EDIT:
I mean to that, using SSE with platforms like Mercure would be a better option than AJAX. As AJAX will make a request to the main server. Which would increase the count of requests for the main server. Whereas we can distribute the network load using tools like the Mercure, in order to achieve the required functionality.
SSE can be thought of a thin API wrapper around the AJAX long-poll approach. It brings a standard API to something that was a hacky solution before.
something inherent in SSE which makes it way more server-friendly than AJAX short polling?
It holds the socket open. The pro of this is less latency (as soon as the server has the new information it sends it to the client, rather than waiting for the next client poll); the con is the extra resource usage (the socket, and the PHP process).
but I've run out of things to talk about with myself
Surely not. Have you tried starting a chat about if time is an illusion, and what came before?
with WebSockets as not an option (due to shared hosting)
SSE and WebSockets both hold a socket open. Shared hosting ISPs often go round closing sockets that have been open a long time (e.g. over 60s), unless they explicitly say they support SSE. The may also kill long-running PHP processes.
is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert?
I think it is off. The "A" in Ajax is asynchronous, meaning you can have multiple ajax/sse requests running at the same time. And on the server side you will have a distinct PHP process running for each request.
Ok, I am working up something like a chat environment, and I'd like to have near real time if not real time conversation. But I know browsers will only give up 2 threads at a time for transactions per domain. So I am trying to figure out a way to make a synchronous chat without really effecting the browser. I also know browsers tend to lock up with synchronous requests.
So whats the best approach at creating a chat like environment on a site from scratch, assume the DB and scripting concept is fine, its the managing of the connection, wondering how to keep a persistant connection that won't congest the browser and cause it to possibly freeze up.
Anyone have any ideas.. Im not looking for flash, or java based solutions. I'd prefer not to poll every second either. But what is stacks impression, what would you do.
First off, the spec only suggests that two connections are allowed. Most modern browsers actually support up to 6.
There're three main accepted methods for creating a chat system out of pure Javascript:
Polling
The first solution is simple, and just involves polling the server every few seconds (5 is a nice number) to see what it's missed. It works simply and efficiently, but can lead to large amounts of unnecessary requests if not careful, which can cause unnecessary server load.
A better implementation of this involves polling to simply check if anything's happened since the last chat update, and if so, only then go through the process of finding out what's happened. Saves on the server load and bandwidth fronts.
Waiting
This method's more commonly used, and involves the browser sending a request to the server which is never fulfilled, and instead keeps 'waiting for a response'. When something happens, the server outputs it and fulfills the request, and the client makes another request and the process repeats. This saves on the request front, but can end up with a backlog of ongoing processes on your server.
Websockets
https://developer.mozilla.org/en/WebSockets
This involves creating a direct socket connection to the server, allowing data to be pushed to the client when needed. It's relatively new though, and can have some compatability issues, especially with older browsers.
Out of these, none of them is specifically the 'best method'; it depends on what you're aiming for, and what matters. If you've got a site designed for up-to-date browsers, then websockets could be your answer, but if you've got a small-ish server, then polling could be better, for example.
My own chat engine checks for new messages every five seconds. That's close enough to instant that nobody knows the difference.
It's as simple as setInterval(updateChat,5000);.
I'm trying to make an app that goes and does actions for all the friends that the user of the app has. The problem is I didn't find yet a platform which I can develop such app as that on.
At first I tried using PHP, I used heroku and my code worked but because I had many friends the loop went more than 30 seconds and the request timed out and the operation stopped in the middle of the action.
I don't mind using any platform I just want it to work!
Python, C++, PHP. They all are fine for me.
Thanks in advance.
Let's start with that you can change the timeout settings, depending on where the restriction is set, can be on the php as explained set_time_limit function documentation:
Set the number of seconds a script is allowed to run. If this is
reached, the script returns a fatal error. The default limit is 30
seconds or, if it exists, the max_execution_time value defined in the
php.ini.
but it can also be set on the server itself.
Another issue is that routers on the route also have their own timeout limit, so from my experience ~60 seconds is the max.
As for what you want to do, the problem is not which language/technology you use, but the fact that you're making a lot of http requests to facebook which take a bit of time, and I believe that this is your bottleneck, and if that's the case then there's not much you can improve by choosing something other than php (though you can go with NIO which should improve the IO performance).
With that said, php is not always the best solution, depends on the task at hand.
Java or any other compiled language should perform better than a scripted language (php, python), and if you go with C++ you will top 'em all, but will you feel comfortable to program your app in C++?
Choose the language/technology you feel most "at home" with, if you have a selection to choose from then figure out what you need from your app and then research on which will perform better for what you need.
Edit
Last time I checked the maximum number of friends was limited to 5000.
If you need to to run a graph request per user friend then there's simply no way that you can do that without keeping the user waiting for way too long, regardless of timeouts.
You have two options as I see it:
Make the client asynchronous, you can use web sockets, comet, or even issue an ajax request every x seconds to get the computed data.
That way you don't need to worry about timeouts and the user can start getting content quickly.
Use the javascript api to make the graph requests, that way you completely avoid timing out, plus you reduce a huge amount of networking from your servers.
This option might not be available for you if you need your servers for the computation, if for example you depend on data from your db.
As for the "no facebook SDK for C++" issue, though I don't think it's even relevant, it's not a problem.
All facebook SDKs are simply wrappers for https request, so implementing your own SDK is not that hard, though I hate thinking about doing it with C++, but then again I hate thinking about doing anything with C++.
I am considering building a site using php, but there are several aspects of it that would perform far, far better if made in node.js. At the same time, large portions of of the site need to remain in PHP. This is because a lot of functionality is already developed in PHP, and redeveloping, testing, and so forth would be too large of an undertaking, and quite frankly, those parts of the site run perfectly fine in PHP.
I am considering rebuilding the sections in node.js that would benefit from running most in node.js, then having PHP pass the request to node.js using Gearman. This way, I scan scale out by launching more workers and have gearman handle the load distribution.
Our site gets a lot of traffic, and I am concerned if gearman can handle this load. I wan't to keep this question productive, so let's focus largely on the following addressable points:
Can gearman handle all of our expected load assuming we have the memory (potentially around 3000+ queued jobs at at time, with several thousand being processed per second)?
Would this run better if I just passed the requests to node.js using CURL, and if so, does node.js provide any way to distribute the load over multiple instances of a given script?
Can gearman be configured in a way that there is no single point of failure?
What are some issues that you guys can see arising both in terms of development and scaling?
I am addressing these wide range of points so anyone viewing this post can collect a wide range of information in one place regarding matters that strongly affect each other.
Of course I will test all of this, but I want to collect as much information as possible before potentially undertaking something like this.
Edit: A large reason I am using gearman is not because of it's non-blocking structure, but because of it's sheer speed.
I can only speak to your questions on Gearman:
Can gearman handle all of our expected load assuming we have the memory (potentially around 3000+ queued jobs at at time, with several thousand being processed per second)?
Short: Yes
Long: Everything has its limit. If your job payloads are inordinately large you may run into issues. Gearman stores its queue in memory.. so if your payloads exceed the amount of memory available to Gearman you'll run into problems.
Can gearman be configured in a way that there is no single point of failure?
Gearman has a plugin/extension/component available to use MySQL as a persistence store. That way, if Gearman or the machine itself goes down you can bring it right back up where it left off. Multiple worker-servers can help keep things going if other workers go down.
Node has a cluster module that can do basic load balancing against n processes. You might find it useful.
A common architecture here in nodejs-land is to have your nodes talk http and then use some way of load balancing such as an http proxy or a service registry. I'm sure it's more or less the same elsewhere. I don't know enough about gearman to say if it'll be "good enough," but if this is the general idea then I'd imagine it would be fine. At the least, other people would be interested in hearing how it went I'm sure!
Edit: Remember, number-crunching will block node's event loop! This is somewhat obvious if you think about it, but definitely something to keep in mind.
I'm keeping my self busy working on app that gets a feed from twitter search API, then need to extract all the URLs from each status in the feed, and finally since lots of the URLs are shortened I'm checking the response header of each URL to get the real URL it leads to.
for a feed of 100 entries this process can be more then a minute long!! (still working local on my pc)
i'm initiating Curl resource one time per feed and keep it open until I'm finished all the URL expansions though this helped a bit i'm still warry that i'l be in trouble when going live
any ideas how to speed things up?
The issue is, as Asaph points out, that you're doing this in a single-threaded process, so all of the network latency is being serialized.
Does this all have to happen inside an http request, or can you queue URLs somewhere, and have some background process chew through them?
If you can do the latter, that's the way to go.
If you must do the former, you can do the same sort of thing.
Either way, you want to look at way to chew through the requests in parallel. You could write a command-line PHP script that forks to accomplish this, though you might be better off looking into writing such a beast in language that supports threading, such as ruby or python.
You may be able to get significantly increased performance by making your application multithreaded. Multi-threading is not supported directly by PHP per se, but you may be able to launch several PHP processes, each working on a concurrent processing job.