Workaround instead of using websockets / AJAX long polling - PHP

My site checks the database for overdue reminders by making an AJAX request every 5 seconds. I've been told this is not ideal because of the sheer number of AJAX calls; one person said it's almost like DDoSing your own site if many people are on with several tabs open, etc.
The alternatives are websockets or AJAX long polling. I can't use websockets because my shared hosting doesn't support them. AJAX long polling is not ideal because of the limited number of connections.
So a workaround I thought of would be to have a file on the server that simply stores a token. The browser reads this file via a hidden iframe every 5 seconds, and if something relevant changes in the database the token is changed, which signals the browser to send an AJAX request to fetch the new reminders.
Would that be a feasible workaround to significantly reduce the load on the server, since the browser is only reading a static file every 5 seconds instead of firing a full AJAX/database request?
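For what it's worth, the server side of that token idea can be tiny. A minimal sketch in PHP, assuming the reminder-update code can be hooked; the file name is purely illustrative:

// Wherever reminders are created or updated, bump the token so polling
// clients know something changed.
file_put_contents(__DIR__ . '/reminder-token.txt', uniqid('', true));

// The 5-second poll then just compares this small static file with the last
// value it saw, and only fires the real AJAX/database request when it differs.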

"one person said it's almost like ..." - was that person the system administrator of the host where your site runs?
"because my shared hosting doesn't have that capability" - if writing lots of code is cheaper than switching web providers then you really need to look at your at the way you manage your hosting.
"Ajax-long-pulling is not ideal because of limited connections" - and you think the same problem would not apply to websockets?
You seem to be trying to fix a problem you don't know exists. And you don't seem to be in a position to evaluate whether any changes you make will have the desired effect.
After discounting comet and websockets, your only option for reducing the impact on the server is to reduce the frequency of polling. But your next step should be to establish a way to assess what impact the facility currently has on the server and how that changes if you change the behaviour. This can be rather tricky on a shared host - at best you're going to see a lot of noise generated by other tenants - but if you really were consuming a lot of resources, I'm quite confident that a competent hosting provider would alert you to it.

Related

XHR long polling for chat application (instead of sockets)?

I need to create a user-to-user real time chat system. I managed to create a simple AJAX/PHP/MySQL script for the chat, but one thing worries me in the PHP/MySQL part:
while (true) {
    // $pdo is assumed to be an existing PDO connection
    $stmt = $pdo->query("SELECT * FROM messages ORDER BY date DESC LIMIT 1");
    $row  = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row && $row['date'] > $_POST['last']) {
        echo json_encode($row);
        break;
    }
    sleep(1);
}
Doesn't that mean it will query the table every second, and wouldn't that overload the server?
I tried to use PHP sockets, but it was a nightmare. I had to buy an SSL certificate, and the server kept crashing in tests with many users, so I decided to use the long-polling approach for now. If I remember correctly, Facebook was using XHR long polling before switching to sockets (if they switched at all).
Well, your question is too general. While it mostly depends on the load you will receive and your server specifics, long polling is generally discouraged.
There are certain assumptions in your question that are not entirely accurate.
"I had to buy an SSL certificate"
Did you? As far as I know there are free certificate issuers, such as Let's Encrypt. The problem here may be that you are using shared hosting with FTP-only access. Consider getting a VPS from a cloud provider, e.g. DigitalOcean. You will have full access to the virtual environment there, and most of the cheapest VPSs cost about the same as shared hosting.
"Facebook was using XHR long polling before switching to sockets"
Yes, they were, and they sometimes fall back to it too. But first of all - they have a lot of computing power and can afford these things. And second, the Facebook web chat is not the fastest app I have ever used :)
With indexed columns and a modest number of records on a normal single MySQL instance, you will not notice any slowdowns. As the data and the number of users (simultaneous connections) grow, you will gradually find that you need to optimize things, and one day you will eventually move over to WebSockets.
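If you do stay on long polling for a while, making that poll query cheap matters most. A sketch, assuming MySQL, a PDO handle $pdo, and the messages.date column from the snippet above:

// One-off: index the column the poll filters and sorts on
$pdo->exec("CREATE INDEX idx_messages_date ON messages (date)");

// The per-second poll then uses the index for both the filter and the sort
$stmt = $pdo->prepare(
    "SELECT * FROM messages WHERE date > ? ORDER BY date DESC LIMIT 1"
);
$stmt->execute([$_POST['last']]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);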
PHP at heart is not meant to be asynchronous. The async plumbing, along with the whole event loop, is something you will either have to build yourself or pull in several frameworks for.
I would generally suggest a dedicated WebSocket implementation with an asynchronous runtime. Take a look at Ratchet for PHP, or use ws for Node.js.
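For reference, the basic server pattern from the Ratchet (socketo.me) tutorial looks roughly like this; treat it as a sketch, with the Chat class and port chosen purely for illustration:

require __DIR__ . '/vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Server\IoServer;
use Ratchet\Http\HttpServer;
use Ratchet\WebSocket\WsServer;

class Chat implements MessageComponentInterface {
    protected $clients;

    public function __construct() {
        $this->clients = new \SplObjectStorage();
    }

    public function onOpen(ConnectionInterface $conn) {
        $this->clients->attach($conn);        // remember every connected client
    }

    public function onMessage(ConnectionInterface $from, $msg) {
        foreach ($this->clients as $client) { // fan the message out to the others
            if ($client !== $from) {
                $client->send($msg);
            }
        }
    }

    public function onClose(ConnectionInterface $conn) {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e) {
        $conn->close();
    }
}

// Long-running daemon: started with `php chat-server.php`, not under Apache
IoServer::factory(new HttpServer(new WsServer(new Chat())), 8080)->run();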

Is it too much to check the database every half a second?

So I'm making a chat app. I use AJAX to check for a new entry (message) in the database every 0.5 s, and if there is one, I display it. Is that too much to ask of the server? I'm using a cheap shared hosting service. From my experience so far, half the time messaging is fast and smooth; the rest of the time, especially during peak hours, messages disappear and the AJAX requests fail half the time. Sometimes even the connection to the database itself fails. I want to know whether I'm asking too much of the server, or my server is bad and I should consider changing it (or both).
You have two simple options.
The first is to use a caching mechanism such as Redis to store the last state of every user chat and the id/date of the last received message, so you don't have to hit the database on every request (a sketch follows below).
The second is to use WebSockets instead of client-to-server polling. In this method you establish a connection between client and server, and once a new message arrives you push it to the client directly.
If you want to resolve your problem without any external tool, my suggestion is the second way.
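A minimal sketch of the first option, assuming the phpredis extension; the key names, the chat_id parameter and the write side are all illustrative:

// poll.php - answers the half-second poll from Redis instead of MySQL
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$chatId = (int) $_GET['chat_id'];
$lastId = (int) $_GET['last_id'];

// The message-insert code is assumed to do:
//   $redis->set("chat:$chatId:last_id", $newMessageId);
$latestId = (int) $redis->get("chat:$chatId:last_id");

if ($latestId > $lastId) {
    // Only now is it worth hitting MySQL to fetch the actual new rows
    echo json_encode(['new' => true, 'latest_id' => $latestId]);
} else {
    echo json_encode(['new' => false]);
}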
In this case I would probably recommend going with WebSockets; it seems to fit your use case.
Look into http://socketo.me/
That being said, I would look into upgrading your hosting from a shared account if you want your site to remain performant under a larger load. Again, it really depends on how many users you are trying to cater for.

Adding websockets to an existing application

So I wrote this nice SaaS solution and got some real clients. Now a client has requested some functionality that requires websockets.
In order to keep things tidy, I would like to use another server to manage the websockets.
My current stack is an AWS application load balancer with two servers behind it - one of them is the current application server: an Apache web server with PHP running the application.
The client side is driven by AngularJS.
The second server (which does not exist yet) is going to run Nginx, and sessions will be stored on a third Memcache server.
The websocket and application servers are both on the same domain, using different ports in order to send requests to the right server (AWS ELB allows routing requests to different server groups by port). Both the application and the websockets will be driven by PHP and Ratchet.
My questions are two:
The first is for the more experienced developers - does such an architecture sound reasonable? (I'm not aiming for 100Ks of concurrents yet - I need a viable and affordable solution aiming at a maximum of 5,000 concurrents at this stage.)
What would be the best way to send requests from the application server (which has the logic to generate the requests) to the websocket server?
Please note that I'm new to websockets, so maybe there are much better ways to do this - I'd be grateful for any ideas.
I'm in the middle of using Ratchet with a SPA to power a web app. I'm using Traefik as a front-end proxy, but yes, Nginx is popular here, and I'm sure that would be fine. I like Traefik for its ability to seamlessly redirect traffic based on config file changes, API triggers, and configuration changes from the likes of Kubernetes.
I agree with Michael in the comments - using ports for web sockets other than 80 or 443 may cause your users to experience connection problems. Home connections are generally fine on non-standard ports, but public wifi, business firewalls and mobile data can all present problems, and it's probably best not to risk it.
Having done a bit of reading around, I think your 5,000 concurrent connections are probably going to need OS-level tweaks. I seem to recall 1,024 connections can be done comfortably, but several times that level would need testing (e.g. see here, though note the comment stream goes back a couple of years). Perhaps you could set up a test harness that fires web socket requests at your app, e.g. using two Docker containers? That will give you a way to understand what limits you will run into at this sort of scale.
Your maximum number of concurrent users suggests you're working at an absolutely enormous scale, given that any given userbase will usually not all be live at the same time. One thing you can do (and I plan to do the same in my case) is to add a reconnection strategy in your frontend that does a small number of retries, and then pops up a manual reconnect box (Trello does something like this). Given that some connections will be flaky, it is probably a good idea to give some of them a chance to die off, so you're left with fewer connections to manage. You can add an idle timer in the frontend as well, to avoid pointlessly keeping unused connections open.
If you want to do some functional testing, consider PhantomJS - I use PHPUnit Spiderling here, and web sockets seems to work fine (I have only tried one at a time so far, mind you).
There are several PHP libraries to send web socket requests. I use Websocket Client for PHP, and it has worked flawlessly for me so far.
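As a sketch of that last point, using the textalk/websocket package as the client library (the URL, channel name and payload below are illustrative, not from the original answer):

require __DIR__ . '/vendor/autoload.php';

use WebSocket\Client;

// Called from the application server whenever it needs the websocket server
// to push something out to connected browsers.
$client = new Client('ws://sockets.example.com:8080');
$client->send(json_encode([
    'channel' => 'user-42',
    'event'   => 'invoice.updated',
    'payload' => ['id' => 123],
]));
$client->close();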

Load Balancing - How to set it up correctly?

Here it gets a little complicated. I'm in the last few months of finishing a larger web-based project, and since I'm trying to keep the budget low (and learn some stuff myself), I'm now touching an issue I've never touched before: load balancing with NGINX, and scalability for the future.
The setup is the following:
1 Web server
1 Database server
1 File server (also used to store backups)
Using PHP 5.4+ over FastCGI
Now, all those servers should be 'scalable' - in the sense that I can add a new file server if free disk space is getting low, or a new web server if I need to handle more requests than expected.
Another thing: I would like to do everything over one domain, so that access to the different backend servers isn't really noticed in the frontend (some backend servers are basically called via a subdomain - for example the file server, over 'http://file.myserver.com/...', where load balancing happens only between the file servers).
Do I need an additional, separate server for load balancing, or can I just use one of the web servers? If so:
How much power (CPU / RAM) do I need for such a load-balancing server? Does it have to be the same as the web server, or is a 'lighter' server enough?
Does the 'load balancing' server have to be scalable too? Will I need more than one if there are too many requests?
How exactly does the whole load balancing work anyway? What I mean:
I've seen many entries stating that there are problems like session handling / synchronisation on load-balanced systems. I could find two solutions that might fit my needs: either the user is always directed to the same machine, or the session data is stored in a database. But with the second, I would basically have to rebuild parts of the $_SESSION functionality PHP already has, right? (How do I know which user gets which session - are cookies really enough?) See the sketch after this question.
What problems do I have to expect, except the unsynchronized sessions?
'Write scalable code' - that's a sentence I read a lot. But in terms of PHP, for example, what does it really mean? Usually, all the calculations for one user happen on one server only (the one NGINX directed the user to) - so how can PHP itself be scalable, since it isn't actually redirected by NGINX?
Are different 'load balancing' pools possible? What I mean is that all file servers are in one 'pool' and all web servers are in another, and basically, if you request an image from a file server that has too much to do, it redirects to a less busy file server.
SSL - I'll only need one certificate for the load-balancing server, right? Since the data always goes back through the load balancer - or how exactly does that work?
I know it's a huge question - basically, I'm just looking for some advice and a bit of a helping hand; I'm a bit lost in the whole thing. I can find snippets that partially answer the above questions, but actually 'doing' it is completely another thing. So I already know there won't be a clear, definitive answer, but maybe some shared experience.
The end goal is to be easily scalable in the future, and to plan ahead for it (and even buy things like the load-balancer server) in time.
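On the session-synchronisation point: PHP's built-in session handling can be pointed at a shared store, so there is usually no need to rebuild $_SESSION yourself. A minimal sketch, assuming the php-memcached extension; the host name and port are illustrative:

// bootstrap.php, included by every web server behind the balancer
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'sessions.internal:11211'); // the shared store

session_start();

// $_SESSION now works exactly as before, but any web server can serve any
// request for the user, so sticky routing at the balancer is not required.
$_SESSION['last_seen'] = time();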
You can use one of the web servers for load balancing, but it will be more reliable to set up the balancing on a separate machine. If your web servers don't respond very quickly and you're getting many requests, the load balancer will queue the requests, and a big queue needs a sufficient amount of RAM.
You don't generally need to scale a load balancer.
Alternatively, you can create two or more A (address) records for your domain, each pointing to a different web server's address. That gives you 'DNS load balancing' without a balancing server. Consider this option.

PHP chat active users

I have added a chat capability to a site using jQuery and PHP and it seems to work well in general, but I am worried about scalability. I wonder if anyone has some advice. The key area for me, I think, is efficiently managing awareness of who is online.
detail:
I haven't implemented long-polling (yet) and I'm worried about the raw number of long-running processes in PHP (Apache) getting out of control.
My code runs a periodic jQuery AJAX poll (every 4 seconds) that first updates the DB to say I am active and sets a timestamp.
Then there is a routine that checks the timestamps of all active users and sets those outside the window (10 minutes) to inactive.
This is fairly normal from my research so far. However, I am concerned that if I let every active user check every other active user, and everyone updates the DB to kick off inactive users, I will get duplicated effort, record locks and unnecessary server load.
So I have introduced the role of a 'sweeper'. This is just one of the online users, who takes on the job of doing the cleanup. Everyone else just checks whether a 'sweeper' exists (a DB read) and carries on. If there is no sweeper when they check, they make themselves the sweeper (a DB write to their own record). If there is more than one, they make themselves 'non-sweeper', sleep for a random period and check again.
My theory is that this way only one user is regularly writing updates to several records in the relevant table, and everyone else is either reading or just writing to their own record.
So it works OK, but the problem may be that the process requires a few extra DB reads and could actually be less efficient than just letting everyone do the cleanup, as in the other approaches I mentioned.
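For reference, a sketch of the heartbeat-plus-sweeper poll described above; the table and column names are made up, and $pdo is assumed to be an existing PDO connection:

// Runs inside the 4-second poll for the logged-in user
$userId = (int) $_SESSION['user_id'];

// 1. Heartbeat: mark myself active
$pdo->prepare("UPDATE chat_users SET last_seen = NOW() WHERE id = ?")
    ->execute([$userId]);

// 2. Is there a live sweeper? (one cheap read for everybody)
$sweeperId = $pdo->query(
    "SELECT id FROM chat_users
      WHERE is_sweeper = 1 AND last_seen > NOW() - INTERVAL 1 MINUTE
      LIMIT 1"
)->fetchColumn();

if ($sweeperId === false) {
    // 3a. No sweeper: claim the role (a single write to my own record)
    $pdo->prepare("UPDATE chat_users SET is_sweeper = 1 WHERE id = ?")
        ->execute([$userId]);
} elseif ((int) $sweeperId === $userId) {
    // 3b. I am the sweeper: expire everyone idle for more than 10 minutes
    $pdo->exec("UPDATE chat_users SET active = 0, is_sweeper = 0
                 WHERE last_seen < NOW() - INTERVAL 10 MINUTE");
}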
I have had over 100 concurrent users running OK so far, but the client wants to scale up to several hundred, even over 1,000, and I have no way of knowing at this stage whether this idea is good or not.
Does anyone know whether this is a good approach or not, whether it is scalable to hundreds of active users, or whether you can recommend a different approach?
As an aside, long polling / comet for the actual chat messages seems simple, and I have found a good resource for the code, but several blog comments suggest it's dangerous with PHP and Apache specifically (active threads etc.), with the impact minimised using usleep and session_write_close.
Again, does anyone have practical experience of a PHP long-polling setup for hundreds of active users? Maybe you can put my mind at ease! Do I really have to look at migrating this to Node.js (no experience)?
Thank you in advance
Tony
My advice would be to do this with the Meteor framework, which should be pretty trivial even if you are not an expert, and then simply load the chat into your PHP website via an iframe.
It will be scalable, won't consume many resources, and will only get better in the future, I presume.
And it sure beats both PHP comet solutions and jQuery & AJAX timeout-based calls to the server.
I even believe you could find a more or less complete solution on GitHub that just requires tweaking.
But of course, do read the docs before you implement it.
If you're worried about security issues, read up on security with Meteor.
Long polling is indeed pretty disastrous for PHP. PHP always runs with a limited number of concurrent processes, and it will scale well as long as you optimize for handling each request as quickly as possible.
Long polling and similar solutions will quickly fill up your pipe.
It could be argued that PHP is simply not the right technology for this type of thing, with the current tools out there. If you insist on using PHP you could try ReactPHP, which is a framework for PHP built quite similarly to NodeJS. The implication with React is also that it's expected to run as a separate daemon, not within a web server such as Apache. I have no experience of its stability or how well it scales, so you will have to do the testing yourself.
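The core idea with ReactPHP is one long-lived process with an event loop, instead of one blocked PHP process per waiting client. A minimal sketch, assuming the react/event-loop component is installed via Composer:

require __DIR__ . '/vendor/autoload.php';

// One daemon polls the database once per second on behalf of every
// connected client, rather than one Apache/PHP process per client.
$loop = React\EventLoop\Factory::create();

$loop->addPeriodicTimer(1.0, function () {
    // check for new messages here and push them out to waiting clients
});

$loop->run();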
NodeJS is not hard to get into if you know JavaScript well. NodeJS + socket.io make it really easy to write the chat server and client with websockets. This would be my recommendation. When I started with this, I had something nice up and running within a few hours.
If you want to keep your application stack in PHP, you want the chat application running in your actual web app (not an iframe), and you're concerned about scaling your realtime infrastructure, then I'd recommend you look at a hosted service for the realtime updates, such as Pusher, who I work for. That way the hosted service handles the scaling of the realtime infrastructure for you and lets you concentrate on building your application functionality.
This way you only need to handle the chat message requests - sanitize/verify the content - and then push the information through Pusher to the thousands of connected clients.
The quick start guide is available here:
http://pusher.com/docs/quickstart
I've a full list of hosted services on my realtime web tech guide.
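A sketch of that flow with the pusher-php-server library; the credentials, channel and event names below are placeholders:

require __DIR__ . '/vendor/autoload.php';

// Credentials come from the Pusher dashboard (placeholders here)
$pusher = new Pusher\Pusher('app-key', 'app-secret', 'app-id', ['cluster' => 'mt1']);

// Handle the incoming chat message as a normal, short-lived PHP request...
$message = isset($_POST['message']) ? trim(strip_tags($_POST['message'])) : '';

// ...then fan it out to every connected client via the hosted service
$pusher->trigger('chat-room-1', 'new-message', ['text' => $message]);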
