What to care about when using a load balancer?


I have a web application written in PHP that is already deployed on an Apache server and works perfectly.
The application uses MySQL as its database, and sessions are saved on a memcached server.
I am planning to move to an HAProxy environment with 2 servers.
What I know: I will deploy the application to the servers and configure HAProxy.
My question is: is there anything I have to take care of or change in the code?

It depends.
Are you trying to solve a performance or redundancy problem?
If your database (MySQL) and session handler (memcached) are running on one or more servers separate from the two Apache servers, then the only major thing your code will have to do differently is handle the forwarded client IP addresses (via the X-Forwarded-For header), and HAProxy will happily round-robin your requests between the Apache servers.
If your database and session handler are currently running on the same server, then you need to decide if the performance or redundancy problem you are trying to solve is with the database, the session management, or Apache itself.
The easiest solution for a performance problem in a database/session-heavy web app is to start by putting MySQL and memcached on the second server to separate your concerns. If this solves the performance problem you were having with one server, you could consider the situation resolved.
If the above solution does not solve the performance problem, and you notice that Apache is having trouble serving your website files, then you have the option of a "hybrid" approach: Apache runs on both servers, and MySQL/memcached also run on one of them. If you decide to use this approach, you can give the hybrid server a lower weight in HAProxy.
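To make the weight idea concrete, here is a small PHP simulation of smooth weighted round-robin, the scheme nginx uses. HAProxy's own weighted round-robin is implemented differently internally, so treat this only as an illustration of how a weight of 3 versus 1 splits requests; the server names and weights are hypothetical.

```php
<?php
// Sketch: smooth weighted round-robin (nginx's scheme), used here only to
// illustrate how a lower weight sends fewer requests to the "hybrid" server.
// Server names and weights are hypothetical.

final class WeightedRoundRobin
{
    /** @var array<string,int> configured weight per server */
    private array $weights;
    /** @var array<string,int> running score per server */
    private array $current;

    public function __construct(array $weights)
    {
        $this->weights = $weights;
        $this->current = array_fill_keys(array_keys($weights), 0);
    }

    public function next(): string
    {
        $total = array_sum($this->weights);
        $best = null;
        foreach ($this->weights as $name => $weight) {
            // Every server gains its weight on each pick...
            $this->current[$name] += $weight;
            if ($best === null || $this->current[$name] > $this->current[$best]) {
                $best = $name;
            }
        }
        // ...and the chosen server pays back the total, so picks interleave
        // instead of arriving in bursts.
        $this->current[$best] -= $total;
        return $best;
    }
}

$lb = new WeightedRoundRobin(['web-only' => 3, 'hybrid' => 1]);
$picks = [];
for ($i = 0; $i < 8; $i++) {
    $picks[] = $lb->next();
}
// Over 8 requests, 'web-only' is picked 6 times and 'hybrid' 2 times.
```

With weights 3 and 1 the hybrid server receives roughly a quarter of the traffic, leaving it capacity for MySQL and memcached.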
If you are attempting to solve a redundancy issue, then your best bet will be to isolate each piece into logical groups (e.g. database cluster, memcached cluster, Apache cluster, and a redundant HAProxy pair), and add redundancy to each logical group as you see fit.

The biggest issue you are going to run into is PHP sessions. By default, PHP stores sessions in files on the local server, so session state exists on a single server only. When you add a second server and start load balancing connections across both, a PHP session created on one server will not be valid on the other.
Load balancers like HAProxy expect a "stateless" application. To make your PHP application stateless, you will most likely need to use a different mechanism for storing sessions. If you cannot make your application stateless, you can configure HAProxy to use sticky sessions, based either on cookies or on stick tables (source IP, etc.).
The next thing you will run into is that you will lose the original requester's IP address. This is because HAProxy (the load balancer) terminates the TCP connection and then opens a new TCP connection to Apache. To keep seeing the original requester's IP address, you will need to use something like the X-Forwarded-For header. In the HAProxy config the option is:
option forwardfor
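A minimal sketch of reading the forwarded address in PHP once option forwardfor is enabled, assuming the load balancer's internal address is known (10.0.0.10 in the usage line is hypothetical). The header is only trusted when the request actually came from the load balancer, since any client can send its own X-Forwarded-For.

```php
<?php
// Sketch: recover the original client IP behind HAProxy's "option forwardfor".
// $trustedProxies must list the load balancer's internal IP(s); the address
// in the usage example is hypothetical.

function clientIp(array $server, array $trustedProxies): string
{
    $remote = $server['REMOTE_ADDR'] ?? '';

    // Only trust the header when the connection came from our load balancer;
    // otherwise any client could spoof X-Forwarded-For.
    if (!in_array($remote, $trustedProxies, true) || empty($server['HTTP_X_FORWARDED_FOR'])) {
        return $remote;
    }

    // The header may hold a chain such as "client, proxy1". Walk it from the
    // right and return the first valid address that is not one of our proxies.
    $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
    for ($i = count($chain) - 1; $i >= 0; $i--) {
        if (filter_var($chain[$i], FILTER_VALIDATE_IP) === false) {
            break; // malformed entry: stop and fall back to REMOTE_ADDR
        }
        if (!in_array($chain[$i], $trustedProxies, true)) {
            return $chain[$i];
        }
    }
    return $remote;
}

// Usage inside a request handled through HAProxy:
// $ip = clientIp($_SERVER, ['10.0.0.10']);
```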
The last thing you are likely to run into is how HAProxy handles keep-alives. HAProxy has ACLs, rules that determine where to route the traffic. If keep-alives are enabled, HAProxy will only decide where to send traffic based on the first request.
For example, let's say you have two paths and you want to send traffic to two different server farms (backends):
somedomain/foo -> BACKEND_serverfarm-foo
somedomain/bar -> BACKEND_serverfarm-bar
The first request, for somedomain/foo, goes to BACKEND_serverfarm-foo. The next request, for somedomain/bar, also goes to BACKEND_serverfarm-foo. This is because HAProxy only processes the ACLs for the first request when keep-alives are used. (This applies to HAProxy versions before 1.5, whose default tunnel mode analysed only the first request on a connection; from 1.5 on, the default http-keep-alive mode evaluates every request.) This may not be an issue for you because you only have 2 Apache servers, but if it is, you will need to have HAProxy terminate the keep-alive session. HAProxy has several options for this, but these two make the most sense in this scenario:
option forceclose
option http-server-close
The high-level difference is that forceclose closes both the server-side and the client-side keep-alive connections, while http-server-close only closes the server-side one, which allows the client to maintain a keep-alive connection with HAProxy. (In newer HAProxy versions, option forceclose is deprecated in favour of option httpclose.)

Related

Adding websockets to existing application

So I wrote this nice SaaS solution and got some real clients. Now a client has requested some functionality that requires websockets.
In order to keep things tidy, I would like to use another server to manage the websockets.
My current stack is an AWS Application Load Balancer with two servers behind it. One is the current application server: an Apache web server with PHP running the application.
The client side is driven by AngularJS.
The second server (which does not exist yet) is going to run Nginx, and sessions will be stored on a third, memcached server.
The websocket and application servers are both on the same domain, using different ports so that requests are sent to the right server (AWS ELB allows sending requests to different server groups by port). Both the application and the websockets will be driven by PHP and Ratchet.
I have two questions:
For the more experienced developers: does such an architecture sound reasonable? (I'm not aiming for 100Ks of concurrent connections yet; I need a viable and affordable solution that handles at most 5,000 concurrent connections at this stage.)
What would be the best way to send requests from the application server (which has the logic that generates the requests) to the websocket server?
Please note that I'm new to websockets, so maybe there are much better ways to do this; I'll be grateful for any ideas.

I'm in the middle of using Ratchet with a SPA to power a web app. I'm using Traefik as a front-end proxy, but yes, Nginx is popular here, and I'm sure that would be fine. I like Traefik for its ability to seamlessly redirect traffic based on config file changes, API triggers, and configuration changes from the likes of Kubernetes.
I agree with Michael in the comments - using ports for web sockets other than 80 or 443 may cause your users to experience connection problems. Home connections are generally fine on non-standard ports, but public wifi, business firewalls and mobile data can all present problems, and it's probably best not to risk it.
Having done a bit of reading around, I think your 5,000 concurrent connections will probably need OS-level tweaks. I seem to recall that 1,024 connections can be handled comfortably, but several times that level would need testing (e.g. see here, though note the comment stream goes back a couple of years). Perhaps you could set up a test harness that fires websocket requests at your app, e.g. using two Docker containers? That will show you what limits you will run into at this sort of scale.
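As a quick sanity check before load testing, you can compare the process's open-file limit with your target, since every open socket uses a file descriptor. This sketch assumes the posix extension (for posix_getrlimit()) and a 10% headroom for descriptors used by logs, the database and so on; the headroom and the helper name are assumptions.

```php
<?php
// Sketch: does the open-file limit leave room for the target number of
// WebSocket connections? Each open socket uses one file descriptor.
// The 10% headroom (for logs, DB connections, etc.) is an assumption.

function fdLimitSufficient(int $targetConnections, $softLimit, int $headroomPercent = 10): bool
{
    if ($softLimit === 'unlimited') {
        return true;
    }
    // Integer arithmetic, rounded up: target * (100 + headroom) / 100.
    $needed = intdiv($targetConnections * (100 + $headroomPercent) + 99, 100);
    return (int) $softLimit >= $needed;
}

// posix_getrlimit() exists only when the posix extension is loaded.
if (function_exists('posix_getrlimit')) {
    $limits = posix_getrlimit();
    if (isset($limits['soft openfiles']) && !fdLimitSufficient(5000, $limits['soft openfiles'])) {
        error_log('Open-file limit ' . $limits['soft openfiles'] . ' is too low for 5,000 connections');
    }
}
```

The common default soft limit of 1,024 fails this check for 5,000 connections, which matches the recollection above that going beyond roughly a thousand needs OS tuning.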
Your maximum number of concurrent users suggests to me that you're working at an absolutely enormous scale, given that a userbase is usually not all online at the same time. One thing you can do (and I plan to do the same in my case) is to add a reconnection strategy in your frontend that makes a small number of retries and then pops up a manual reconnect box (Trello does something like this). Given that some connections will be flaky, it is probably a good idea to let some of them die off, so you're left with fewer connections to manage. You can also add an idle timer in the frontend to avoid pointlessly keeping unused connections open.
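The retry-then-prompt strategy can be expressed as a small pure function. It would normally live in the frontend JavaScript; it is written in PHP here only to keep one language. The exponential backoff, retry count and delays are assumptions, not anything Trello is known to use.

```php
<?php
// Sketch of "a few automatic retries, then a manual reconnect box".
// Backoff shape, retry count, base delay and cap are assumptions.

/**
 * Delay in milliseconds before reconnect attempt $attempt (1-based), or null
 * once automatic retries are exhausted and the UI should show a manual
 * "Reconnect" prompt instead.
 */
function reconnectDelayMs(int $attempt, int $maxRetries = 4, int $baseMs = 1000, int $capMs = 15000): ?int
{
    if ($attempt < 1 || $attempt > $maxRetries) {
        return null;
    }
    // Exponential backoff: 1s, 2s, 4s, 8s, ... never more than $capMs.
    return min($capMs, $baseMs * (2 ** ($attempt - 1)));
}
```

Growing delays give flaky connections a chance to stay dead rather than hammering the server, and returning null is the point where the client stops holding a connection slot until the user asks for one.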
If you want to do some functional testing, consider PhantomJS. I use PHPUnit Spiderling here, and websockets seem to work fine (I have only tried one at a time so far, mind you).
There are several PHP libraries to send web socket requests. I use Websocket Client for PHP, and it has worked flawlessly for me so far.

Multiple webservers (NGINX) behind Load Balancer, share settings (database connections etc.)

I'm designing a system that will require several web servers (NGINX) behind a load balancer.
My question is: what techniques do you suggest for sharing settings among all web servers (hosting a PHP app)? Let's say I have to change the credentials for the database connection. In that case I don't want to log in to every single server and change all of the config files.
What do you suggest so that I can update those variables in one place and have them available to all web servers? I've considered having a small server in the middle that all servers read from (over SCP or similar), but I don't want a single point of failure.

There are various solutions to automate server management, like Puppet and Chef, but if you are just getting started with two or three servers, consider a tool that broadcasts the same SSH session input to multiple hosts. Terminator for GNOME is great, and there's also csshX for Mac (which I haven't tried).
It's great to be able to see both terminals at once, in case there are small differences between the servers from earlier manual maintenance. In Terminator, you can easily switch between broadcasting your commands to a group of terminals and typing into a specific one. You can set up a layout that automatically starts two joined sessions and SSHes into all the servers in a given cluster. In other words, it's as easy to use as a normal SSH session, but works for multiple servers.
When project size and complexity grows, consider switching to more fully automated server management.
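Whichever tool pushes the change (a broadcast SSH session now, Puppet or Chef later), it helps if every server reads its settings from one well-known file, so a credential change means editing one file and copying it out. A minimal sketch, assuming an INI file with a [database] section; the path and key names are hypothetical. A DSN containing semicolons must be quoted in the INI file, since an unquoted ; starts a comment.

```php
<?php
// Sketch: read database settings from a single INI file that your
// deployment tool copies to every web server. Path and keys are hypothetical.
//
// Example file contents:
//   [database]
//   dsn = "mysql:host=10.0.0.5;dbname=app"
//   user = app
//   password = "secret"

function loadDbSettings(string $path): array
{
    $ini = parse_ini_file($path, true, INI_SCANNER_TYPED);
    if ($ini === false || !isset($ini['database'])) {
        throw new RuntimeException("Cannot read [database] section from $path");
    }
    $db = $ini['database'];
    foreach (['dsn', 'user', 'password'] as $key) {
        if (!array_key_exists($key, $db)) {
            throw new RuntimeException("Missing database.$key in $path");
        }
    }
    return $db;
}

// Usage (hypothetical path):
// $db  = loadDbSettings('/etc/myapp/settings.ini');
// $pdo = new PDO($db['dsn'], $db['user'], $db['password']);
```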

Load balancing and APC

I am interested in a scenario where web servers serving a PHP application are set up behind a load balancer.
There will be multiple web servers with APC behind the load balancer. All requests will have to go through the load balancer, which then sends each one to one of the web servers to process.
I understand that memcached should be used for distributed caching, but I think having the APC cache on each machine cache things like application configurations and other objects that will NOT be different across any of the servers would yield even better performance.
There is also an administrator area for this application. It is also accessed via the load balancer (for example, site.com/admin). In a case like this, how can I call apc_clear_cache to clear the APC object cache on ALL servers?

Externally, you have a public IP that routes all requests to your load balancer, which distributes them round-robin. So from outside you cannot clear the cache on each server one at a time, because you don't know which server will receive any given request. However, within your network each machine has its own internal IP and can be called directly. Knowing this, you can do something a little unusual that does work when triggered externally.
A solution I like is to hit a single URL, such as http://www.mywebsite/clearcache.php or something like that, and have everything done. If you like that as well, read on. You can put this URL behind authentication so that only your admin can trigger it.
You can create logic where a single external request clears the cache on all servers. Whichever server receives that request runs the same logic, which talks to every server to clear its cache. This sounds weird and a bit Frankenstein, but here is the logic, assuming we have 3 servers with the internal IPs 10.232.12.1, 10.232.12.2 and 10.232.12.3:
1) All servers have two files, "initiate_clear_cache.php" and "clear_cache.php", identical on every server.
2) "initiate_clear_cache.php" calls file_get_contents() on "clear_cache.php" for each machine in the network, including the machine it is running on.
For example:
file_get_contents('http://10.232.12.1/clear_cache.php');
file_get_contents('http://10.232.12.2/clear_cache.php');
file_get_contents('http://10.232.12.3/clear_cache.php');
3) The file called "clear_cache.php" is actually doing the cache clearing for its respective machine.
4) You now only need to make a single request, such as http://www.mywebsite/initiate_clear_cache.php, and you are done.
Let me know if this works for you. I've done something similar in .NET and Node.js but haven't tried it in PHP yet; I'm sure the concept is the same. :)
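Here is a rough PHP sketch of the fan-out in initiate_clear_cache.php. It assumes a shared secret token passed as a query parameter (a hypothetical addition, so outsiders cannot flush your caches) and makes the HTTP call injectable so the loop can be tested without a network. Note that the original APC extension is not available for PHP 7 and later; with APCu, clear_cache.php would call apcu_clear_cache().

```php
<?php
// Sketch of initiate_clear_cache.php's fan-out. The "token" query parameter
// is a hypothetical shared secret; $fetch is injectable for testing.

function clearCacheOnAllServers(array $hosts, string $token, ?callable $fetch = null): array
{
    $fetch = $fetch ?? function (string $url) {
        // Short timeout so one dead server does not hang the whole request.
        $ctx = stream_context_create(['http' => ['timeout' => 3]]);
        return @file_get_contents($url, false, $ctx);
    };

    $results = [];
    foreach ($hosts as $host) {
        $url = 'http://' . $host . '/clear_cache.php?token=' . rawurlencode($token);
        $results[$host] = $fetch($url) !== false; // true if the call succeeded
    }
    return $results;
}

// clear_cache.php on each server would check the token, e.g.
//   if (!hash_equals($expectedToken, $_GET['token'] ?? '')) { http_response_code(403); exit; }
// and then flush its local cache: apc_clear_cache('user') with APC,
// or apcu_clear_cache() with APCu.

// Usage:
// $report = clearCacheOnAllServers(['10.232.12.1', '10.232.12.2', '10.232.12.3'], $secret);
```

Returning a per-host result lets the admin page show which servers did not respond, instead of silently leaving one server with a stale cache.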

PHP Sessions to handle Multiple Servers

All,
I have a PHP5 web application written with Zend Framework and MVC. This application is installed on 2 servers with the same setup. Server X has PHP5/MySQL/Apache, and Server Y has the same. We don't have a common DB server shared by both servers.
My application works when accessed individually via https on Server X and Server Y. But when we turn on load balancing and have both servers up, the sessions get lost.
How can I make sure my sessions persist across servers? Should I maintain my DB on a third server and write sessions to it? If so, what's the easiest and most secure way to do it?
Thanks

memcached is a popular way to solve this problem. You just need to get it up and running (easy) and update your php.ini file to tell PHP to use memcached as the session storage.
In php.ini you would modify:
session.save_handler = memcache
session.save_path = ""
For the general idea: PHP Sessions in Memcached.
There are any number of tutorials on setting up the Zend session handler to work with memcached. Take your pick.
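If you would rather set this from code than in php.ini, here is a small sketch. It assumes the PECL memcached extension and two hypothetical memcached nodes. Be aware that the two PECL extensions expect different save_path formats: memcache uses tcp://host:port entries, while memcached uses plain host:port entries.

```php
<?php
// Sketch: point PHP sessions at memcached from code instead of php.ini.
// Node addresses are hypothetical. Expected save_path formats:
//   memcache  extension: "tcp://10.0.0.20:11211, tcp://10.0.0.21:11211"
//   memcached extension: "10.0.0.20:11211,10.0.0.21:11211"

function memcacheSavePath(string $handler, array $servers): string
{
    if ($handler === 'memcache') {
        return implode(', ', array_map(function (string $s): string {
            return 'tcp://' . $s;
        }, $servers));
    }
    return implode(',', $servers); // memcached extension
}

$servers = ['10.0.0.20:11211', '10.0.0.21:11211'];

// Only switch handlers if the extension is actually installed; these
// settings must be applied before session_start().
if (extension_loaded('memcached')) {
    ini_set('session.save_handler', 'memcached');
    ini_set('session.save_path', memcacheSavePath('memcached', $servers));
}
```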

Should I maintain my db on a third server and write sessions to it?
Yes, one way to handle it is to have a third machine running the database that both web servers use for the application. I've done that for several projects in the past and it has worked well. The question with that approach is whether the bottleneck is at the web servers or at the database. If it's at the database, you won't see much improvement from adding load balancing of the web servers. You may instead need to think about mirroring schemes for the database.
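If you go the database route, PHP lets you plug in your own session storage with SessionHandlerInterface and session_set_save_handler(). Below is a minimal PDO-based sketch; the sessions table and its columns are hypothetical, and it does no row locking, so two simultaneous requests for the same session can overwrite each other's changes. Using prepared statements, as here, keeps session IDs out of the SQL text.

```php
<?php
// Sketch: store sessions in a shared database so every web server sees them.
// Assumes a (hypothetical) table:
//   CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY,
//                          data TEXT NOT NULL, updated_at INTEGER NOT NULL)
// No row locking: concurrent requests for one session may overwrite each other.

final class PdoSessionHandler implements SessionHandlerInterface
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    public function open($savePath, $name): bool { return true; }
    public function close(): bool { return true; }

    public function read($id): string
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data; // '' means "no session yet"
    }

    public function write($id, $data): bool
    {
        // Portable upsert: delete then insert inside one transaction.
        try {
            $this->pdo->beginTransaction();
            $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
            $this->pdo->prepare('INSERT INTO sessions (id, data, updated_at) VALUES (?, ?, ?)')
                      ->execute([$id, $data, time()]);
            $this->pdo->commit();
            return true;
        } catch (PDOException $e) {
            $this->pdo->rollBack();
            return false;
        }
    }

    public function destroy($id): bool
    {
        $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
        return true;
    }

    public function gc($maxLifetime): int
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $maxLifetime]);
        return $stmt->rowCount(); // number of expired sessions removed
    }
}

// Usage on every web server, before session_start():
// session_set_save_handler(new PdoSessionHandler($pdo), true);
```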

Another option is to use the sticky sessions feature on your load balancer. What this will do is keep users on certain servers. So when user 1 comes to the site, they will be directed to server X. Every subsequent request will also be directed to server X. This allows you to not worry about persisting sessions between servers, as each user will continue to be directed to the server they have their session on.
The one downside is that when you take a web server out of the pool, the users whose sessions live on that server (half of them, with two servers) will be logged out. So the effectiveness of this solution depends on how often you take servers out of the pool.
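To see why removing a server logs those users out, here is a small sketch of source-IP stickiness, one way a load balancer can pin users (cookie-based stickiness is the other common one). The crc32()-modulo mapping and server names are illustrative assumptions, not how any particular load balancer hashes.

```php
<?php
// Sketch: source-IP stickiness. The same IP always maps to the same server
// while the pool is unchanged. crc32() returns a non-negative int on 64-bit
// PHP; the modulo mapping and server names are illustrative only.

function pickServer(string $clientIp, array $servers): string
{
    $servers = array_values($servers);
    return $servers[crc32($clientIp) % count($servers)];
}

$ips = ['203.0.113.1', '203.0.113.2', '203.0.113.3', '203.0.113.4'];

$before = [];
$after = [];
foreach ($ips as $ip) {
    $before[$ip] = pickServer($ip, ['server-x', 'server-y']);
    $after[$ip]  = pickServer($ip, ['server-y']); // server-x taken out of the pool
}
// Every IP that was pinned to server-x now lands on server-y, where its
// PHP session does not exist, so those users have to log in again.
```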

How to scale php application servers horizontally?

I'm using lighttpd as the web server for a PHP application server. The average load on this server is about 2-3. The MySQL database is on a separate server (its load is ~0.4). How could I scale the PHP application server?
Thank you.

In a few words, the general solution is to:
Have several PHP servers
Have another load-balancing server in front of those
If you have 3 PHP servers, this load balancer will send 33% of the requests it receives to each PHP server.
The load-balancing server can be some specialized hardware, or a reverse proxy using Apache (see mod_proxy_balancer, for example), nginx (see Load Balancing with Nginx, for example), Varnish, ...
The most important problems you'll generally face when using several PHP servers instead of one are related to the filesystem: with several servers, each server has its own disks and filesystem.
For example, if a user is balanced onto server 1 for one page and onto server 2 for another, you cannot use file-based sessions: the session will be created on server 1 but will not be found on server 2 later.
In this specific case, you'll have to use another mechanism to store sessions: store them in a database, or in memcached, for example.
The same goes for images (uploaded by users, for example); you'll have to either:
Synchronise them between servers
Or use some kind of network-drive
Edit after the comment: for deployment with several servers, I generally do exactly what I would do with only one server.
For more information on the kind of process I often use, you can take a look at the answer I gave a while back to this question: Updating a web app without any downtime.
The explanation given there is for one server, but to do the same thing on several servers, you can:
Upload the archive to each one of your servers
Extract it
And, at almost the same instant, switch the symlink on all servers
At worst, you'll have 2 or 3 seconds of delay between servers, which means about 10 seconds if you have 5 servers. That's probably not a big problem :-)
(I've done this with up to 7 servers, for one application, and never had any problem)
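The "switch the symlink" step can be made atomic on each server by creating the new link under a temporary name and rename()-ing it over the live one; on POSIX filesystems the rename replaces the old link in one step, so no request ever sees a missing link. A minimal sketch; the paths are hypothetical.

```php
<?php
// Sketch: atomically point the "current" symlink at a new release.
// rename() over an existing symlink replaces it in one step on POSIX
// filesystems. Paths in the usage example are hypothetical.

function switchRelease(string $currentLink, string $releaseDir): void
{
    $tmpLink = $currentLink . '.tmp-' . getmypid();
    @unlink($tmpLink); // remove a stale temp link left by an earlier failed run
    if (!symlink($releaseDir, $tmpLink)) {
        throw new RuntimeException("Could not create $tmpLink");
    }
    if (!rename($tmpLink, $currentLink)) {
        unlink($tmpLink);
        throw new RuntimeException("Could not switch $currentLink");
    }
}

// Usage on each server once the archive is extracted:
// switchRelease('/var/www/app/current', '/var/www/app/releases/v2');
```

Run over SSH on every server in quick succession, this keeps each individual switch instantaneous; only the few seconds between servers remain.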
