PHP/Mysqli caching mysql host? Not respecting low TTL

PHP/Mysqli caching mysql host? Not respecting low TTL - php

We're using PHP on AWS, with RDS/Aurora.
This works by exposing an endpoint for the cluster, which was CNAME records to the currently active mysql nodes.
As we add/remove reader nodes, in the case of a failover, this endpoint is updated automatically, with a 5 second TTL.
As such, our app should see and respond to the new nodes very quickly.
We're noticing after a failover, we get 'Mysql has gone away' for much longer than the 5 second period. We've had instances of this being 30 minutes, at which point we restarted Apache and it resolved the issue.
It seems as though somewhere in the application, the database is not querying the endpoint DNS and resolving the new endpoints, therefore still pointing at a node which is no longer there.
We do use Persistent connections (for performance), which was the obvious culprit, however we've then tested with these turned off, and same behaviour exists.
We're using PHP 7.1, with Mysqli. We have a singleton class around the mysqli connection, but even if this kept the same connection open it would only last the time a single script executed, which is typically a few milliseconds.
Any guidance as to where the caching may be occurring?

It's unclear if your issue is DNS related on the remote services or caching on your own (AWS) local network/services. This is the first thing to look into.
To my knowledge Linux does not cache DNS lookups and nor does Apache/PHP (unless you're using mod_proxy, in which case look into disablereuse Off setting). With this in mind I expect that your Apache service restart causing it start working was likely a coincidence.
My first suggestion would be to force a fail-over and then immediately check with the name servers from several different geographical locations and using a terminal from your AWS server to see how long it takes the same servers to report updated results. The chances are name servers are simply ignoring your TTL, or perhaps just 'treating it as a suggestion'.
The long and the short of it is that DNS TTL is just a suggestion to resolving name servers as to how long to cache. There is nothing to enforce name servers to actually abide by what you set. And the reality of it is, many name services don't follow your setting exactly or even at all.
If the name-servers are updating as quickly as expected elsewhere, but not on your AWS server and mysql still can't connect; this suggests caching somewhere on your server or more likely within the AWS network. Unless the caching is directly on your server, which as discussed above I believe is unlikely, I doubt there is much you can do about this.
Ultimately updating a DNS record and using a low TTL as a fail-over solution is likely never going to achieve consistent <1 min fail-over speeds.
You may want to look into alternative methods such as ClusterControl or a proxy method such as ProxySQL.

Related

How much overhead does the NewRelic PHP agent add?

By no means, NewRelic is taking the world by storm with many successful deployments.
But what are the cons of using it in production?
PHP monitoring agent works as a .so extension. If I understand correctly, it connects to another system aggregation service, which filters data out and pushes them into the NewRelic cloud.
This simply means that it works transparently under the hood. However, is this actually true?
Any monitoring, profiling or api service adds some overhead to the entire stack.
The extension itself is 0.6 MB, which adds up to each php process, this isn't much so my concern is rather CPU and IO.
The image shows CPU Utilization on a production EC2 t1.micro instances with NewRelic agent (top blue one) and w/o the agent (other lines)
What does NewRelic really do what cause the additional overhead?
What are other negative sides when using it?

Your mileage may vary based on the settings, your particular site's code base, etc...
The additional overhead you're seeing is less the memory used, but the tracing and profiling of your php code and gathering analytic data on it as well as DB request profiling. Basically some additional overhead hooked into every php function call. You see similar overhead if you left Xdebug or ZendDebugger running on a machine or profiling. Any module will use some resources, ones that hook deep in for profiling can be the costliest, but I've seen new relic has config settings to dial back how intensively it profiles, so you might be able to lighten it's hit more than say Xdebug.
All that being said, with the newrelic shared PHP module loaded with the default setup and config from their site my company's website overall server response latency went up about 15-20% across the board when we turned it on for all our production machines. I'm only talking about the time it takes for php-fpm to generate an initial response. Our site is http://www.nara.me. The newrelic-daemon and newrelic-sysmon services running as well, but I doubt they have any impact on response time.
Don't get me wrong, I love new relic, but the perfomance hit in my specific situation hit doesn't make me want to keep the PHP module running on all our live load balanced machines. We'll probably keep it running on one machine all the time. We do plan to keep the sysmon stuff going 100% and keep the module disabled in case we need it for troubleshooting.
My advice is this:
Wrap any calls to new relic functions in if(function_exists ( $function_name )) blocks so your code can run without error if the new relic module isn't loaded
If you've multiple identical servers behind a loadbalancer sharing the same code, only enable the php module on one image to save performance. You can keep the sysmon stuff running if you use new relic for this.
If you've just one server, only enable the shared php module when you need it--when you're actually profiling your code or mysql unless a 10-20% performance hit isn't a problem.
One other thing to remember if your main source of info is the new relic website: they get paid by the number of machines you're monitoring, so don't expect them to convince you to not use it on anything less than 100% of your machines even if it not needed. I think one of their FAQ's or blogs state basically you should expect some performance impact, but if you use it as intended and fix the issues you see from it, you should recoup the latency lost. I agree, but I think once you fix the issues, limit the exposure to the smallest needed number of servers.

The agent shouldn't be adding much overhead the way it is designed. Because of the level of detail required to adequately troubleshoot the problem, this seems like a good question to ask at https://support.newrelic.com

What to care about when using a load balancer?

I have a web application written in PHP, is already deployed on an Apache server and works perfectly.
The application uses Mysql as db, session are saved in memcached server.
I am planning to move to an HAproxy environment with 2 servers.
What I know: I will deploy the application to the servers and configure HAproxy.
My question is: is there something I have to care about/change in the code ?

It depends.
Are you trying to solve a performance or redundancy problem?
If your database (MySQL) and session handler (memcached) are running on one or more servers separate from the two Apache servers, then the only major thing your code will have to do differently is manage the forwarded IP addresses (via X-FORWARDED-FOR), and HAProxy will happily round robin your requests between Apache servers.
If your database and session handler are currently running on the same server, then you need to decide if the performance or redundancy problem you are trying to solve is with the database, the session management, or Apache itself.
The easiest solution, for a performance problem with a database/session-heavy web app, is to simply start by putting MySQL and memcached on the second server to separate your concerns. If this solves the performance problem you were having with one server, then you could consider the situation resolved.
If the above solution does not solve the performance problem, and you notice that Apache is having trouble serving your website files, then you would have the option of a "hybrid" approach where Apache would exist on both servers, but then you would also run MySQL/memcached on one of the servers. If you decided to use this approach, then you could use HAProxy and set a lower weight to the hybrid server.
If you are attempting to solve a redundancy issue, then your best bet will be to isolate each piece into logical groups (e.g. database cluster, memcached cluster, Apache cluster, and a redundant HAProxy pair), and add redundancy to each logical group as you see fit.

The biggest issue that you are going to run into is going to be related to php sessions. By default php sessions maintain state with a single server. When you add the second server into the mix and start load balancing connections to both of them, then the PHP session will not be valid on the second server that gets hit.
Load balancers like haproxy expect a "stateless" application. To make PHP stateless you will more than likely need to use a different mechanism for your sessions. If you do not/can not make your application stateless then you can configure HAProxy to do sticky sessions either off of cookies, or stick tables (source IP etc).
The next thing that you will run into is that you will loose the original requestors IP address. This is because haproxy (the load balancer) terminates the TCP session and then creates a new TCP session to apache. In order to continue to see what the original requstors IP address is you will need to look at using something like x-forwarded-for. In the haproxy config the option is:
option forwardfor
The last thing that you are likely to run into is how haproxy handles keep alives. Haproxy has acl's, rules that determine where to route the traffic to. If keep alives are enabled, haproxy will only make the decision on where to send traffic based on the first request.
For example, lets say you have two paths and you want to send traffic to two different server farms (backends):
somedomain/foo -> BACKEND_serverfarm-foo
somedomain/bar -> BACKEND_serverfarm-bar
The first request for somedomain/foo goes to BACKEND_serverfarm-foo. The next request for somedomain/bar also goes to BACKEND_serverfarm-foo. This is because haproxy only processes the ACL's for the first request when keep alives are used. This may not be an issue for you because you only have 2 apache servers, but if it is then you will need to have haproxy terminate the keep alive session. Haproxy has several options for this but these two make the most since in this scenario:
option forceclose
option http-server-close
The high level difference is that forceclose closes both the server side and the client side keep alive session. http-server-close only closes the server side keep alive session which allows the client to maintain a keepalive with haproxy.

Memcahced on multiple servers in PHP

I have the following question regarding the memcached module in PHP:
Intro:
We're using the module to prevent the same queries from being sent to the Database server, on a site with 500+ users in every moment.
Sometimes (very rarely) the memcahed process defuncts and all active users start generating queries to the database, so everything stops working.
Question:
I know, that memcached supports multiple servers, but I want to know what happens when one of them dies? Is there some sort of balancer background or something, that can tell Ow! server 1 is dead. I'll send everything to server 2 until the server 1 goes back online. or the load is being sent equally to each one?
Possible sollutions:
I need to know this, because if it's not supported our sysadmin can set the current memcached server to be a load ballancer and to balance the load between several other servers.
Should I ask him to create the load-balancing manualy or is this feature supported by default and what are the risks for both methods?
Thank you!

You add multiple servers in your PHP script, not in Memcache's configuration.
When you use Memcached::addServers(), you can specify a weight for every server. In your case, you might set one Memcache server to be higher than the other and only have the second act as a failover.
Using Memcached::setOption(), you can set how often a connection should be retried and set the timeout. If you know your Memcache servers die a lot, it might be worth it to set this lower than the defaults, but it shouldn't be necessary.

PHP, MySQL and a large nummer of simple queries

I'm implementing an application that will have a lot of clients querying lots of small data packages from my webserver. Now I'm unsure whether to use persistent data connections to the database or not. The database is currently on the same system as the webserver and could connect via the socket, but this may change in the near future.
As I know a few releases of PHP ago mysqli_pconnect was removed because it behaved suboptimally. In the meantime it seems to be back again.
Based on my scenario I suppose I won't have an other chance to handle thousands of queries per minute but with loads of persistent connections and a MySQL configuration that reserves only little resources per connection, right?
Thanks for your input!

What happened when you tested it?
With the nest will in the world, there's no practical way you can convey all the informaton required for people to provide a definitive answer in a SO response. However usually there is very little overhead in establishing a mysql connection, particularly if it resides on the same system as the database client (in this case the webserver). There's even less overhead if you use a filesystem rather than a network socket.
So I'd suggest abstracting all your database calls so you can easily switch between connection types, but write your system to use on-demand connections, and ensure you code explicitly releases the connection as soon as practical - then see how it behaves.
C.

Are PHP persistant connections evil?
The problem is there can be only so
many connections active between Host
“Apache” and Host “MySQL”
Persistant connections usually give problems in that you hit the maximum number of connections. Also, in your case it does not give a great benefit since your database server is on the same host. Keep it to normal connections for now.

As they say, your mileage may vary, but I've never had good experiences using persistent connections from PHP, including MySQL and Oracle (both ODBC and OCI8). Every time I've tested it, the system fails to reuse connections. With high load, I end up hitting the top limit while I have hundreds of idle connections.
So my advice is that you actually try it and find out whether your set-up is reusing connections properly. If it isn't working as expected, it won't be a big lose anyway: opening a MySQL connection is not particularly costly compared to other DBMS.
Also, don't forget to reset all relevant settings when appropriate (whatever session value you change, it'll be waiting for you next time to stablish a connection and happen to reuse that one).

Should $new_link be used in mysql_connect()?

I'm maintaining an inherited site built on Drupal. We are currently experiencing "too many connections" to the database.
In the /includes/database.mysql.inc file, #mysql_connect($url['host'], $url['user'], $url['pass'], TRUE, 2) (mysql_connect() documentation) is used to connect to the database.
Should $new_link = TRUE be used? My understanding is that it will "always open a new link." Could this be causing the "too many connections"?

Editing core is a no no. You'll forget you did, upgrade the version, and bam, changes are gone.

Drupal runs several high-performance sites without problems. For instance, the Grammy Awards site switched to Drupal this year and for the first time the site didn't go down during the cerimony! some configuration needs tweaking on your setup. Probably mysql.
Edit your my.cfg and restart your mysql server (/etc/my.cfg in fedora, RH, centos and /etc/mysql/my.cfg on *buntu)
[mysqld]
max_connections=some-number-here
alternatively, to first try the change without restarting the server, login to mysql and try:
show variables like 'max_connections' #this tells you the current number
set global max_connections=some-number-here
Oh, and like another person said: DO. NOT. EDIT. DRUPAL. CORE. It does pay off if you want to keep your site updated, may cause inflexible headache and bring you a world of hurt.

MySQL, just like any RDBMS out there will limit the amount of connections that it accepts at any time. The my.cnf configuration file specifies this value for the server under the max_connections configuration. You can change this configuration, but there are real limitations depending on the capacity of your server.
Persistent connections may help reducing the amount of time it takes to connect to the database, but it has no impact on the total amount of connections MySQL will accept.
Connect to MySQL and use 'SHOW PROCESSLIST'. It will show you the currently open connections and what they do. You might have multiple connections sitting idle or running queries that take way too long. For idle connections, it might just be a matter of making sure your code does not keep connections open when they don't need them. For the second one, they may be parts of your code that need to be optimized so that the queries don't take too long.
If all connections are legitimate, you simply have more load than your current configuration allows for. If you MySQL load is low even with the current connection count, you can increase it a little and see how it evolves.
If you are not on a dedicated server, you might not be able to do much about this. It may just be someone else's code causing trouble.
Sometimes, those failures are just temporary. When it fails, you can simply retry the connection a few milliseconds later. If it still fails, it might be a problem and stopping the script is the right thing to do. Don't put that in an infinite loop (seen it before, terrible idea).

FYI: using permanent connections with Drupal is asking for trouble. Noone uses that as far as I know.

The new_link parameter only has effect, if you have multiple calls to mysql_connect(), during 1 request, which is probably not the case here.
I suspect it is caused by too many users visiting your site, simultaneously, because for each visitor, a new connection to the DB will be made.
If you can confirm that this is the case, mysql_pconnect() might help, because your problem is not with the stress on your database server, but the number of connections. You should also read Persistent database connections to see if it is applicable to your webserver setup, if you choose to go this route.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.