Question:
What are the rules/logic behind persistent connection management when using PDO?
Environment:
Web Server
Windows 7 x64
Dual-core with 16GB RAM
Apache 2.2.17
PHP 5.3.5
Connecting through DSN string with IP address, port, service name, etc...
No ODBC for DB conn (been trying to create one for 2 hours now, thanks Oracle!)
DB Server
Oracle 10g on Linux
Multi-core with 4GB RAM
Username specifically created for my web app (yes, it's fake)
user: webuser
My understanding/observations:
Non-persistent connections
<?php
// Open a new connection
// Session created in Oracle
$dbh = new PDO('DSN', 'webuser', 'password');
// webuser is active in v$session with a SID=1
$dbh = NULL;
// webuser removed from v$session
// Manually calling $dbh = NULL; will remove the session from v$session
// OR
// Wait for script EOL so a kill-session command is sent to Oracle?
?>
Script reliably takes about 0.09 seconds to execute with framework overhead, etc.
Persistent connections
<?php
// Open a new connection and make it persistent
// Session created in Oracle
// Is Apache maintaining some sort of keep-alive with Oracle here?
// because I thought php.exe is only alive for the duration of the script
$dbh = new PDO('DSN', 'webuser', 'password', array(PDO::ATTR_PERSISTENT => TRUE));
// webuser is active in v$session with a SID=1
$dbh = NULL;
// webuser is still active in v$session with a SID=1
$dbh = new PDO('DSN', 'webuser', 'password', array(PDO::ATTR_PERSISTENT => TRUE));
// webuser is still active in v$session with a SID=1
// Manually calling $dbh = NULL; does not kill session
// OR
// Script EOL does not kill session
// ^^ this is good, just as expected
?>
Script takes ~0.12 seconds to execute on the initial visit with framework overhead, etc.
Subsequent executions take ~0.04 seconds.
The issue:
I visit the page and webuser gets a SID=1
My colleague visits the page and webuser gets an additional SID=2 <- rinse, repeat, and increment SID for new computers visiting this page
Shouldn't a new visitor be re-using SID=1?
All answers, suggestions, requests for alternate testing, links to reading material are welcomed.
I have RTFM'ed for a while and Googling has only produced meager Advantages of Persistent vs. Non-persistent blogs.
Apache's point of view
Apache has one parent process. This process creates child processes that handle the requests coming to the web server.
The initial number of child processes started with the web server is configured by the StartServers directive in the Apache configuration. The number goes up as needed with a rising number of requests hitting the web server, until ServerLimit is reached.
PHP and persistent connections
If PHP (run as mod_php; under CGI all resources are freed at the end of script execution) is told to establish a persistent connection to a database during a request, this connection is held open even after the script finishes.
The connection is held between the database server and the Apache child process that handled the request, and it can be re-used by any later request handled by that same child process.
If, for some reason (do not ask me exactly why), the child process stays occupied longer than the actual request and another request comes in, the parent Apache process hands this request to another (possibly new) child process, which may not have established a connection to the database yet. If it has to connect during the execution of the script, a new session with a new SID appears, as you have observed. Now two connections are held by two different Apache child processes.
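The per-child behaviour can be illustrated with a small simulation. This is only a sketch under stated assumptions: the ApacheChild class and the $nextSid counter are hypothetical stand-ins, not PHP or Apache internals. It merely models each child keeping its own list of persistent connections, keyed by the connection credentials.

```php
<?php
// Hypothetical model: each Apache child keeps its own persistent list.
// Nothing here talks to a real database; $nextSid stands in for Oracle's SID.
class ApacheChild
{
    /** @var array persistent connections keyed by credentials */
    private $persistent = array();

    /** Return the session id for these credentials, connecting only if needed. */
    public function connect($dsn, $user, &$nextSid)
    {
        $key = md5($dsn . '|' . $user);
        if (!isset($this->persistent[$key])) {
            // First request with these credentials in this child: new DB session.
            $this->persistent[$key] = $nextSid++;
        }
        return $this->persistent[$key];
    }
}

$nextSid = 1;
$childA = new ApacheChild();
$childB = new ApacheChild();

$first  = $childA->connect('DSN', 'webuser', $nextSid); // new session
$second = $childA->connect('DSN', 'webuser', $nextSid); // same child: re-used
$third  = $childB->connect('DSN', 'webuser', $nextSid); // other child: new session

echo "$first $second $third\n"; // prints "1 1 2"
```

A second visitor only re-uses SID=1 if Apache happens to route the request to the child that already holds that connection.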
Keep in mind that...
It is important to know that this can also cause a lot of trouble.
If there is an endless loop, an aborted transaction, or some other, maybe even unpredictable, error during script execution, the connection is blocked and cannot be re-used.
It can also happen that all of the database's available connections are in use while yet another Apache child process tries to access the database.
That process is blocked until a connection is freed by the database or by Apache (through a timeout or voluntary termination).
Further information about this topic is on this page: http://www.php.net/manual/en/features.persistent-connections.php
I hope I have correctly summarized everything we discussed in our comment conversation and did not forget anything.
If I did, please leave me a hint and I will add it. :)
Edit:
I just finished reading the article @MonkeyZeus mentioned in this comment.
It describes the process I summarized above and provides useful information on how to tune your Apache server to work better with persistent connections.
The advice applies with or without an Oracle database backend, though.
It is worth a look: http://www.oracle.com/technetwork/articles/coggeshall-persist-084844.html
Advantages
From the PHP manual page on persistent connections:
Persistent connections are links that do not close when the execution of your script ends. When a persistent connection is requested, PHP checks if there's already an identical persistent connection (that remained open from earlier) - and if it exists, it uses it. If it does not exist, it creates the link.
The reason behind using persistent connections is, of course, to reduce the number of new connections, which are rather expensive to establish, even though they are much faster with MySQL than with most other databases.
Issues
There are some issues with table locking while using persistent connections.
If the script for whatever reason cannot release the lock, then subsequent scripts using the same connection will block indefinitely and may require that you either restart the httpd server or the database server.
Another issue concerns transactions.
A transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does. In either case, you can use register_shutdown_function() to register a simple cleanup function to unlock your tables or roll back your transactions.
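A minimal sketch of such a cleanup function with PDO follows. The helper names cleanUpConnection() and registerCleanup() are hypothetical, not part of PDO. The sketch only rolls back an open transaction; releasing table locks is database-specific and is left as a comment.

```php
<?php
// Sketch only: cleanUpConnection() and registerCleanup() are hypothetical helpers.
function cleanUpConnection(PDO $dbh)
{
    // Roll back anything the script left open so the next script that
    // receives this persistent connection does not inherit the transaction.
    if ($dbh->inTransaction()) {
        $dbh->rollBack();
        return true;
    }
    // Table locks would be released here with a database-specific
    // statement (for MySQL, an assumption: $dbh->exec('UNLOCK TABLES')).
    return false;
}

function registerCleanup(PDO $dbh)
{
    // Extra arguments to register_shutdown_function() are passed to the callback.
    register_shutdown_function('cleanUpConnection', $dbh);
}
```

Calling registerCleanup($dbh) right after opening the persistent connection makes the rollback run when the script ends, including after exit() or a fatal error.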
I suggest you read this question about the disadvantages of persistent connections.
PDO is kinda funny that way. Even the same user/visitor can cause a second or even third instance to be created. The same thing happened to me on my local machine, while testing the performance of my db queries.
That is nothing to worry about, because these instances will time out sooner or later; the exact timeout depends on your server configuration.
Why does that happen? If the current instance is busy, a new instance is created and the older one will time out sooner or later. At least that seems logical to me.
Here are my observations as I recently experienced an issue similar to yours.
The MySQL server kept opening new connections and eventually maxed out its number of concurrent connections, even though there were a lot of idle connections that could have been used.
Setting PDO::ATTR_PERSISTENT => true does re-use available idle connections. It may not seem like this at first glance when you monitor the MySQL process list, because by the time the report is sent back to you, an idle connection may already have been picked up by another process.
Overall, you should notice a drop in the number of idle connections as opposed to not using a persistent connection. As regards the table locking issue, I decided to use InnoDB Storage Engine for my tables as it uses row-level locking as opposed to table locking with MyISAM Storage Engine.
I have not had an issue yet with concurrency when using this combination of InnoDB Storage Engine and PDO persistent Connection.
Also, as a safeguard against badly executed queries locking tables, keep queries within a try-catch block.
Related
I have an issue concerning PDO persistent connection. Now this may not be an actual problem, but I can't seem to find any post addressing this behavior.
I'm using the good old PDO in persistent connection mode for my web app. I create a new connection via new PDO(...).
When I run this script, a new connection (C#1) is established and a MySQL process (P#1) is created to accommodate the persistent connection.
So, I run the script again, creating a new connection (C#2) and expecting C#2 to use P#1 from the last connection. Instead, every time I run this script a new process appears while the last one is still alive (in sleep mode).
On my production server there are about 350 processes (in sleep) at any given time from 3 different users (all users connect from the same Apache server).
The question: is this situation valid?
found my answer
They cause the child process to simply connect only once for its entire lifespan, instead of every time it processes a page that requires connecting to the SQL server. This means that for every child that opened a persistent connection will have its own open persistent connection to the server. For example, if you had 20 different child processes that ran a script that made a persistent connection to your SQL server, you'd have 20 different connections to the SQL server, one from each child.
http://php.net/manual/en/features.persistent-connections.php
I've read a ton about persistent database connections between PHP and MySQL (mysql_connect vs. mysql_pconnect). Same with PDO and MySQLi. It's definitely just my lack of understanding on this one, but how can a database connection be persistent between webpages? In this code:
$conn = mysql_pconnect( $server , $user, $pass );
mysql_select_db( $dbname );
If two users load this page at the same time, with two different $dbname variables, will PHP only make one connection to the database or two? I am fairly certain that
$conn = mysql_connect( $server , $user, $pass );
would make two connections.
If pconnect reuses the connection opened by the first user, will the mysql_select_db call work for the second user?
Ideally, what I am looking for is a way to have fewer database connections but still be able to set the default database in each PHP script. I have clients who all use the same PHP scripts, but the data is stored in their own client database (hence, $dbname is always different, but the MySQL connection parameters are the same - same mysql ip address, user and password).
Hope that makes sense. We can use MySQL, MySQLi or PDO; we just need to know how to accomplish this the best way, without any possibility of clients accidentally writing data to someone else's database! Thanks in advance.
The persistence is done by the copy of PHP that's embedded in the webserver. Ordinarily you'd be right: if PHP was running in CGI mode, it would be impossible to have a persistent connection, because there'd be nothing left to persist when the request is done and PHP shuts down.
However, since there's a copy of PHP embedded in the webserver, and the webserver itself keeps running between requests, it is possible to maintain a pool of persistent connections within that "permanent" PHP.
However, note that on Apache multi-worker type server models, the connection pools are maintained PER-CHILD. If you set your pool limit to 10, you'll have 10 connections per Apache child. 20 children = 200 connections.
Persistent connections will also lead to long-term problems with deadlocks and other hard-to-debug problems. Remember - there's no guarantee that a user's HTTP requests will be serviced by the SAME apache child/mysql connection. If a script dies part-way through a database transaction, that transaction will NOT be rolled back, because MySQL does not see the HTTP side of things - all it sees is that the mysql<->apache connection is still open and assumes all's well.
The next user to hit that particular apache/mysql child/connection combination will now magically end up in the middle of that transaction, with no clue that the transaction is open. Basically, it's the Web equivalent of an unflushed toilet - all the "garbage" from the previous user is still there.
With non-persistent connections, you're guaranteed to have a 'clean' environment each time you connect.
From my reading of documentation and comments, I see:
Docs on mysql_pconnect (deprecated method)
Second, the connection to the SQL server will not be closed when the execution of the script ends. Instead, the link will remain open for future use ( mysql_close() will not close links established by mysql_pconnect()).
and a comment on that page
Persistent connections work well for CGI PHP managed by fastCGI, contrary to the suggestion above that they only work for the module version. That's because fastCGI keeps PHP processes running between requests. Persistent connections in this mode are easily made immune to connection limits too, because you can set PHP_FCGI_CHILDREN << mysql's max_connections <<< Apache's MaxClients. This also saves resources.
Docs on mysqli_connect (new method)
Prepending host by p: opens a persistent connection. mysqli_change_user() is automatically called on connections opened from the connection pool.
Docs for mysqli_change_user:
Changes the user of the specified database connection and sets the current database.
So my understanding is as follows: pconnect keeps the connection open after a script ends but while a process (or maybe group of processes) is still alive (like in a server with FCGI set up). Only one script at a time uses a connection, and when a new script grabs that connection the user and database are updated.
Thus if you use FCGI and persistent connections you can reduce the number of db connections open, but scripts running simultaneously will not be sharing the same connection. There is no problem with the connection being confused as to which database is selected.
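As a rough sketch of that conclusion with mysqli: the "p:" host prefix is the documented way to request a persistent connection, while persistentHost() and connectForClient() are hypothetical helper names used only for illustration. The server, user and password are assumed to be the same for every client, as in the question.

```php
<?php
// Sketch only: persistentHost() and connectForClient() are hypothetical helpers.
function persistentHost($host)
{
    // mysqli treats a "p:" host prefix as a request for a persistent connection.
    return strpos($host, 'p:') === 0 ? $host : 'p:' . $host;
}

function connectForClient($host, $user, $pass, $dbname)
{
    // Passing the database name on every connect means the script never
    // relies on whatever database a previous script had selected.
    $conn = new mysqli(persistentHost($host), $user, $pass, $dbname);
    if ($conn->connect_error) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    return $conn;
}
```

Each script would then call connectForClient($server, $user, $pass, $dbname) with its own $dbname instead of calling mysql_pconnect() followed by mysql_select_db().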
I have a mongodb server in production serving on an EC2 instance. According to the mongodb official documentation, persistent DB connections should ALWAYS be used in production. I've been experimenting with about 50 persistent connections and was getting frequent connection errors (approx 33% of the time) while testing. I'm using this code:
$pid = 'db_'.rand(1,50);
$mongo = new Mongo("mongodb://{$user}:{$pass}@{$host}", array('persist' => $pid) );
Some background on the application, it's a link tracking application that is still ramping up - and is in the range of 500 - 1k writes per hour, nothing too crazy... yet.
I'm wondering if I simply need to allow more persistent connections? How does one determine the right balance of persistent connections versus server resources available?
Thanks in advance everyone.
The persist value is no longer supported as of the most recent driver (1.2.0).
Truth is, it was never really clear what it did in typical Apache+PHP setups. There are several comments on the Google Groups and elsewhere asking for detail, but I did not find any evidence that persist or persistent was ever tested in any depth.
Instead, it's all been replaced by connection pooling "out of the box". The connection pooling has obviously been through some changes within the 1.2 line with the addition of the MongoPool class.
There is still no detailed explanation of how the pooling works with Apache, but at least you don't have to worry about persist.
Now despite all of this mess, I have handled 1000 times that traffic on a single MongoDB server via the PHP driver without lots of connection problems.
Are you catching the exceptions?
Can you provide more details about the exact exception?
There may be a code solution.
Are you opening a new connection for each PHP page request, or using a connection pool with 50 persistent connections? If you're opening a new connection each time then you might be quickly running out of resources.
Each connection uses an additional thread on the server, so you could be hitting a limit on the number of threads or network connections; check your server logs in /var/lib/mongodb for errors.
If you're using the official MongoDB PHP driver, then as far as I know it should handle connection pooling for you automatically. If you're connecting to Mongo from 50 separate clients, then consider putting a queue in front of Mongo to buffer the writes.
http://php.net/manual/en/mongo.connecting.php
Without persistent connections (x1000):
It takes approximately 18 seconds to execute
With persistent connections:
...it takes less than 0.02 seconds
In PDO, a connection can be made persistent using the PDO::ATTR_PERSISTENT attribute. According to the php manual -
Persistent connections are not closed at the end of the script, but
are cached and re-used when another script requests a connection using
the same credentials. The persistent connection cache allows you to
avoid the overhead of establishing a new connection every time a
script needs to talk to a database, resulting in a faster web
application.
The manual also recommends not using persistent connections with the PDO ODBC driver, because they may hamper the ODBC Connection Pooling process.
So apparently there are no drawbacks to using persistent connections in PDO, except in that last case. However, I would like to know if there are any other disadvantages to this mechanism, i.e., a situation where it results in performance degradation or something like that.
Please be sure to read this answer below, which details ways to mitigate the problems outlined here.
The same drawbacks exist using PDO as with any other PHP database interface that does persistent connections: if your script terminates unexpectedly in the middle of database operations, the next request that gets the left over connection will pick up where the dead script left off. The connection is held open at the process manager level (Apache for mod_php, the current FastCGI process if you're using FastCGI, etc), not at the PHP level, and PHP doesn't tell the parent process to let the connection die when the script terminates abnormally.
If the dead script locked tables, those tables will remain locked until the connection dies or the next script that gets the connection unlocks the tables itself.
If the dead script was in the middle of a transaction, that can block a multitude of tables until the deadlock timer kicks in, and even then, the deadlock timer can kill the newer request instead of the older request that's causing the problem.
If the dead script was in the middle of a transaction, the next script that gets that connection also gets the transaction state. It's very possible (depending on your application design) that the next script might not actually ever try to commit the existing transaction, or will commit when it should not have, or roll back when it should not have.
This is only the tip of the iceberg. It can all be mitigated to an extent by always trying to clean up after a dirty connection on every single script request, but that can be a pain depending on the database. Unless you have identified creating database connections as the one thing that is a bottleneck in your script (this means you've done code profiling using xdebug and/or xhprof), you should not consider persistent connections as a solution to anything.
Further, most modern databases (including PostgreSQL) have their own preferred ways of performing connection pooling that don't have the immediate drawbacks that plain vanilla PHP-based persistent connections do.
To clarify a point, we use persistent connections at my workplace, but not by choice. We were encountering weird connection behavior, where the initial connection from our app server to our database server was taking exactly three seconds, when it should have taken a fraction of a fraction of a second. We think it's a kernel bug. We gave up trying to troubleshoot it because it happened randomly and could not be reproduced on demand, and our outsourced IT didn't have the concrete ability to track it down.
Regardless, when the folks in the warehouse are processing a few hundred incoming parts, and each part is taking three and a half seconds instead of a half second, we had to take action before they kidnapped us all and made us help them. So, we flipped a few bits on in our home-grown ERP/CRM/CMS monstrosity and experienced all of the horrors of persistent connections first-hand. It took us weeks to track down all the subtle little problems and bizarre behavior that happened seemingly at random. It turned out that those once-a-week fatal errors that our users diligently squeezed out of our app were leaving locked tables, abandoned transactions and other unfortunate wonky states.
This sob-story has a point: It broke things that we never expected to break, all in the name of performance. The tradeoff wasn't worth it, and we're eagerly awaiting the day we can switch back to normal connections without a riot from our users.
In response to Charles' problem above,
From : http://www.php.net/manual/en/mysqli.quickstart.connections.php -
A common complain about persistent connections is that their state is
not reset before reuse. For example, open and unfinished transactions
are not automatically rolled back. But also, authorization changes
which happened in the time between putting the connection into the
pool and reusing it are not reflected. This may be seen as an unwanted
side-effect. On the contrary, the name persistent may be understood as
a promise that the state is persisted.
The mysqli extension supports both interpretations of a persistent
connection: state persisted, and state reset before reuse. The default
is reset. Before a persistent connection is reused, the mysqli
extension implicitly calls mysqli_change_user() to reset the state.
The persistent connection appears to the user as if it was just
opened. No artifacts from previous usages are visible.
The mysqli_change_user() function is an expensive operation. For
best performance, users may want to recompile the extension with the
compile flag MYSQLI_NO_CHANGE_USER_ON_PCONNECT being set.
It is left to the user to choose between safe behavior and best
performance. Both are valid optimization goals. For ease of use, the
safe behavior has been made the default at the expense of maximum
performance.
Persistent connections are a good idea only when it takes a (relatively) long time to connect to your database. Nowadays that's almost never the case. The biggest drawback to persistent connections is that it limits the number of users you can have browsing your site: if MySQL is configured to only allow 10 concurrent connections at once then when an 11th person tries to browse your site it won't work for them.
PDO does not manage the persistence. The MySQL driver does. It reuses connections when they are available and the host/user/password/database match. If any of these change, it will not reuse a connection. In the best case, the net effect is that your connections are started and stopped so often, because you have different users on the site, that making them persistent doesn't do any good.
The key thing to understand about persistent connections is that you should NOT use them in most web applications. They sound enticing but they are dangerous and pretty much useless.
I'm sure there are other threads on this, but a persistent connection is dangerous because it persists between requests. If, for example, you lock a table during a request and then fail to unlock it, that table is going to stay locked indefinitely. Persistent connections are also pretty much useless for 99% of your apps because you have no way of knowing if the same connection will be used between different requests. Each web thread will have its own set of persistent connections and you have no way of controlling which thread will handle which requests.
The procedural mysql library of PHP has a feature whereby subsequent calls to mysql_connect will return the same link rather than open a different connection (as one might expect). This has nothing to do with persistent connections and is specific to the mysql library. PDO does not exhibit such behaviour.
In general you could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only a few applications/users accessing the database, i.e.
you will not end up with 200 open (but probably idle) connections
because there are 200 different users sharing the same host.
The database is running on another server that you are accessing over
the network
One application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many, many webservers accessing one database server
Using persistent connections is considerably faster, especially if you are accessing the database over a network. It doesn't make so much difference if the database is running on the same machine, but it is still a little bit faster. However, as the name says, the connection is persistent, i.e. it stays open even if it is not used.
The problem with that is that MySQL only allows a limited number of parallel connections (the max_connections setting, which defaults to 151 in current versions and can be raised). After that, new connections are refused. So if you have, say, 20 webservers with 100 clients each, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
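The arithmetic can be written out as a quick check. This is only a sketch: worstCaseConnections() is a hypothetical helper, and the limit of 1000 is an example value, so substitute your server's actual max_connections.

```php
<?php
// Worst case: every client process on every web server holds one
// idle persistent connection at the same time.
function worstCaseConnections($webServers, $clientsPerServer, $connectionsPerClient = 1)
{
    return $webServers * $clientsPerServer * $connectionsPerClient;
}

$needed = worstCaseConnections(20, 100);
$limit  = 1000; // example value only; use your server's max_connections
$verdict = $needed > $limit ? 'exceeds' : 'fits within';

echo "$needed $verdict $limit\n"; // prints "2000 exceeds 1000"
```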
Ergo: Only use it for applications with lots of requests.
On my tests I had a connection time of over a second to my localhost, thus assuming I should use a persistent connection. Further tests showed it was a problem with 'localhost':
Test results in seconds (measured by php microtime):
hosted web: connectDB: 0.0038912296295166
localhost: connectDB: 1.0214691162109 (over one second: do not use localhost!)
127.0.0.1: connectDB: 0.00097203254699707
Interestingly: The following code is just as fast as using 127.0.0.1:
$host = gethostbyname('localhost');
// echo "<p>$host</p>";
$db = new PDO("mysql:host=$host;dbname=" . DATABASE . ';charset=utf8', $username, $password,
array(PDO::ATTR_EMULATE_PREPARES => false,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
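Measurements like the ones above can be reproduced with a small timing helper. This is a sketch: timeCall() is a hypothetical name, and it times only the host name lookup so that it runs without a database. Replace the callback body with a new PDO(...) call to time a real connect.

```php
<?php
// Sketch only: timeCall() is a hypothetical helper.
function timeCall($callback)
{
    $start = microtime(true);
    call_user_func($callback);
    return microtime(true) - $start; // elapsed seconds as a float
}

// Time the name resolution for each host string.
foreach (array('localhost', '127.0.0.1') as $host) {
    $seconds = timeCall(function () use ($host) {
        gethostbyname($host);
    });
    printf("%-10s %.6f s\n", $host, $seconds);
}
```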
Persistent connections should give a sizable performance boost. I disagree with the assessment that you should "avoid" persistence.
It sounds like the complaints above are driven by someone using MyISAM tables and hacking in their own versions of transactions by grabbing table locks. Well, of course you're going to deadlock! Use PDO's beginTransaction() and move your tables over to InnoDB.
Seems to me having a persistent connection would eat up more system resources. Maybe a trivial amount, but still...
The rationale for persistent connections is obviously to reduce the number of connects, which are rather costly, even though they are considerably faster with MySQL than with other databases.
The very first problem with persistent connections...
If you are creating thousands of connections per second, you normally don't keep each one open for very long, but the operating system does. Under the TCP/IP protocol, ports can't be recycled instantly; they have to spend a while in the "FIN" stage, waiting, before they can be reused.
The second problem... using a lot of MySQL server connections.
Many people simply don't realize that you can increase the max_connections variable and get over 100 concurrent connections with MySQL; others were bitten by older Linux problems that made it impossible to have more than 1024 connections with MySQL.
Let's talk now about why persistent connections were disabled in the mysqli extension. Although you can misuse persistent connections and get poor performance, that was not the main reason. The actual reason is that you can get a lot more issues with them.
Persistent connections were put into PHP in the days of MySQL 3.22/3.23, when MySQL was simple enough that you could recycle connections easily with no problems. In later versions a number of problems arose: if you recycle a connection that has uncommitted transactions, you run into trouble. If you recycle connections with custom character set settings you're in danger again, not to mention possibly changed per-session variables.
One problem with using persistent connections is that they do not really scale that well. If you have 5000 people connected, you'll need 5000 persistent connections. Without the requirement for persistence, you may be able to serve 10000 people with the same number of connections, because they can share those connections when they are not using them.
I was just wondering whether a partial solution would be to have a pool of use-once connections. You could spend time creating a connection pool when the system is at low usage, up to a limit, hand them out and kill them when either they've completed or timed out. In the background you're creating new connections as they're being taken. At worst case this should only be as slow as creating the connection without the pool, assuming that establishing the link is the limiting factor?
My understanding is that PHP's p* connections keep a connection persistent between page loads to the service (be it memcache, a socket, etc.). But are these connections thread safe? What happens when two pages try to access the same connection at the same time?
In the typical unix deployment, PHP is installed as a module that runs inside the apache web server, which in turn is configured to dispatch HTTP requests to one of a number of spawned children.
For the sake of efficiency, apache will often spawn these processes ahead of time (pre-forking them) and maintain them, so that they can dispatch more than one request, and save the overhead of starting up a process for every request that comes in.
PHP works on the principle of starting every request with a clean environment; no script variables persist between page loads. (Contrast this with mod_perl or python, where applications often manifest subtle bugs due to unexpected state hangovers).
This means that the typical resource allocated by a PHP script, be it an image handle for GD or a database connection, will be released at the end of a request.
Some resources, particularly Oracle database connections, have quite a high cost to establish, so it is desirable to somehow cache that connection between dispatched web requests.
Enter persistent resources.
The way these work is that any given apache child process may maintain a resource beyond the scope of a request by registering it in a "persistent list" of resources. The persistent list is not cleaned up at the end of the request (known as RSHUTDOWN internally). When you use a pconnect function, it will look up the persistent list entry for a given set of unique credentials and return that, if it exists, or establish a new connection with those credentials.
If you have configured apache to maintain 200 child processes, you should expect to see that many connections established from your web server to your database machine.
If you have many web servers and a single database machine, you may end up loading your database machine much more than you anticipated.
With a threaded SAPI, the persistent list is maintained per thread, so it should be thread safe and have similar benefits, but the usual caveat about PHP not being recommended to run in threaded SAPI applies--while PHP is itself thread safe, so many libraries that it uses may have thread safety problems of their own and cause you a good number of headaches.
The manual's page Persistent Database Connections might give you some information about persistent connections.
It doesn't say anything specific about thread safety, though; I've never seen anything about that anywhere, as far as I remember, so I suppose it "just works OK". My guess would be that a connection is re-used only if it is not already being used by another thread at the same time, but it's just some kind of (logical) wild guess...
Generally speaking, PHP will make one persistent connection per process or thread running on the webserver. Because of this, a process or thread will not access the connection of another process or thread.
Instead, when you make a database connection PHP will check to see if one is already open (in the process or thread that is handling the page request) and if it is then it will use it, otherwise it will just initialize a new one.
So to answer your question, they aren't necessarily thread safe but because of how they operate there isn't a situation where two threads or processes will access the same connection.
Generally speaking, when a PHP script requests a persistent connection, PHP will look for one in the connection pool with the same connection parameters.
If one is found that is NOT being used, it is given to the script, and returned to the pool at the end of the script.