How do PHP's p* connect methods work?


My understanding is that PHP's p* connect functions keep a connection to a service (be it memcache, a socket, etc.) persistent between page loads. But are these connections thread safe? What happens when two pages try to access the same connection at the same time?

In the typical unix deployment, PHP is installed as a module that runs inside the apache web server, which in turn is configured to dispatch HTTP requests to one of a number of spawned children.
For the sake of efficiency, apache will often spawn these processes ahead of time (pre-forking them) and maintain them, so that they can dispatch more than one request, and save the overhead of starting up a process for every request that comes in.
PHP works on the principle of starting every request with a clean environment; no script variables persist between page loads. (Contrast this with mod_perl or python, where applications often manifest subtle bugs due to unexpected state hangovers).
This means that the typical resource allocated by a PHP script, be it an image handle for GD or a database connection, will be released at the end of a request.
Some resources, particularly Oracle database connections, have quite a high cost to establish, so it is desirable to somehow cache that connection between dispatched web requests.
Enter persistent resources.
The way these work is that any given apache child process may maintain a resource beyond the scope of a request by registering it in a "persistent list" of resources. The persistent list is not cleaned up at the end of the request (known as RSHUTDOWN internally). When you use a pconnect function, it will look up the persistent list entry for a given set of unique credentials and return that, if it exists, or establish a new connection with those credentials.
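The lookup described above can be modelled in a few lines of plain PHP. This is only an illustrative sketch: the real persistent list lives in C inside the engine, and the class name PersistentList, its pconnect() method and the stand-in "connection" objects are all made up for the example.

```php
<?php
// Illustrative model only: PHP's real persistent list is internal to the engine.
final class PersistentList
{
    /** @var array<string, object> entries that survive across "requests" in one worker */
    private $entries = [];

    /** @var int number of real connections opened so far */
    public $opened = 0;

    public function pconnect(string $host, string $user, string $password): object
    {
        // One entry per unique set of credentials.
        $key = hash('sha256', implode("\0", [$host, $user, $password]));

        if (!isset($this->entries[$key])) {
            // Stand-in for the expensive connect; a real extension would open a socket here.
            $this->entries[$key] = (object) ['host' => $host, 'user' => $user];
            $this->opened++;
        }

        return $this->entries[$key];
    }
}

$worker = new PersistentList();                          // one list per Apache child
$first  = $worker->pconnect('db1', 'app', 'secret');     // request 1: opens a connection
$second = $worker->pconnect('db1', 'app', 'secret');     // request 2: reuses it
$other  = $worker->pconnect('db1', 'report', 'secret');  // different credentials: opens another

var_dump($first === $second); // bool(true)
echo $worker->opened, "\n";   // 2
```

Because each child process would hold its own list, the number of open connections grows with the number of children, not with the number of requests.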
If you have configured apache to maintain 200 child processes, you should expect to see that many connections established from your web server to your database machine.
If you have many web servers and a single database machine, you may end up loading your database machine much more than you anticipated.
With a threaded SAPI, the persistent list is maintained per thread, so it should be thread safe and have similar benefits, but the usual caveat about PHP not being recommended to run in a threaded SAPI applies: while PHP itself is thread safe, many of the libraries that it uses may have thread safety problems of their own and cause you a good number of headaches.

The manual's page Persistent Database Connections might give you some information about persistent connections.
It doesn't say anything specific about thread safety, though; I've never seen anything about that anywhere, as far as I remember, so I suppose it "just works OK". My guess would be that a connection is re-used only if it is not already in use by another thread at the same time, but that's just a (logical) wild guess...

Generally speaking, PHP will make one persistent connection per process or thread running on the webserver. Because of this, a process or thread will not access the connection of another process or thread.
Instead, when you make a database connection PHP will check to see if one is already open (in the process or thread that is handling the page request) and if it is then it will use it, otherwise it will just initialize a new one.
So to answer your question, they aren't necessarily thread safe but because of how they operate there isn't a situation where two threads or processes will access the same connection.

Generally speaking, when a PHP script requests a persistent connection, PHP will look for one in the connection pool with the same connection parameters.
If one is found that is NOT being used, it is given to the script, and returned to the pool at the end of the script.

Related

Is PDO::lastInsertId() in multithread single connection safe?

I read some threads here about PDO::lastInsertId() and its safety. It returns last inserted ID from current connection (so it's safe for multiuser app while there is only one connection per user/script run).
I'm just wondering if there is a possibility to get invalid ID if there is only one DB connection per one long script (lots of SQL requests) in multicore server system? The question is more likely to be theoretical.
I think PHP script run is linear but maybe I'm wrong.

PDO itself is not thread safe. You must provide your own thread safety if you use PDO connections from a threaded application.
The best, and in my opinion the only maintainable, way to do this is to make your connections thread-private.
If you try to use one connection from more than one thread, your MySQL server will probably throw Packet Out of Order errors.
The Last Insert ID functionality ensures multiple connections to MySQL get their own ID values even if multiple connections do insert operations to the same table.
For a typical php web application, using a multicore server allows it to handle more web-browser requests. A multicore server doesn’t make the php programs multithreaded. Each php program, to handle each web request, allocates its own PDO connections. As you put it, each php script run is “linear”. The multiple cores allow multiple scripts to run at the same time, but independently.
Last Insert ID is designed to be safe for that scenario.
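To illustrate that the last insert ID is tracked per connection, here is a small sketch. It assumes the pdo_sqlite extension is available and uses a temporary SQLite file in place of MySQL purely so that it runs without a server; the table name items is made up for the example.

```php
<?php
// SQLite stands in for MySQL here so the sketch runs without a database server.
$file = tempnam(sys_get_temp_dir(), 'lid');
$dsn  = 'sqlite:' . $file;

// Two independent connections to the same database, like two separate script runs.
$a = new PDO($dsn);
$b = new PDO($dsn);
$a->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$b->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$a->exec('CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)');

$a->exec("INSERT INTO items (name) VALUES ('from A')");
$b->exec("INSERT INTO items (name) VALUES ('from B')");

// Each connection reports the ID of its own insert, even though B inserted last.
$idA = $a->lastInsertId();
$idB = $b->lastInsertId();
echo "A: $idA, B: $idB\n"; // A: 1, B: 2

// Release both connections before removing the temporary file.
$a = null;
$b = null;
unlink($file);
```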
Under some circumstances a php program may leave the MySQL connection open when it's done so another php program may use it. This is called a persistent connection or connection pooling. It helps performance when a web site has many users connecting to it. The generic term for a reusable connection is "serially reusable resource."
Some php programs may use threads. In this case the program must avoid allowing more than one thread to use the same connection at the same time, or get the dreaded Packet Out of Order errors.
(Virtually all machines have multiple cores.)

How to Share a MySQL connection between 2 different PHP Processes

I have an unusual case: my website runs two totally different PHP processes through a sandbox. I have the normal website running through fastcgi, and in the middle that fastcgi process executes one sandboxed script through cli. Both those processes require a MySQL connection, and I was wondering if there is a way to share that connection, since when the sandboxed script is running the fastcgi process is just waiting for it to finish, so there would be no concurrency.
This would greatly improve my hardware capability since I would only need one MySQL connection per client unlike the two connections that I need at the moment.
I could always code some kind of multiplexing proxy for this effect, but is there any run of the mill solution? I would really appreciate it.
Regards.

Use database connection pooling middleware or a proxy, for example:
sqlrelay
mysql-proxy
proxysql

It might be worth having a look at Persistent Connections for this.
Basically the connection would be automatically re-used if it exists. Note that this is referring to the resource itself, it does not persist any state from one process to the other.
Before making the decision to use Persistent Connections you should be aware of the pitfalls when used incorrectly. See this question.

Should I $mysqli->close a connection after each page load, if PHP runs via FCGI?

I run PHP via FCGI - that is my web server spawns several PHP processes and they keep running for like 10,000 requests until they get recycled.
My question is - if I've a $mysqli->connect at the top of my PHP script, do I need to call $mysqli->close when I'm about to end running the script?
Since PHP processes are open for a long time, I'd imagine each $mysqli->connect would leak 1 connection, because the process keeps running and no one closes the connection.
Am I right in my thinking or not? Should I call $mysqli->close?

When PHP exits it closes the database connections gracefully.
The only reason to call the close method is when you want to terminate a database connection that you'll not use anymore while the script still has lots of things to do, like processing and streaming the data. If the script is quick, you can forget about the close statement.
Putting it in the end of a script means redundancy, no performance or memory gain.

In a bit more detail, specifically about FastCGI:
FastCGI keeps PHP processes running between requests. FastCGI is good at reducing CPU usage by leveraging your server's available RAM to keep PHP scripts in memory instead of having to start up a separate PHP process for each and every PHP request.

FastCGI will start a master process and as many forks of that master process as you have defined, and yes, those forked processes might live for a long time. This means in effect that the process doesn't have to start up the complete PHP process each time it needs to execute a script. But it's not like you think that your scripts are now running all the time. There is still a start-up and shutdown phase each time a script has to be executed. At this point things like global variables (e.g. $_POST and $_GET) are populated etc. You can execute functions each time your script shuts down via register_shutdown_function().
If you aren't using persistent database connections and aren't closing database connections, nothing bad will happen. As Colin Schoen explained, PHP will eventually close them during shutdown.
Still, I highly encourage you to close your connections because a correctly crafted program knows when the lifetime of an object is over and cleans itself up. It might give you exactly the milli- or nanosecond that you need to deliver something in time.
Simply always create self-contained objects that are also cleaning up after they are finished with whatever they did.

I've never trusted FCGI to close my database connections for me. One habit I learned in a beginners book many years ago is to always explicitly close my database connections.
Is not typing sixteen keystrokes worth the possible memory and connection leak? As far as I'm concerned it's cheap insurance.

If you have long running FastCGI processes, via e.g. php-fpm, you can gain performance by reusing your database connection inside each process and avoiding the cost of opening one.
Since you are most likely opening a connection at some point in your code, you should read up on how to have mysqli open a persistent connection and return it to you on subsequent requests managed by the same process.
http://php.net/manual/en/mysqli.quickstart.connections.php
http://php.net/manual/en/mysqli.persistconns.php
In this case you don't want to close the connection, else you are defeating the purpose of keeping it open. Also, be aware that each PHP process will use a separate connection so your database should allow for at least that number of connections to be opened simultaneously.
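As a rough sketch of what the linked manual pages describe: mysqli opens a persistent connection when the host name is prefixed with "p:". The helper names persistentHost() and openPersistent() below are made up for illustration, and the code assumes the mysqli extension is loaded and that the caller supplies real credentials.

```php
<?php
// Hypothetical helpers; only the "p:" host prefix itself comes from mysqli.
function persistentHost(string $host): string
{
    // Add the "p:" prefix once; mysqli reads it as a request for a persistent connection.
    return strncmp($host, 'p:', 2) === 0 ? $host : 'p:' . $host;
}

function openPersistent(string $host, string $user, string $password, string $database): mysqli
{
    // Throw exceptions on connection errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    return new mysqli(persistentHost($host), $user, $password, $database);
}

echo persistentHost('127.0.0.1'), "\n"; // p:127.0.0.1
```

A script using openPersistent() would simply let the request end without calling close(), so that the same worker process can reuse the connection on its next request.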

You're right in your way of thinking. It is still important to close the connection to prevent memory/data leaks and corruption.
You can also lower the number of script runs each process handles before it is recycled, which forces the connection to be closed.
For example: each 2500 script runs, stop and close and reopen connection.
Also recommended: back up data frequently.
Hope I helped. Phantom

What are the disadvantages of using persistent connection in PDO

In PDO, a connection can be made persistent using the PDO::ATTR_PERSISTENT attribute. According to the php manual -
Persistent connections are not closed at the end of the script, but
are cached and re-used when another script requests a connection using
the same credentials. The persistent connection cache allows you to
avoid the overhead of establishing a new connection every time a
script needs to talk to a database, resulting in a faster web
application.
The manual also recommends not to use persistent connection while using PDO ODBC driver, because it may hamper the ODBC Connection Pooling process.
So apparently there seem to be no drawbacks to using persistent connections in PDO, except in the last case. However, I would like to know if there are any other disadvantages of using this mechanism, i.e., a situation where this mechanism results in performance degradation or something like that.

Please be sure to read this answer below, which details ways to mitigate the problems outlined here.
The same drawbacks exist using PDO as with any other PHP database interface that does persistent connections: if your script terminates unexpectedly in the middle of database operations, the next request that gets the left over connection will pick up where the dead script left off. The connection is held open at the process manager level (Apache for mod_php, the current FastCGI process if you're using FastCGI, etc), not at the PHP level, and PHP doesn't tell the parent process to let the connection die when the script terminates abnormally.
If the dead script locked tables, those tables will remain locked until the connection dies or the next script that gets the connection unlocks the tables itself.
If the dead script was in the middle of a transaction, that can block a multitude of tables until the deadlock timer kicks in, and even then, the deadlock timer can kill the newer request instead of the older request that's causing the problem.
If the dead script was in the middle of a transaction, the next script that gets that connection also gets the transaction state. It's very possible (depending on your application design) that the next script might not actually ever try to commit the existing transaction, or will commit when it should not have, or roll back when it should not have.
This is only the tip of the iceberg. It can all be mitigated to an extent by always trying to clean up after a dirty connection on every single script request, but that can be a pain depending on the database. Unless you have identified creating database connections as the one thing that is a bottleneck in your script (this means you've done code profiling using xdebug and/or xhprof), you should not consider persistent connections as a solution to anything.
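The "clean up after a dirty connection on every request" idea might look roughly like the following. This is a minimal sketch, not a complete reset: the function name resetConnection() is made up, the UNLOCK TABLES step assumes MySQL, and whether inTransaction() actually sees a transaction left behind by an earlier request depends on the driver and PHP version.

```php
<?php
// Hypothetical helper: call it right after obtaining a (possibly reused) connection.
function resetConnection(PDO $pdo): void
{
    // Discard any transaction that PDO knows is still open on this connection.
    if ($pdo->inTransaction()) {
        $pdo->rollBack();
    }

    // Assumption: MySQL. Release table locks a dead script may have left behind.
    if ($pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql') {
        $pdo->exec('UNLOCK TABLES');
    }
}
```

Session variables, character set changes and temporary tables would need their own handling, which is part of why this gets painful depending on the database.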
Further, most modern databases (including PostgreSQL) have their own preferred ways of performing connection pooling that don't have the immediate drawbacks that plain vanilla PHP-based persistent connections do.
To clarify a point, we use persistent connections at my workplace, but not by choice. We were encountering weird connection behavior, where the initial connection from our app server to our database server was taking exactly three seconds, when it should have taken a fraction of a fraction of a second. We think it's a kernel bug. We gave up trying to troubleshoot it because it happened randomly and could not be reproduced on demand, and our outsourced IT didn't have the concrete ability to track it down.
Regardless, when the folks in the warehouse are processing a few hundred incoming parts, and each part is taking three and a half seconds instead of a half second, we had to take action before they kidnapped us all and made us help them. So, we flipped a few bits on in our home-grown ERP/CRM/CMS monstrosity and experienced all of the horrors of persistent connections first-hand. It took us weeks to track down all the subtle little problems and bizarre behavior that happened seemingly at random. It turned out that those once-a-week fatal errors that our users diligently squeezed out of our app were leaving locked tables, abandoned transactions and other unfortunate wonky states.
This sob-story has a point: It broke things that we never expected to break, all in the name of performance. The tradeoff wasn't worth it, and we're eagerly awaiting the day we can switch back to normal connections without a riot from our users.

In response to Charles' problem above,
From : http://www.php.net/manual/en/mysqli.quickstart.connections.php -
A common complain about persistent connections is that their state is
not reset before reuse. For example, open and unfinished transactions
are not automatically rolled back. But also, authorization changes
which happened in the time between putting the connection into the
pool and reusing it are not reflected. This may be seen as an unwanted
side-effect. On the contrary, the name persistent may be understood as
a promise that the state is persisted.
The mysqli extension supports both interpretations of a persistent
connection: state persisted, and state reset before reuse. The default
is reset. Before a persistent connection is reused, the mysqli
extension implicitly calls mysqli_change_user() to reset the state.
The persistent connection appears to the user as if it was just
opened. No artifacts from previous usages are visible.
The mysqli_change_user() function is an expensive operation. For
best performance, users may want to recompile the extension with the
compile flag MYSQLI_NO_CHANGE_USER_ON_PCONNECT being set.
It is left to the user to choose between safe behavior and best
performance. Both are valid optimization goals. For ease of use, the
safe behavior has been made the default at the expense of maximum
performance.

Persistent connections are a good idea only when it takes a (relatively) long time to connect to your database. Nowadays that's almost never the case. The biggest drawback to persistent connections is that it limits the number of users you can have browsing your site: if MySQL is configured to only allow 10 concurrent connections at once then when an 11th person tries to browse your site it won't work for them.
PDO does not manage the persistence. The MySQL driver does. It reuses connections when a) they are available and b) the host/user/password/database match. If any of these change, it will not reuse a connection. The best case net effect is that these connections you have will be started and stopped so often, because you have different users on the site, that making them persistent doesn't do any good.
The key thing to understand about persistent connections is that you should NOT use them in most web applications. They sound enticing but they are dangerous and pretty much useless.
I'm sure there are other threads on this, but a persistent connection is dangerous because it persists between requests. If, for example, you lock a table during a request and then fail to unlock it, that table is going to stay locked indefinitely. Persistent connections are also pretty much useless for 99% of your apps because you have no way of knowing if the same connection will be used between different requests. Each web thread will have its own set of persistent connections and you have no way of controlling which thread will handle which requests.
The procedural mysql library of PHP has a feature whereby subsequent calls to mysql_connect will return the same link, rather than open a different connection (as one might expect). This has nothing to do with persistent connections and is specific to the mysql library. PDO does not exhibit such behaviour.
In general you could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only few applications/users accessing the database, i.e. you will not result in 200 open (but probably idle) connections, because there are 200 different users shared on the same host.
The database is running on another server that you are accessing over the network
An (one) application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many, many webservers accessing one database server
Using persistent connections is considerably faster, especially if you are accessing the database over a network. It doesn't make so much difference if the database is running on the same machine, but it is still a little bit faster. However - as the name says - the connection is persistent, i.e. it stays open, even if it is not used.
The problem with that is, that in "default configuration", MySQL only allows 1000 parallel "open channels". After that, new connections are refused (You can tweak this setting). So if you have - say - 20 Webservers with each 100 Clients on them, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.
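The "simple math" above, written out. The figures are the ones used in this answer; the actual limit is whatever max_connections is set to on your server, and the function name idleConnections() is made up for the example.

```php
<?php
// Worst case: every client keeps one persistent connection open on every web server.
function idleConnections(int $webServers, int $clientsPerServer): int
{
    return $webServers * $clientsPerServer;
}

$needed = idleConnections(20, 100); // 20 web servers with 100 clients each
$limit  = 1000;                     // limit quoted in the answer; check max_connections

echo "needed: $needed, limit: $limit\n"; // needed: 2000, limit: 1000
var_dump($needed > $limit);              // bool(true): new connections would be refused
```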

On my tests I had a connection time of over a second to my localhost, thus assuming I should use a persistent connection. Further tests showed it was a problem with 'localhost':
Test results in seconds (measured by php microtime):
hosted web: connectDB: 0.0038912296295166
localhost: connectDB: 1.0214691162109 (over one second: do not use localhost!)
127.0.0.1: connectDB: 0.00097203254699707
Interestingly: The following code is just as fast as using 127.0.0.1:
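// DATABASE, $username and $password are defined elsewhere in the poster's code.
$host = gethostbyname('localhost'); // resolve the name once, then connect by IP
// echo "<p>$host</p>";
$db = new PDO("mysql:host=$host;dbname=" . DATABASE . ';charset=utf8', $username, $password,
    array(
        PDO::ATTR_EMULATE_PREPARES => false,
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ));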
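A small timing helper along these lines can reproduce this kind of measurement. It is only a sketch: timeCall() is a made-up name, and here it times just the name lookup, so that a slow hostname resolution can be told apart from a slow database connect.

```php
<?php
// Hypothetical helper: run a callable and return the elapsed wall-clock time in seconds.
function timeCall(callable $fn): float
{
    $start = microtime(true);
    $fn();

    return microtime(true) - $start;
}

// Time only the hostname lookup; wrap the PDO constructor the same way to time the connect.
$elapsed = timeCall(function () {
    gethostbyname('localhost');
});

printf("resolve localhost: %.6f s\n", $elapsed);
```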

Persistent connections should give a sizable performance boost. I disagree with the assessment that you should "Avoid" persistence.
It sounds like the complaints above are driven by someone using MyISAM tables and hacking in their own versions of transactions by grabbing table locks. Well of course you're going to deadlock! Use PDO's beginTransaction() and move your tables over to InnoDB.

Seems to me having a persistent connection would eat up more system resources. Maybe a trivial amount, but still...

The reason for using persistent connections is obviously to reduce the number of connects, which are rather costly, even though they are considerably faster with MySQL than with other databases.
The very first trouble with persistent connections...
If you are creating thousands of connections per second, you normally don't keep each one open for very long, but the operating system does. Under TCP/IP, ports can't be recycled instantly; they have to spend a while in the "FIN" stage, waiting, before they can be reused.
The second problem... using a lot of MySQL server connections.
Many people simply don't realize that you can increase the *max_connections* variable and get more than 100 concurrent connections with MySQL; others were bitten by older Linux problems that made it impossible to have more than 1024 connections with MySQL.
Let's talk now about why persistent connections were disabled in the mysqli extension. Although you can misuse persistent connections and get poor performance, that was not the main reason. The actual reason is that you can get a lot more issues with them.
Persistent connections were put into PHP in the days of MySQL 3.22/3.23, when MySQL was simple enough that you could recycle connections easily with no problems. In later versions a number of problems came up: if you recycle a connection that has uncommitted transactions, you run into trouble. If you recycle connections with custom character set settings you are in danger again, not to mention possibly changed per-session variables.
One trouble with using persistent connections is that they do not really scale that well. If you have 5000 people connected, you'll need 5000 persistent connections. If you do away with the requirement for persistence, you may be able to serve 10000 people with the same number of connections, because they can share those connections when they are not using them.

I was just wondering whether a partial solution would be to have a pool of use-once connections. You could spend time creating a connection pool when the system is at low usage, up to a limit, hand them out and kill them when either they've completed or timed out. In the background you're creating new connections as they're being taken. At worst case this should only be as slow as creating the connection without the pool, assuming that establishing the link is the limiting factor?
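The idea could be sketched like this. Everything here is hypothetical: the class UseOncePool and its methods are made up, the "connections" are plain stand-in objects, and the sketch assumes some long-running process owns the pool, which PHP's usual one-script-per-request model does not give you out of the box.

```php
<?php
// Hypothetical pool of pre-opened connections that are handed out once and never returned.
final class UseOncePool
{
    /** @var SplQueue pre-opened connections waiting to be handed out */
    private $idle;
    /** @var callable opens one new connection */
    private $factory;
    /** @var int maximum number of idle connections to keep ready */
    private $limit;

    public function __construct(callable $factory, int $limit)
    {
        $this->idle    = new SplQueue();
        $this->factory = $factory;
        $this->limit   = $limit;
    }

    /** Top the pool up to its limit; meant to run while the system is quiet. */
    public function refill(): void
    {
        while (count($this->idle) < $this->limit) {
            $this->idle->enqueue(($this->factory)());
        }
    }

    /** Hand out a ready connection, or open one on the spot if the pool is empty. */
    public function take()
    {
        return $this->idle->isEmpty() ? ($this->factory)() : $this->idle->dequeue();
    }

    public function idleCount(): int
    {
        return count($this->idle);
    }
}

$opened = 0;
$pool = new UseOncePool(function () use (&$opened) {
    return (object) ['id' => ++$opened]; // stand-in for a real connect
}, 3);

$pool->refill();       // opens 3 connections ahead of time
$conn = $pool->take(); // oldest pre-opened connection; discarded after use
$pool->refill();       // replaces the one that was taken

echo $conn->id, ' ', $pool->idleCount(), ' ', $opened, "\n"; // 1 3 4
```

When the pool is empty, take() falls back to opening a connection directly, which matches the "at worst as slow as no pool" expectation in the post.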

How does a server manage different users' requests at the same time?

Can you tell me how a server handles different HTTP requests at the same time? If 10 users are logged in to a site and send a request for a page at the same time, what will happen?

Usually, each of the users sends a HTTP request for the page. The server receives the requests and delegates them to different workers (processes or threads).
Depending on the URL given, the server reads a file and sends it back to the user. If the file is a dynamic file such as a PHP file, the file is executed before its output is sent back to the user.
Once the requested file has been sent back, the server usually closes the connection after a few seconds.
For more, see: HowStuffWorks Web Servers

HTTP uses TCP which is a connection-based protocol. That is, clients establish a TCP connection while they're communicating with the server.
Multiple clients are allowed to connect to the same destination port on the same destination machine at the same time. The server just opens up multiple simultaneous connections.
Apache (and most other HTTP servers) have a multi-processing module (MPM). This is responsible for allocating Apache threads/processes to handle connections. These processes or threads can then run in parallel on their own connection, without blocking each other. Apache's MPM also tends to keep open "spare" threads or processes even when no connections are open, which helps speed up subsequent requests.
The program ab (short for ApacheBench) which comes with Apache lets you test what happens when you open up multiple connections to your HTTP server at once.
Apache's configuration files will normally set a limit for the number of simultaneous connections it will accept. This will be set to a reasonable number, such that during normal operation this limit should never be reached.
Note too that the HTTP protocol (from version 1.1) allows for a connection to be kept open, so that the client can make multiple HTTP requests before closing the connection, potentially reducing the number of simultaneous connections they need to make.
More on Apache's MPMs:
Apache itself can use a number of different multi-processing modules (MPMs). Apache 1.x normally used a module called "prefork", which creates a number of Apache processes in advance, so that incoming connections can often be sent to an existing process. This is as I described above.
Apache 2.x normally uses an MPM called "worker", which uses multithreading (running multiple execution threads within a single process) to achieve the same thing. The advantage of multithreading over separate processes is that threading is a lot more light-weight compared to opening separate processes, and may even use a bit less memory. It's very fast.
The disadvantage of multithreading is you can't run things like mod_php. When you're multithreading, all your add-in libraries need to be "thread-safe" - that is, they need to be aware of running in a multithreaded environment. It's harder to write a multi-threaded application. Because threads within a process share some memory/resources between them, this can easily create race condition bugs where threads read or write to memory when another thread is in the process of writing to it. Getting around this requires techniques such as locking. Many of PHP's built-in libraries are not thread-safe, so those wishing to use mod_php cannot use Apache's "worker" MPM.

Apache 2 has two different modes of operation. One is running as a threaded server the other is using a mode called "prefork" (multiple processes).

The requests will be processed simultaneously, to the best ability of the HTTP daemon.
Typically, the HTTP daemon will spawn either several processes or several threads and each one will handle one client request. The server may keep spare threads/processes so that when a client makes a request, it doesn't have to wait for the thread/process to be created. Each thread/process may be mapped to a different processor or core so that they can be processed more quickly. In most circumstances, however, what holds the requests is network I/O, not lack of raw computing, so there is frequently no slowdown from having a number of processors/cores significantly lower than the number of requests handled at one time.

The server (apache) is multi-threaded, meaning it can run multiple programs at once. A few years ago, a single CPU could switch back and forth quickly between multiple threads, giving the appearance that two things were happening at once. These days, computers have multiple processors, so the computer can actually run two threads of code simultaneously. That being said, threads aren't really mapped to processors in any simple way.
With that ability, a PHP program can be thought of as a single thread of execution. If two requests reach the server at the same time, two threads can be used to process the request simultaneously. They will probably both get about the same amount of CPU, so if they are doing the same thing, they will complete at approximately the same time.
One of the most common issues with multi-threading is "race conditions", where two requests are doing the same thing ("racing" to do the same thing): if it is a single resource, one of them is going to win. If they both insert a record into the database, they can't both get the same id; one of them will win. So you need to be careful when writing code to realize other requests are going on at the same time and may modify your database, write files or change globals.
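A small sketch of such a race, with the interleaving of two requests simulated by hand inside one script. It assumes the pdo_sqlite extension and an in-memory SQLite database; the table name counters and the row names are made up for the example.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)');
$db->exec("INSERT INTO counters (name, hits) VALUES ('racy', 0)");
$db->exec("INSERT INTO counters (name, hits) VALUES ('safe', 0)");

// Read-modify-write: both "requests" read the old value before either one writes.
$readA = (int) $db->query("SELECT hits FROM counters WHERE name = 'racy'")->fetchColumn();
$readB = (int) $db->query("SELECT hits FROM counters WHERE name = 'racy'")->fetchColumn();
$write = $db->prepare("UPDATE counters SET hits = ? WHERE name = 'racy'");
$write->execute([$readA + 1]);
$write->execute([$readB + 1]); // overwrites A's update: one hit is lost

// Letting the database do the increment in a single statement avoids the lost update.
$db->exec("UPDATE counters SET hits = hits + 1 WHERE name = 'safe'");
$db->exec("UPDATE counters SET hits = hits + 1 WHERE name = 'safe'");

$racy = (int) $db->query("SELECT hits FROM counters WHERE name = 'racy'")->fetchColumn();
$safe = (int) $db->query("SELECT hits FROM counters WHERE name = 'safe'")->fetchColumn();
echo "racy=$racy safe=$safe\n"; // racy=1 safe=2
```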
That being said, the programming model allows you to mostly ignore this complexity.
