I have many questions about PDO ...
Should I use prepare() only when I have parameters to bind? When I need to do a simple query like select * from table order by ... should I use query()?
Should I use exec() when I have update and delete operations and need to get the number of rows affected, or should I use PDOStatement->rowCount() instead?
Should I use closeCursor when I do insert, update and delete, or only with select when I need to do another select?
Does $con = NULL; really close the connection?
Is using bindParam with foreach to make multiple inserts a good idea? I mean performance-wise, because I think that doing (...),(...) in the same insert is better, isn't it?
Can you provide me some more information (URL) about performance points when using PHP PDO MySQL? If someone has another hint it would be really useful.
When I was developing the DB layer in Zend Framework 1.0, I made it use prepare/execute for all queries by default. There is little downside to doing this.* There's a little bit of overhead on the PHP side, but on the MySQL side, prepared queries are actually faster.
My practice is to use query() for all types of queries, and call rowCount() after updates. You can also call SELECT ROW_COUNT().
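As a rough sketch of the two ways to get an affected-row count in plain PDO: the prepared path uses PDOStatement::rowCount(), while PDO::exec() returns the count directly but accepts no parameters. The in-memory SQLite database and the fruit table are assumptions made only so the example runs without a MySQL server.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL; table and data are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE fruit (name TEXT, colour TEXT)');
$pdo->exec("INSERT INTO fruit VALUES ('apple', 'red'), ('cherry', 'red'), ('banana', 'yellow')");

// Prepared UPDATE with bound values; rowCount() reports the rows it changed.
$stmt = $pdo->prepare('UPDATE fruit SET colour = ? WHERE colour = ?');
$stmt->execute(['crimson', 'red']);
$affected = $stmt->rowCount();

// exec() returns the affected-row count itself, but cannot bind parameters.
$deleted = $pdo->exec("DELETE FROM fruit WHERE colour = 'yellow'");

echo "updated: $affected, deleted: $deleted\n";
```

If the values come from user input, the prepared form is the one to reach for, since exec() would require building the SQL string by hand.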
CloseCursor is useful in MySQL if you have pending rows from a result set, or pending result sets in a multi-result set query. It's not necessary when you use INSERT, UPDATE, DELETE.
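A minimal sketch of the case where closeCursor() applies: a SELECT whose rows were only partly read before the statement is executed again. SQLite in-memory and the table t are assumptions for the sake of a runnable example; the pending-rows situation is the one described for MySQL above.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL; table t is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE t (n INTEGER)');
$pdo->exec('INSERT INTO t VALUES (1), (2), (3)');

$stmt = $pdo->prepare('SELECT n FROM t WHERE n >= ? ORDER BY n');

$stmt->execute([1]);
$first = (int) $stmt->fetchColumn(); // one row read, two rows still pending
$stmt->closeCursor();                // discard the pending rows

$stmt->execute([3]);                 // re-execute the same statement
$last = (int) $stmt->fetchColumn();
$stmt->closeCursor();
```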
The PDO_mysql test suite closes connections with $con=NULL and that is the proper way. This won't actually close persistent connections managed by libmysqlnd, but that's deliberate.
Executing a prepared INSERT statement one row at a time is not as fast as executing a single INSERT with multiple tuples. But the difference is pretty small. If you have a large number of rows to insert, and performance is important, you should really use LOAD DATA LOCAL INFILE. See also http://dev.mysql.com/doc/refman/5.6/en/insert-speed.html for other tips.
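For reference, the one-row-at-a-time pattern with bindParam() and foreach looks roughly like this. bindParam() binds the variable by reference, so each execute() picks up whatever the loop last assigned. SQLite in-memory and the names table are assumptions so the sketch runs without a server.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL; table and values are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE names (name TEXT)');

$stmt = $pdo->prepare('INSERT INTO names (name) VALUES (:name)');

// Bound by reference: the value is read when execute() runs, not here.
$stmt->bindParam(':name', $name, PDO::PARAM_STR);

foreach (['Ann', 'Bill', 'Cara'] as $name) {
    $stmt->execute(); // inserts the current value of $name
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM names')->fetchColumn();
```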
You can google for "PDO MySQL benchmark" (for example) to find various results. The bottom line, however, is that choosing PDO vs. Mysqli has no clear winner. The difference is slight enough that it diminishes relative to other more important optimization techniques, such as choosing the right indexes, making sure indexes fit in RAM, and clever use of application-side caching.
* Some statements cannot run as prepared statements in MySQL, but the list of such statements gets smaller with each major release. If you're still using an ancient version of MySQL that can't run certain statements with prepare(), then you should have upgraded years ago!
Re your comment:
Yes, using query parameters (e.g. with bindValue() and bindParam()) is considered the best method for defending against SQL injection in most cases.
Note that there's an easier way to use query parameters with PDO -- you can just pass an array to execute() so you don't have to bother with bindValue() or bindParam():
$sql = "SELECT * FROM MyTable WHERE name = ?";
$stmt = $pdo->prepare($sql);
$stmt->execute( array("Bill") );
You can also use named parameters this way:
$sql = "SELECT * FROM MyTable WHERE name = :name";
$stmt = $pdo->prepare($sql);
$stmt->execute( array(":name" => "Bill") );
Using quote() and then interpolating the result into a query is also a good way to protect against SQL injection, but IMHO makes code harder to read and maintain, because you're always trying to figure out if you have closed your quotes and put your dots in the right place. It's much easier to use parameter placeholders and then pass parameters.
You can read more about SQL injection defense in my presentation, SQL Injection Myths and Fallacies.
Most of these questions can be answered with just common sense. So, here I am.
It doesn't actually matter.
Absolutely and definitely NO. exec() doesn't use prepared statements. That's all.
It doesn't really matter. If you ever need this, your program architecture is probably wrong.
You can easily test it yourself. Personal experience is always preferable.
The difference is considered negligible. However, if your multiple inserts are really slow (on InnoDB with default settings, for example), you have to use a transaction, which will make them fast again.
There is none. PDO is just an API, and an API by itself is not what determines performance: it only passes your commands to the service. Your commands or the service may affect performance, but not the mere API.
So, the rule of thumb is:
it's the query itself that affects performance, not the way you run it.
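The transaction advice for slow multiple inserts can be sketched as follows. SQLite in-memory and the log table are assumptions used only to keep the example runnable; with MySQL/InnoDB the same PDO calls apply.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL/InnoDB; table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE log (msg TEXT NOT NULL)');

$stmt = $pdo->prepare('INSERT INTO log (msg) VALUES (?)');

// One commit for the whole batch instead of one per row.
$pdo->beginTransaction();
try {
    foreach (['a', 'b', 'c'] as $msg) {
        $stmt->execute([$msg]);
    }
    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack(); // all rows in the batch are undone together
    throw $e;
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM log')->fetchColumn();
```

Note that a failure anywhere in the loop discards the whole batch, which is only appropriate when the rows belong together.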
I have a PHP class that processes data and stores it in a MySQL database. I use prepared statements via PDO for security reasons when data is saved, but because the class is large these prepared statements are created inside different functions that are called thousands of times during the lifetime of the object (anywhere from one minute to thirty).
What I’m wondering is if there’s any reason I couldn't prepare the statements in the class constructor and save the handles in member variables to avoid the statements being prepared more than once.
Is there any reason this wouldn't work? I don’t see why not, but I've never seen it done before, which makes me wonder if doing this is a bad practice for some reason.
I.E. something like this:
class MyClass {
    private $stmt1;

    function __construct($dbh) {
        $this->stmt1 = $dbh->prepare('SELECT foo FROM bar WHERE foobar = :foobar');
    }

    private function doFoo() {
        $this->stmt1->execute(...);
        ...
    }
}
I use prepared statements via PDO for security reasons when data is saved, but because the class is large these prepared statements are created inside different functions that are called thousands of times during the lifetime of the object (anywhere from one minute to thirty).
Whenever I look at bounty questions I always ask myself, "Are they even solving the correct problem?" Is executing the same query with different parameters thousands of times during the lifetime of this object really the best way to go?
If you are doing multiple SELECTs then maybe a better query that fetches more information at once would be better.
If you are doing multiple INSERTs then maybe batch inserts would serve you better.
If after evaluating the above options you decide that you still need to call these statements thousands of times during the life of the object then yes, you can cache the result of a prepared statement:
Measure current performance.
Turn off emulated prepares.
Measure the performance impact.
Use a technique called memoization or lazy loading to cache the prepare but only prepare a query when it is actually used.
Measure the performance impact again.
This allows you to see the impact of each piece that you changed. I suspect that if you really are calling these queries thousands of times, then some or all of these changes will help, but you must measure before and after to know.
Storing the statements as variables works on paper. Be wary about performance though.
In particular, there's a world of difference between real prepares (which are off by default for MySQL) and emulated prepares (the default for MySQL, controlled by PDO::ATTR_EMULATE_PREPARES).
An emulated prepared statement parses the query locally. When it is executed, PDO replaces the parameters with their values and ships the final SQL string to the database. Upon receiving it, the database parses the query, comes up with a query plan, executes it, and returns rows.
A real prepared statement will ship the query to be prepared straight to the database. The latter will parse it, prepare a generic query plan based on the query and the unknown variables, and return a prepared statement for use by PHP. When PDO executes the statement, it ships the prepared statement back along with the parameters. The database then executes the prepared query plan and returns rows.
As you may have noted, a real prepared statement involves a lot of back and forth between PHP and the DB. This is offset by the fact that the query is planned once and for all. Sometimes this is desirable (a similar query is used many times); sometimes not (the query is used a single time).
A further caveat is that a real prepared statement's query plan may or may not be the best possible one owing to the variables involved. Suppose a b-tree index on foo (bar):
select bar from foo order by bar limit ?
If the variable is small, an index scan is desirable; if it's larger, a bitmap index scan makes sense if available; if it's huge, a seq scan becomes desirable. In the latter two cases, the planner will also need to pick a sorting method. But since the query planner is tasked with coming up with a plan, Murphy's law states that it'll occasionally pick the worst possible plan for your particular use case. And the next thing you know, you'll end up scanning and sorting the entire table to retrieve a couple of rows, or following the index on bar to retrieve the entire table.
Lastly, and as an aside, you might want to look into ORMs if you're not familiar with them already.
Technically it is possible, as you already know by simply trying or just reading:
The query […] can be executed multiple times.
I would consider preparing all statements in the constructor as a bad idea. I guess it will become unmaintainable if you got a bunch of SQL statements in the constructor without any context. Furthermore you might prepare more than you actually need.
One idea to overcome this is using a statement map:
private $statements = array();

public function getStatement($sql)
{
    if (! isset($this->statements[$sql])) {
        $this->statements[$sql] = $this->pdo->prepare($sql);
    }
    return $this->statements[$sql];
}
This will prepare each statement only once, and the SQL stays in the place where it is used.
But I would call this a premature optimization because your DBS' query cache is most likely doing this for you.
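To make the statement-map idea concrete, here is a self-contained sketch. The class name StatementCache, the SQLite in-memory connection and the bar table are assumptions for illustration; only the getStatement() logic comes from the snippet above.

```php
<?php
// Hypothetical wrapper class around the statement map shown above.
final class StatementCache
{
    /** @var PDOStatement[] prepared statements keyed by their SQL text */
    private $statements = [];
    /** @var PDO */
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function getStatement(string $sql): PDOStatement
    {
        // Prepare lazily, on first use only.
        if (!isset($this->statements[$sql])) {
            $this->statements[$sql] = $this->pdo->prepare($sql);
        }
        return $this->statements[$sql];
    }
}

// Assumption: SQLite in-memory stands in for MySQL; table bar is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE bar (foo TEXT, foobar INTEGER)');

$cache = new StatementCache($pdo);
$sql = 'SELECT foo FROM bar WHERE foobar = :foobar';
$first = $cache->getStatement($sql);
$second = $cache->getStatement($sql);
$sameHandle = ($first === $second); // the second call reuses the first handle
```

Because one PDOStatement object is shared, a caller should finish fetching (or call closeCursor()) before another caller executes the same SQL.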
I cringed when Sebastien stated (in "Can mysqli_multi_query do UPDATE statements?") that he was disconnecting and reconnecting between each use of mysqli_multi_query(), because it just didn't seem like best practice.
However, Craig stated (in "mysqli multi_query followed by query") that in his case it was faster to disconnect and reconnect between each use of mysqli_multi_query() than to employ mysqli_next_result().
I would like to ask if anyone has further first-hand knowledge or benchmark evidence to suggest an approximate "cutoff" (based on query volume or something) when a programmer should choose the "new connection" versus "next result" method.
I am also happy to hear any/all concerns not pertaining to speed. Does Craig's use of a connecting function have any bearing on speed?
Is there a speed difference between Craig's while statement:
while ($mysqli->next_result()) {;}
- versus -
a while statement that I'm suggesting:
while(mysqli_more_results($mysqli) && mysqli_next_result($mysqli));
- versus -
creating a new connection for each expected multi_query, before running first multi_query. I just tested this, and the two mysqli_multi_query()s were error free = no close() needed:
$mysqli1=mysqli_connect("$host","$user","$pass","$db");
$mysqli2=mysqli_connect("$host","$user","$pass","$db");
- versus -
Opening and closing between each mysqli_multi_query() like Sebastien and Craig:
$mysqli = newSQL();
$mysqli->multi_query($multiUpdates);
$mysqli->close();
- versus -
Anyone have another option to test against?
It is not next_result() that is to blame but the queries themselves. The time your code takes to run depends on the time the actual queries take to perform.
Although mysqli_multi_query() returns control quite fast, that doesn't mean all the queries have been executed by that time. Quite the contrary: by the time mysqli_multi_query() has finished, only the first query has been executed, while all the other queries are queued on the MySQL side for asynchronous execution.
From this you may conclude that a next_result() call doesn't add any delay by itself; it's just waiting for the next query to finish. And if the query itself takes time, then next_result() has to wait as well.
Knowing that, you can already tell which way to choose: if you don't care about the results, you may just close the connection. But in fact that is just sweeping dirt under the rug, leaving all the slow queries in place. So it's better to keep the next_result() loop in place (especially because you have to check for errors, affected rows, etc. anyway) and speed up the queries themselves.
So, it turns out that to solve the problem with next_result() you actually have to solve the ordinary problem of query speed. Here are some recommendations:
For SELECT queries it's the usual indexing and EXPLAIN analysis, already covered in other answers.
For DML queries, especially when run in batches, there are other ways:
Craig's case closely resembles the known problem of InnoDB write speed. By default, the InnoDB engine is set up in a very cautious mode, where no following write is performed until the engine has ensured that the previous one finished successfully. This makes writes awfully slow (something like only 10 queries/sec). The common workaround is to make all the writes at once. For insert queries there are plenty of methods:
you can use multiple values insert syntax
you can use LOAD DATA INFILE query
you can wrap all the queries in a transaction.
For updates and deletes, however, only a transaction remains a reliable option. So, as a universal solution, the following workaround can be offered:
$multiSQL = "BEGIN;{$multiSQL}COMMIT;";
$mysqli->multi_query($multiSQL);
while ($mysqli->next_result()) {/* check results here */}
If it doesn't work or is inapplicable in your case, then I'd suggest replacing mysqli_multi_query() with single queries run in a loop, investigating and optimizing their speed, and then returning to multi_query.
To answer your question:
look before you jump
I expect your mysqli_more_results() call (the look before you jump), doesn't speed up things: If you have n results, you'll do (2*n)-1 calls to the database, whereas Craig does n+1.
multiple connections
multi_query executes async, so you'll just be adding connection overhead.
opening and closing db
Listen to Your Common Sense ;-) But don't lose track of what you're doing. Wrapping queries in a transaction makes them atomic: they all fail, or they all succeed. Sometimes that is required to keep the database from ever conflicting with your universe of discourse. But using transactions for speedups may have unwanted side effects. Consider the case where one of your queries violates a constraint. That will make the whole transaction fail. If the queries weren't a logical transaction in the first place and most of them should have succeeded, you'll have to find out which one went wrong and which ones have to be reissued, costing you more instead of delivering a speedup.
Sebastien's queries actually look like they should be part of some bigger transaction, that contains the deletion or updates of the parents.
Instead, try and remember
there is no spoon
In your examples, there was no need for multiple queries. The INSERT ... VALUES form takes multiple tuples for VALUES. So instead of preparing one statement and wrapping its repeated executions in a transaction, as Your Common Sense suggests, you could prepare a single statement and have it executed and auto-committed. As per the mysqli manual, this saves you a bunch of round trips.
So make a SQL statement of the form:
INSERT INTO r (a, b, c) VALUES (?, ?, ?), (?, ?, ?), ...
and bind and execute it. mysqldump --opt does it, so why don't we? The MySQL reference manual has a section on statement optimization. Look in its DML section for insert and update queries. But understanding why --opt does what it does is a good start.
the underestimated value of preparing a statement
To me, the real value of prepared statements is not that you can execute them multiple times, but the automatic input escaping. For a measly single extra client-server round trip, you save yourself from SQL injection. SQL injection deserves serious attention, especially when you're using multi_query. multi_query tells MySQL to expect multiple queries and execute them. So fail to escape properly and you're in for some fun.
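A small sketch of what that protection looks like in practice: a hostile string passed through a placeholder is stored as plain data rather than executed. The example uses PDO with an in-memory SQLite database and a hypothetical users table, both assumptions chosen so it runs without a server; the same idea applies to mysqli prepared statements.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL; table users is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Input that would end the statement and start a new one if concatenated into SQL.
$hostile = "x'); DROP TABLE users; --";

// With a placeholder the value never becomes part of the SQL text.
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
$stmt->execute([$hostile]);

// The table still exists and holds the string exactly as given.
$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
```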
So my best practise would be:
Do I really need multiple queries?
If I do, escape them well, or prepare them!
I've recently upgraded my mind from mysql_* to PDO, and I have one simple question:
Is PDO really so much more efficient that using a prepared statement and an execute in a for-each loop is quicker than a single call in mysql with multiple values in it?
For example, if I have an array of 5 names and put these in an execute command in a for loop operating on an 'insert' prepared statement, is calling this 5 times going to be quicker in computational speed than one call using the old mysql with all 5 values in a single query? Or is it preferred due to security rather than speed alone?
The meaning and significance of native prepared statements (which you call "PDO") is overlooked and misjudged by everyone.
The speed benefit that everyone talks about so much can in reality be achieved only rarely, and is often not noticeable at all, especially in the area of web development with PHP, which PDO belongs to.
Also note that whatever speed benefit exists applies to query parsing only. Matters like index rebuilding or the time required to find a record to update are never affected by prepared statements.
So, speaking of numbers like five, don't bother yourself with this "prepare once, execute many times" thing. It is not what PDO is about. PDO does two essential things, which make it preferable over the two other possible extensions:
it supports prepared statements in general, allowing data into the query not directly but via placeholders. This is the only reason why you should use PDO or a similar library (although you can easily make even the old mysql ext support prepared statements, PDO offers it out of the box)
it makes such support not as painful as mysqli
Turning back to your question:
You can use whichever way you like. Just remember that multiple inserts are better wrapped in a transaction, due to the default settings of modern DB engines
No matter which way you choose, any dynamic value should be added to the query via placeholders only. If you are still not convinced, you are welcome to read an article I wrote on the matter (which is still incomplete, but has a thorough explanation of the real meaning of prepared statements).
PS. There is also one minor benefit of native prepared statements, often forgotten (because seldom demanded): if native prepared statements are used (and backed by the mysqlnd driver), the data returned is already formatted according to its type.
One query that fetches 5 rows will probably be quicker than 5 separate calls, so you are comparing apples and oranges.
When executing the same query, the performance will be similar too. The (small) performance advantage that PDO has, is that queries with parameters are supposed to be better cachable. When querying customer 3 and customer 5, the query will be cached as two different queries, while only the id is different. By using parameters, the database might cache the query in a smarter way, so a second call with a different input doesn't need to go through the query optimizer and such.
That said, apart from the performance advantage, PDO is also safer (when actually using parameters), and in the end easier. It may look more complex at first, but it is easier to do right, because without parameters you need to do all the escaping yourself, risking dangerous bugs.
By the way, you can also build a query with a variable number of parameters and bind a value to each of them in a loop, so with PDO you could still perform the single insert query for 5 rows, although it takes a bit of puzzling and a bit of extra code.
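One way that "bit of extra code" might look: generate one (?) tuple per name and pass the whole array to execute(). SQLite in-memory, the people table and the five names are assumptions for illustration only.

```php
<?php
// Assumption: SQLite in-memory stands in for MySQL; table and names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (name TEXT)');

$names = ['Ann', 'Bill', 'Cara', 'Dev', 'Eve'];

// Build "(?), (?), (?), (?), (?)": one placeholder tuple per row.
$tuples = implode(', ', array_fill(0, count($names), '(?)'));
$sql = "INSERT INTO people (name) VALUES $tuples";

// A single prepare and a single execute insert all five rows.
$stmt = $pdo->prepare($sql);
$stmt->execute($names);

$count = (int) $pdo->query('SELECT COUNT(*) FROM people')->fetchColumn();
```

Only the placeholder list is generated dynamically; the values themselves still travel as bound parameters, so the injection protection is unchanged.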
I'm working on a new project and using parameterized queries for the first time (PHP with a MySQL DB). I read that parameterized queries are cached, but I'm wondering how long they are cached for. For example, let's say I have a function 'getAllUsers()' that gets a list of all active user IDs from the user table, and for each ID a User object is created and a call to function 'getUser($user)' is made to set the other properties of the object. The 'getUser()' function has its own prepared query with a stmt->close() at the end of the function.
If I do it this way, does my parameterized query in 'getUser()' take advantage of caching at all or is the query destroyed from cache after each stmt->close()?
Note: I also use the getUser() function if a page only requires data for a single user object so I wanted to do it this way to ensure that if the user table changes I only ever need to update one query.
Is this the right way of doing something like this or is there a better way?
Update: Interesting, just saw this on php.net's manual for prepared statements (http://php.net/manual/en/mysqli.quickstart.prepared-statements.php)
Using a prepared statement is not always the most efficient way of executing a statement. A prepared statement executed only once causes more client-server round-trips than a non-prepared statement.
So I guess the main benefit of parameterized queries is to protect against SQL injection, and not necessarily to speed things up unless it's a query that will be repeated several times in a row.
Calling mysqli_stmt::close will:
Closes a prepared statement. mysqli_stmt_close() also deallocates the statement handle.
therefore not being able to use the cached version of the statement for further executions.
I wouldn't worry about freeing resources or closing statements, since PHP will do it for you at the end of the script anyway.
Also if you are working with loops (as you described) take a look at mysqli_stmt::reset which will reset the prepared statement to its original state (after the prepare call).
That's a good question, from a certain point of view.
First, about "caching".
There is one special thing about prepared queries: you can send a query to the server once and then execute it multiple times. Reusing an already parsed and prepared query can give a small theoretical benefit.
As it seems, you're not using that mechanism, since you prepare every query every time. So there is no caching at all.
Next, about premature optimization.
You've heard of some caching, and it has captured your imagination,
while there is no real need or cause for you to be concerned about caching or any other performance issue.
So, there is a rule: do not occupy yourself with performance issues until they are real.
Otherwise you'll waste your time.
I am taking over a PHP app that uses MySQL PDO prepared statements every time it runs an SQL statement.
I know that preparing SQL can be more efficient when you are about to do many iterations of the same statement.
$sth = $dbh->prepare('SELECT name, colour, calories
FROM fruit
WHERE calories < ? AND colour = ?');
$sth->execute(array(150, 'red'));
$red = $sth->fetchAll();
$sth->execute(array(175, 'yellow'));
$yellow = $sth->fetchAll();
However, the app I am taking over has built a layer on top of PDO that calls a common "execute" function, and it appears that it prepares every single SQL query. For example:
$query = self::$DB->prepare($sql, array(PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY));
$query->execute($bindvars);
If the app does many hundreds or thousands of "INSERT INTO ...... ON DUPLICATE KEY UPDATE" SQL statements, does the $DB->prepare() step create a significant overhead if it is run every single time?
Many thanks, Jason.
From the documentation:
Calling PDO::prepare() and PDOStatement::execute() for statements that will be issued multiple times with different parameter values optimizes the performance of your application by allowing the driver to negotiate client and/or server side caching of the query plan and meta information...
I'm not really making any revelations here, but the opposite of "optimizes the performance" would indeed be "overhead". As to whether it's significant or not, why don't you run a loop either way and measure? You can then decide for yourself, with hard data to back up your decision.
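A rough way to run that loop both ways is sketched below. It uses an in-memory SQLite database (an assumption, so the script runs without a server), with INSERT OR REPLACE standing in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The numbers it prints say nothing about MySQL itself; point the DSN and SQL at your real database to get meaningful figures.

```php
<?php
// Assumption: SQLite in-memory and the kv table are stand-ins for the real setup.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)');

$sql = 'INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)';
$n = 1000;

// Variant 1: prepare on every call, as the existing layer does.
$start = microtime(true);
$pdo->beginTransaction();
for ($i = 0; $i < $n; $i++) {
    $pdo->prepare($sql)->execute([$i, 'first']);
}
$pdo->commit();
$prepareEach = microtime(true) - $start;

// Variant 2: prepare once, execute many times.
$start = microtime(true);
$stmt = $pdo->prepare($sql);
$pdo->beginTransaction();
for ($i = 0; $i < $n; $i++) {
    $stmt->execute([$i, 'second']);
}
$pdo->commit();
$prepareOnce = microtime(true) - $start;

$rows = (int) $pdo->query('SELECT COUNT(*) FROM kv')->fetchColumn();
printf("prepare each time: %.4fs, prepare once: %.4fs, rows: %d\n", $prepareEach, $prepareOnce, $rows);
```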
Besides query reuse, the primary reason to use prepared statements in PDO is to perform placeholder binding.
$query = self::$DB->prepare($sql, array(PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY));
$query->execute($bindvars);
In this code, the question marks (or :named) placeholders present in the $sql variable are replaced with the values in the $bindvars array. This replacement ensures that the variables are properly quoted and escaped, making it much more difficult to perform SQL injection.
There may be a small amount of overhead in the prepare/execute, but that small overhead is nothing given the risk of SQL injection. The only other option is concatenating together the SQL string, and that can be a huge security risk unless it's done perfectly every time.
The previous developer knew what he or she was doing, at least in this specific case, and you should not undo the work he or she did here. Quite the opposite, you should continue using prepared statements in all of your future code.
(On the other hand, I can't vouch for MySQL's cursor performance...)
If memory serves, MySQL sees your prepared statement and expects that you're probably running an application that is likely to call the same statement multiple times. As such, it caches the statement string, so preparing it again isn't much overhead, though it's more than just keeping the reference to the statement in memory. It's still definitely better than parsing a whole new query from a string each time.
This is just from my vague memory of what I think I've heard, though. Here's the important bit: if these hundreds or thousands of inserts are running in the same request, consider refactoring the database class to prepare once and execute many times in situations like these. The only way to know how much of a difference it will make is to benchmark it yourself :/