What are the advantages of using prepared statements with MySQLi?
If the only purpose is to secure the query, isn't it better to clean the query using something like mysqli_real_escape_string instead of writing so many lines of code for each query (like prepare, bind_param, execute, close, etc.)?
Preparing statements is not just for code security. It helps the SQL server parse your statement and generate an execution plan for your query.
If you run a SELECT 1000 times then the SQL server will have to parse, prepare, and generate a plan on how to get your data 1000 times.
If you prepare a statement then run the prepared statement 1,000 times each with different bound values then the statement is parsed only once and the same query plan is used each time.
This helps not only when you run 1,000 queries inside one script but also whenever the same connection is reused, because the server keeps the prepared statement for the lifetime of the session. Note, though, that prepared statements in MySQL are scoped to the connection: when the session ends the statement is gone, so a fresh script run on a new connection has to prepare again.
Sure, it may seem trivial for one or two queries, but when you start executing large numbers of queries, or the same query repeatedly, you will save a lot of processing time.
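For illustration, a minimal sketch of the prepare-once/execute-many pattern in mysqli (the users table and its columns are hypothetical):

<?php
// Parsed and planned once on the server...
$stmt = $mysqli->prepare('SELECT name FROM users WHERE id = ?');

// ...then executed 1,000 times; only the bound value changes per iteration.
for ($id = 1; $id <= 1000; $id++) {
    $stmt->bind_param('i', $id);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc(); // get_result() needs mysqlnd
}
$stmt->close();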
There are several advantages:
Security - you don't need to escape anything, you just need to bind the parameters.
Correctness - if you interpolate the value, as in WHERE $x = 4, you will get a syntax error when $x is null (the SQL collapses to WHERE = 4), but WHERE ? = 4 with a bound null remains a valid query.
Performance - prepared queries can be reused with different parameters, saving rebuilding the string, and also reparsing on the server.
Maintainability - the code is much easier to read. When concatenating strings to create the SQL it's easy to end up with lots of small pieces of SQL and PHP code intermingled. Using prepared statements encourages you to separate the SQL from determining the values of the variables.
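As a rough sketch of the security and maintainability points (the users table and column names are made up), compare escaping-plus-concatenation with binding:

<?php
// Escaping + concatenation: SQL and PHP intermingle, and one forgotten
// real_escape_string() call becomes an injection hole.
$name = $mysqli->real_escape_string($_POST['name']);
$res = $mysqli->query("SELECT id FROM users WHERE name = '$name'");

// Prepared statement: the SQL stays in one readable piece; values are bound.
$stmt = $mysqli->prepare('SELECT id FROM users WHERE name = ?');
$stmt->bind_param('s', $_POST['name']);
$stmt->execute();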
Related
I cringed when Sebastien stated he was disconnecting & reconnecting between each use of mysqli_multi_query() (see: Can mysqli_multi_query do UPDATE statements?) because it just didn't seem like best practice.
However, Craig (see: mysqli multi_query followed by query) stated that in his case it was faster to disconnect & reconnect between each use of mysqli_multi_query() than to employ mysqli_next_result().
I would like to ask if anyone has further first-hand knowledge or benchmark evidence to suggest an approximate "cutoff" (based on query volume or something) when a programmer should choose the "new connection" versus "next result" method.
I am also happy to hear any/all concerns not pertaining to speed. Does Craig's use of a connecting function have any bearing on speed?
Is there a speed difference between Craig's while statement:
while ($mysqli->next_result()) {;}
- versus -
a while statement that I'm suggesting:
while(mysqli_more_results($mysqli) && mysqli_next_result($mysqli));
- versus -
creating a new connection for each expected multi_query, before running the first multi_query. I just tested this, and both mysqli_multi_query() calls ran error-free, so no close() was needed:
$mysqli1 = mysqli_connect($host, $user, $pass, $db);
$mysqli2 = mysqli_connect($host, $user, $pass, $db);
- versus -
Opening and closing between each mysqli_multi_query() like Sebastien and Craig:
$mysqli = newSQL();
$mysqli->multi_query($multiUpdates);
$mysqli->close();
- versus -
Anyone have another option to test against?
It is not next_result() that is to blame but the queries themselves. The time your code takes to run depends on how long the actual queries take to perform.
Although mysqli_multi_query() returns control quite fast, it doesn't mean that all the queries have been executed by that time. Quite the contrary: by the time mysqli_multi_query() returns, only the first query has been executed, while all the other queries are queued on the MySQL side for asynchronous execution.
From this you may conclude that a next_result() call doesn't add any delay by itself - it just waits for the next query to finish. And if the query itself takes time, then next_result() has to wait as well.
Knowing that, you can already tell which way to choose: if you don't care about the results, you may just close the connection. But in fact that would just be sweeping the dirt under the rug, leaving all the slow queries in place. So it's better to keep the next_result() loop in place (especially because you have to check for errors/affected rows/etc. anyway) and speed up the queries themselves.
So it turns out that to solve the problem with next_result() you actually have to solve the regular problem of query speed. Here are some recommendations:
For SELECT queries it's the usual indexing/EXPLAIN analysis, already covered in other answers.
For DML queries, especially ones run in batches, there are other ways:
Speaking of Craig's case, it closely resembles the well-known problem of InnoDB write speed. By default, the InnoDB engine runs in a very cautious mode, where no subsequent write is performed until the engine has made sure the previous one finished successfully. This makes writes painfully slow (something like only 10 queries/sec). The common workaround is to do all the writes at once. For INSERT queries there are plenty of methods:
you can use the multiple-values INSERT syntax
you can use a LOAD DATA INFILE query
you can wrap all the queries in a transaction.
For updates and deletes, however, a transaction remains the only reliable way. So, as a universal solution, the following workaround can be offered:
$multiSQL = "BEGIN;{$multiSQL}COMMIT;";
$mysqli->multi_query($multiSQL);
while ($mysqli->next_result()) {/* check results here */}
If that doesn't work or is inapplicable in your case, then I'd suggest changing mysqli_multi_query() to single queries run in a loop, investigating and optimizing their speed, and then returning to multi_query.
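For illustration, a minimal sketch of that single-queries-in-a-loop approach ($mysqli and the $updates array of SQL strings are hypothetical names):

<?php
// Run each statement individually so the slow ones can be spotted and optimized.
foreach ($updates as $sql) {
    $start = microtime(true);
    if (!$mysqli->query($sql)) {
        throw new RuntimeException('Query failed: ' . $mysqli->error);
    }
    $elapsed = microtime(true) - $start;
    if ($elapsed > 0.1) { // flag anything slower than 100 ms for investigation
        error_log(sprintf('%.3fs %s', $elapsed, $sql));
    }
}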
To answer your question:
look before you jump
I expect your mysqli_more_results() call (the look before you jump) doesn't speed things up: if you have n results, you'll make (2*n)-1 calls to the database, whereas Craig makes n+1.
multiple connections
multi_query executes asynchronously, so you'd just be adding connection overhead.
opening and closing db
Listen to Your Common Sense ;-) But don't lose track of what you're doing. Wrapping queries in a transaction makes them atomic: they all fail, or they all succeed. Sometimes that is required so the database never conflicts with your universe of discourse. But using transactions purely for speedups may have unwanted side effects. Consider the case where one of your queries violates a constraint. That will make the whole transaction fail. If they weren't a logical transaction in the first place and most of the queries should have succeeded, you'll have to find out which one went wrong and which ones have to be reissued, costing you more instead of delivering a speedup.
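A minimal sketch of that failure mode, assuming a child table with a foreign-key constraint (all names here are made up):

<?php
// Strict error reporting so a failed query throws mysqli_sql_exception.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

$mysqli->begin_transaction();
try {
    $stmt = $mysqli->prepare('INSERT INTO child (parent_id, name) VALUES (?, ?)');
    foreach ([[1, 'a'], [999999, 'b'], [2, 'c']] as [$parentId, $name]) {
        $stmt->bind_param('is', $parentId, $name);
        $stmt->execute(); // the non-existent parent_id violates the FK constraint
    }
    $mysqli->commit();
} catch (mysqli_sql_exception $e) {
    $mysqli->rollback(); // rows 1 and 3 are rolled back along with row 2
    // if the batch wasn't logically atomic, you now have to sort out the survivors
}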
Sebastien's queries actually look like they should be part of some bigger transaction that contains the deletion or updates of the parents.
Instead, try and remember
there is no spoon
In your examples, there was no need for multiple queries. The INSERT ... VALUES form takes multiple tuples for VALUES. So instead of preparing one statement and wrapping its repeated executions in a transaction, as Your Common Sense suggests, you could prepare a single multi-row statement and have it executed and auto-committed. As per the mysqli manual this saves you a bunch of round-trips.
So make a SQL statement of the form:
INSERT INTO r (a, b, c) VALUES (?, ?, ?), (?, ?, ?), ...
and bind and execute it. mysqldump --opt does it, so why don't we? The MySQL reference manual has a section on statement optimization. Look in its DML section for INSERT and UPDATE queries. But understanding why --opt does what it does is a good start.
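A minimal sketch of building and binding such a statement dynamically with mysqli (the table r(a, b, c) is taken from the example above; the data is made up):

<?php
$rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]; // data to insert
$placeholders = implode(', ', array_fill(0, count($rows), '(?, ?, ?)'));
$sql = "INSERT INTO r (a, b, c) VALUES $placeholders";

$stmt = $mysqli->prepare($sql);
$flat = array_merge(...$rows); // flatten to a single argument list
$stmt->bind_param(str_repeat('i', count($flat)), ...$flat);
$stmt->execute(); // one round-trip, auto-committed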
the underestimated value of preparing a statement
To me, the real value of prepared statements is not that you can execute them multiple times but the automatic input escaping. For a measly single extra client-server round-trip, you save yourself from SQL injection. SQL injection deserves special attention when you're using multi_query, because multi_query tells MySQL to expect multiple queries and execute them all. Fail to escape properly and you're in for some fun:
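For example (a deliberately contrived sketch; the users table is hypothetical):

<?php
// The attacker-controlled value smuggles in a second, destructive statement,
// and multi_query() happily executes both.
$name = "x'; DROP TABLE users; -- ";
$sql = "SELECT * FROM users WHERE name = '$name'";
$mysqli->multi_query($sql); // runs the SELECT *and* the DROP TABLE

// With a prepared statement the value can never terminate the query:
$stmt = $mysqli->prepare('SELECT * FROM users WHERE name = ?');
$stmt->bind_param('s', $name); // treated as data, not as SQL
$stmt->execute();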
So my best practice would be:
Do I really need multiple queries?
If I do, escape them well, or prepare them!
How to take advantage of prepared statements for performance? I understand that something like this might benefit if I put it in a loop:
SELECT `Name` FROM `Hobbits` WHERE `ID` = :ID;
I've read that looping with prepared statements is faster than looping without, but that otherwise prepared statements would slightly decrease performance. So - how big does that loop have to be?
If I run a complex SQL query at the beginning of my code and repeat it with one different parameter at the end - will the second query run faster? (We are using a single connection for each page load.) Is there a limit on cached queries, so that I'd better repeat my queries right away?
What about executing the entire script twice with the exact same parameters (reload the page or 2 users)?
A prepared query is given to the SQL server, which parses it and possibly already prepares an execution plan. You're then basically given an id for these allocated resources and can execute this prepared statement by just filling in the blanks in the statement. You can run this statement as often as you like and the database will not have to repeat the parsing and execution planning, which may bring a speed improvement.
As long as you do not throw away the statement, there's no hard timeout on how long it will "stay prepared". It's not a cache; it's an allocated resource on the SQL server - at least as long as your database driver uses native prepared statements in the SQL API. PDO, for example, does not do so by default; you have to set PDO::ATTR_EMULATE_PREPARES to false.
At the end of the script execution, though, all those resources are always deallocated; they do not persist across different page loads. Beyond that, the SQL server may or may not cache the query and its results for some time, regardless of the client script.
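For example, a minimal sketch of switching PDO to native prepared statements (the DSN, credentials, and users table are placeholders):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_EMULATE_PREPARES => false, // use real server-side prepared statements
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');
$stmt->execute([42]); // parsed once server-side; re-execution reuses the handle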
How long are prepared mysql queries cached?
This is not actually "a cache". A prepared statement lasts only as long as the script execution.
If I run a complex SQL query at the beginning of my code and repeat it with one different parameter at the end - will the second query run faster?
The more complex a query, the less effect you will see. Frankly, a prepared statement saves you only the parsing; if the execution involves a temporary table, a filesort, or a table scan, a prepared statement won't speed up any of them.
On the other hand, for simple primary-key lookups, which involve neither complex query parsing nor building a sophisticated query plan, the benefit would be negligible to none.
So - how big may that loop be?
The more iterations it gets, the more benefit. However, in a sane web application one has to avoid looping queries at all.
What about executing the entire script twice with the exact same parameters (reload the page or 2 users)?
As I said above, there will be no benefit from the prepared statement at all. A classical query cache, however, would most likely kick in.
How to take advantage of prepared statements for performance?
No way. Not in web-serving PHP, at least. In some long-running CLI-based script - maybe.
However, prepared statements ought to be used anyway, for the purpose of producing syntactically correct queries.
FYI, from
http://dev.mysql.com/doc/refman/5.7/en/statement-caching.html
The max_prepared_stmt_count system variable controls the total number of statements the server caches. (The sum of the number of prepared statements across all sessions.)
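If you're curious, you can inspect that limit from PHP (a quick sketch, assuming an open $mysqli connection):

<?php
$row = $mysqli->query("SHOW VARIABLES LIKE 'max_prepared_stmt_count'")->fetch_assoc();
echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL; // 16382 by default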
I've recently upgraded my mind from mysql_* to PDO, and I have one simple question:
Is PDO really so much more efficient that preparing a statement and executing it in a foreach loop is quicker than a single mysql call with multiple values in it?
For example, if I have an array of 5 names, is feeding them one by one to an INSERT prepared statement in a foreach loop going to be computationally quicker than one old mysql call with all 5 values in a single query? Or is it preferred for security rather than speed alone?
The meaning and significance of native prepared statements (which you call "PDO") is overlooked and misjudged by everyone.
The speed benefit everyone talks about so much can in reality be achieved extremely rarely, and is often not noticeable at all - especially in the area of PHP web development, which is where PDO belongs.
Also note that whatever speed benefit there is applies to query parsing only - no such things as index rebuilding or the time required to find a record to update are ever affected by prepared statements.
So, speaking of numbers like five, don't bother yourself with this "prepare once, execute many times" thing. It is not what PDO is about. PDO does two essential things which make it preferable to the two other possible extensions:
it supports prepared statements in general, allowing data into the query not directly but via placeholders. This is the main reason why you should use PDO or a similar library (although you could even make the old mysql ext support prepared statements, PDO offers it out of the box)
it makes such support less painful than mysqli does
Turning back to your question:
You can use whichever way you like. Just remember that multiple inserts are better wrapped in a transaction, due to the default settings of modern DB engines.
No matter which way you choose, any dynamic value should be added to the query via placeholders only. If you're still not convinced, you are welcome to read an article I wrote on the matter (which is still incomplete, but has a thorough explanation of the real meaning of prepared statements).
PS. There is also one minor benefit of native prepared statements, often forgotten (because seldom demanded): if a native prepared statement is used (and backed by the mysqlnd driver), the data returned is already formatted according to its type.
One query that fetches 5 rows will probably be quicker than 5 separate calls, so you are comparing apples and oranges.
When executing the same query, the performance will be similar too. The (small) performance advantage PDO has is that parameterized queries are supposed to be better cacheable. When querying customer 3 and customer 5 via concatenation, the query is cached as two different queries even though only the id differs. By using parameters, the database can cache the query in a smarter way, so a second call with a different input doesn't need to go through the query optimizer and such.
That said, apart from the performance advantage, PDO is also safer (when actually using parameters) and, in the end, easier. It may look more complex at first, but it is easier to do right, because without parameters you need to do all the escaping yourself, risking dangerous bugs.
By the way, you can also build a query with a variable number of parameters and bind a value to each of them in a loop, so with PDO you could still perform the single insert query for 5 rows, although it needs a bit of puzzling and a bit of extra code.
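A minimal sketch of that variable-parameter approach with PDO (the names table and the data are made up):

<?php
$names = ['Frodo', 'Sam', 'Merry', 'Pippin', 'Bilbo'];
$placeholders = implode(', ', array_fill(0, count($names), '(?)'));

$stmt = $pdo->prepare("INSERT INTO names (name) VALUES $placeholders");
$stmt->execute($names); // a single query inserts all five rows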
If I want to execute the same query on two different requests, and I use prepared statements with Doctrine2... Will the prepared statement be sent only the first time and be stored by the database for some time? Or will it be removed after each script finishes?
On PostgreSQL, a prepared statement is valid only until the end of the session; it is not saved in memory and shared between many sessions. See the docs:
http://www.postgresql.org/docs/9.2/static/sql-prepare.html
Prepared statements only last for the duration of the current database
session. When the session ends, the prepared statement is forgotten,
so it must be recreated before being used again. This also means that
a single prepared statement cannot be used by multiple simultaneous
database clients; however, each client can create their own prepared
statement to use.
However, they also say that PostgreSQL may (but need not) save a plan for the query in memory for future reuse:
If a prepared statement is executed enough times, the server may
eventually decide to save and re-use a generic plan rather than
re-planning each time. This will occur immediately if the prepared
statement has no parameters; otherwise it occurs only if the generic
plan appears to be not much more expensive than a plan that depends on
specific parameter values. Typically, a generic plan will be selected
only if the query's performance is estimated to be fairly insensitive
to the specific parameter values supplied.
To examine the query plan PostgreSQL is using for a prepared statement, use EXPLAIN. If a generic plan is in use, it will contain
parameter symbols $n, while a custom plan will have the current actual
parameter values substituted into it.
Possible Duplicate:
Are prepared statements cached server-side across multiple page loads with PHP?
I'm working on a new project and using parameterized queries for the first time (PHP with a MySQL DB). I read that parameterized queries are cached, but I'm wondering how long they are cached for. For example, let's say I have a function getAllUsers() that gets a list of all active user IDs from the user table and, for each ID, creates a User object and calls getUser($user) to set the other properties of the object. The getUser() function has its own prepared query with a stmt->close() at the end of the function.
If I do it this way, does my parameterized query in getUser() take advantage of caching at all, or is the query removed from the cache after each stmt->close()?
Note: I also use the getUser() function if a page only requires data for a single user object so I wanted to do it this way to ensure that if the user table changes I only ever need to update one query.
Is this the right way of doing something like this or is there a better way?
Update: Interesting - I just saw this in php.net's manual on prepared statements (http://php.net/manual/en/mysqli.quickstart.prepared-statements.php):
Using a prepared statement is not always the most efficient way of executing a statement. A prepared statement executed only once causes more client-server round-trips than a non-prepared statement.
So I guess the main benefit of parameterized queries is to protect against SQL injection, and not necessarily to speed things up unless it's a query that will be repeated within one run.
Calling mysqli_stmt::close will:
Closes a prepared statement. mysqli_stmt_close() also deallocates the
statement handle.
therefore you will no longer be able to use the statement for further executions.
I wouldn't worry about freeing resources or closing statements, since PHP will do it for you at the end of the script anyway.
Also, if you are working with loops (as you described), take a look at mysqli_stmt::reset, which resets the prepared statement to its original state (i.e. right after the prepare call).
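For instance, a rough sketch of getUser()'s query hoisted out of the loop so the statement is prepared once and reused (the column names and $activeUserIds are hypothetical):

<?php
$stmt = $mysqli->prepare('SELECT name, email FROM user WHERE id = ?');
foreach ($activeUserIds as $id) {
    $stmt->bind_param('i', $id);
    $stmt->execute();
    $user = $stmt->get_result()->fetch_assoc(); // get_result() needs mysqlnd
    // ... populate the User object here ...
    $stmt->reset(); // back to its freshly-prepared state for the next iteration
}
$stmt->close(); // deallocate once, after the loop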
That's a good question, from some point of view.
First, about "caching".
There is one special thing about prepared queries: you can send a query to the server once and then execute it multiple times. That can give some small theoretical benefit from reusing the already parsed and prepared query.
As it seems, you're not using such a mechanism - you prepare each query anew every time. So there is no caching at all.
Next, about premature optimization.
You've heard of some caching, and it has occupied your imagination.
Yet there is no real need or cause for you to be concerned about caching or any other performance issue.
So, there is a rule: do not occupy yourself with performance issues until they are real.
Otherwise you'll waste your time.