PHP PDO: how does re-preparing a statement affect performance

I'm writing a semi-simple database wrapper class and want to have a fetching method which would operate automagically: it should prepare each different statement only the first time around and just bind and execute the query on successive calls.
I guess the main question is: how does re-preparing the same MySQL statement work? Will PDO recognize the statement (so I don't have to) and skip the redundant prepare?
If not, I'm planning to achieve this by generating a unique key for each different query and keeping the prepared statements in a private array in the database object, under that unique key. I'm planning to obtain the array key in one of the following ways (none of which I like). In order of preference:
have the programmer pass an extra, always the same parameter when calling the method - something along the lines of basename(__FILE__, ".php") . __LINE__ (this method would work only if our method is called within a loop - which is the case most of the time this functionality is needed)
have the programmer pass a totally random string (most likely generated beforehand) as an extra parameter
use the passed query itself to generate the key - getting the hash of the query or something similar
achieve the same as the first bullet (above) by calling debug_backtrace
Does anyone have similar experience? Although the system I'm working on does deserve some attention to optimization (it's quite large and growing by the week), perhaps I'm worrying about nothing and there is no performance benefit in doing what I'm doing?

MySQL (like most DBMS) will cache execution plans for prepared statements, so if user A creates a plan for:
SELECT * FROM some_table WHERE a_col=:v1 AND b_col=:v2
(where v1 and v2 are bind variables) and then sends values to be interpolated by the DBMS, and user B later sends the same query (but with different values for interpolation), the DBMS does not have to regenerate the plan. That is, it's the DBMS which finds the matching plan, not PDO.
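One caveat worth adding (a general PDO behaviour, not something from this answer): the MySQL PDO driver emulates prepares client-side by default, in which case the server never sees a separate prepare step at all and no server-side plan is reused. To make sure you are really exercising server-side prepared statements, switch emulation off. A minimal sketch (the DSN and credentials are placeholders):

// force real server-side prepares on a PDO/MySQL connection
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);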
On the cost side, however, prepared statements mean that each operation on the database requires at least two round trips (the first to present the query, the second to present the bind variables), as opposed to a single round trip for a query with literal values, so they introduce additional network cost. There is also a small cost involved in looking up (and maintaining) the query/plan cache.
The key question is whether this cost is greater than the cost of generating the plan in the first place.
While (in my experience) there definitely seems to be a performance benefit using prepared statements with Oracle, I'm not convinced that the same is true for MySQL - however, a lot will depend on the structure of your database and the complexity of the query (or more specifically, how many different options the optimizer can find for resolving the query).
Try measuring it yourself (hint: you might want to set the slow query threshold to 0 and write some code to convert literal values back into anonymous representations for the queries written to the logs).
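As a rough illustration of the kind of measurement meant here, a minimal sketch (assuming an existing $pdo connection and a `users` table with an `id` column; adjust to your own schema):

// Compare re-preparing on every iteration against preparing once
// and re-executing with new bind values.
$iterations = 1000;

$t = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $stmt = $pdo->prepare('SELECT * FROM users WHERE id = :id');
    $stmt->execute(array(':id' => $i));
    $stmt->fetchAll();
}
echo 're-prepare each time: ', microtime(true) - $t, "s\n";

$t = microtime(true);
$stmt = $pdo->prepare('SELECT * FROM users WHERE id = :id');
for ($i = 0; $i < $iterations; $i++) {
    $stmt->execute(array(':id' => $i));
    $stmt->fetchAll();
}
echo 'prepare once, execute many: ', microtime(true) - $t, "s\n";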

Believe me, I've done this before and after building a cache of prepared statements the performance gain was very noticeable - see this question: Preparing SQL Statements with PDO.
And this is the code I came up with afterwards, with cached prepared statements:
function DB($query)
{
    static $db = null;
    static $result = array();

    if (is_null($db) === true)
    {
        // first call ever: the argument is the SQLite path, not a query
        $db = new PDO('sqlite:' . $query, null, null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING));
    }
    else if (is_a($db, 'PDO') === true)
    {
        $hash = md5($query);

        if (empty($result[$hash]) === true)
        {
            // first time we see this query: prepare it and cache the statement
            $result[$hash] = $db->prepare($query);
        }

        if (is_a($result[$hash], 'PDOStatement') === true)
        {
            // any arguments after the query string are the bind values
            if ($result[$hash]->execute(array_slice(func_get_args(), 1)) === true)
            {
                if (stripos($query, 'INSERT') === 0)
                {
                    return $db->lastInsertId();
                }
                else if (stripos($query, 'SELECT') === 0)
                {
                    return $result[$hash]->fetchAll(PDO::FETCH_ASSOC);
                }
                else if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0))
                {
                    return $result[$hash]->rowCount();
                }
                else if (stripos($query, 'REPLACE') === 0)
                {
                    // REPLACE just falls through to the generic true below
                }

                return true;
            }
        }

        return false;
    }
}
Since I don't need to worry about collisions in queries, I've ended up using md5() instead of sha1().
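For reference, a hypothetical usage of the helper above (the very first call passes the SQLite path, since the function reuses its single argument for the DSN; the `users` table and columns here are made up for illustration):

DB('/tmp/app.sqlite'); // first call: opens the connection

$id   = DB('INSERT INTO users (name) VALUES (?)', 'Alice');       // returns lastInsertId()
$rows = DB('SELECT * FROM users WHERE name = ?', 'Alice');        // returns an array of rows
$n    = DB('UPDATE users SET name = ? WHERE id = ?', 'Bob', $id); // returns rowCount()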

OK, since I've been bashing methods of keying the queries for the cache, other than simply using the query string itself, I've done a naive benchmark. The following compares using the plain query string vs first creating the md5 hash:
$ php -v
PHP 5.3.0-3 with Suhosin-Patch (cli) (built: Aug 26 2009 08:01:52)
...
$ php benchmark.php
Plain string keys: 0.19465494155884 [seconds]
MD5 keys: 0.57781004905701 [seconds]
799994
The code:
<?php
error_reporting(E_ALL);

$queries = array("SELECT", "INSERT", "UPDATE", "DELETE");

$query_length = 256;
$num_queries  = 256;
$iter         = 10000;

// generate random "queries" of 256 lowercase letters each
for ($i = 0; $i < $num_queries; $i++) {
    $q = implode('',
        array_map("chr",
            array_map("rand",
                array_fill(0, $query_length, ord("a")),
                array_fill(0, $query_length, ord("z")))));
    $queries[] = $q;
}
echo count($queries), "\n";

// 1) plain query string as the cache key
$cache = array();
$side_effect1 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        if (!isset($cache[$q])) {
            $cache[$q] = $q;
        }
        else {
            $side_effect1++;
        }
    }
}
echo microtime(true) - $t, "\n";

// 2) md5 of the query string as the cache key
$cache = array();
$side_effect2 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        $md5 = md5($q);
        if (!isset($cache[$md5])) {
            $cache[$md5] = $q;
        }
        else {
            $side_effect2++;
        }
    }
}
echo microtime(true) - $t, "\n";

echo $side_effect1 + $side_effect2, "\n";

To my knowledge, PDO does not reuse already-prepared statements: it does not analyse the query itself, so it does not know whether it has seen the same query before.
If you want to build a cache of prepared queries, the simplest way, imho, would be to md5-hash the query string and generate a lookup table.
OTOH: how many queries are you executing per minute? If fewer than a few hundred, you are only complicating the code; the performance gain will be minor.
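A minimal sketch of that lookup-table idea inside a wrapper class (all names here are illustrative, not from any particular library):

class Db
{
    private $pdo;
    private $stmtCache = array();

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function run($query, array $params = array())
    {
        $key = md5($query); // the raw query string works as a key too
        if (!isset($this->stmtCache[$key]))
        {
            // not seen before: prepare once and keep the statement
            $this->stmtCache[$key] = $this->pdo->prepare($query);
        }
        $this->stmtCache[$key]->execute($params);
        return $this->stmtCache[$key];
    }
}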

Using an MD5 hash as a key, you could eventually get two queries that result in the same MD5 hash. The probability is not high, but it could happen. Don't do it. Lossy hashing algorithms like MD5 are just meant as a way to tell, with high certainty, whether two objects are different; they are not a safe means of identifying something.

Related

Get data from database with MySQL in forloop

I am a newbie in MySQL and PHP.
I have the following code to get data within a date range (day 1 to day 2, then day 2 to day 3 and so on).
function getData($query) {
    global $connect;
    $result = mysqli_query($connect, $query);
    if (!$result) {
        echo 'MySQL Error: ' . mysqli_error($connect);
        die();
    }
    return mysqli_fetch_assoc($result);
}

$dayZero  = date_create('2017-01-21');
$dayToday = date_create(date('Y-m-d')); // note: date_create('Y-m-d') would fail; the format string must go through date()
$diff = date_diff($dayZero, $dayToday)->format('%a');

for ($i = 0; $i < $diff; ++$i) {
    $start[$i] = date('Y-m-d', date_format($dayZero, 'U') + (24*60*60) * $i);
    $end[$i]   = date('Y-m-d', date_format($dayZero, 'U') + (24*60*60) * ($i+1));
    $days[$i]  = getData('SELECT count(*) AS "b" FROM `table_name` WHERE `timestamp` BETWEEN "' . $start[$i] . '" AND "' . $end[$i] . '"')['b'];
}
The code works as expected, but it runs extremely slowly. My guess is that this is because it queries the database on every loop iteration.
Is there a way to make it run faster? Or is there any optimization that I can make?
Yes! Great question. While you can execute queries as you have done, the better option is to use prepared statements. This separates the query into a prepared statement and its variables; see here:
http://www.w3schools.com/php/php_mysql_prepared_statements.asp
The actual statement or query is sent to the server one time. After that, the server waits for you to supply the variables.
This is great for performance-sensitive applications (like yours), where the server is able to make use of caching to greatly speed up performance. It is also the preferred method for secure applications, where the server is protected from injection attacks.
As a final note, there are a bunch of ways to optimize SQL queries and this is just one of them. You should really always be using prepared statements though.
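A sketch of how that could look for the loop in the question (assuming the same $connect mysqli connection and `table_name` schema; note that mysqli_stmt_get_result() requires the mysqlnd driver):

// prepare once, bind the two date variables by reference,
// then re-assign and re-execute per day
$start = $end = '';
$stmt = mysqli_prepare(
    $connect,
    'SELECT COUNT(*) AS b FROM `table_name` WHERE `timestamp` BETWEEN ? AND ?'
);
mysqli_stmt_bind_param($stmt, 'ss', $start, $end);

$days = array();
for ($i = 0; $i < $diff; ++$i) {
    $start = date('Y-m-d', date_format($dayZero, 'U') + 86400 * $i);
    $end   = date('Y-m-d', date_format($dayZero, 'U') + 86400 * ($i + 1));
    mysqli_stmt_execute($stmt);
    $row = mysqli_stmt_get_result($stmt)->fetch_assoc();
    $days[$i] = $row['b'];
}
mysqli_stmt_close($stmt);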

PHP: MySQL Select Query vs String Operation

I have a database and a string with about 100,000 key / value-pair records and I want to create a function to find the value.
Is it better to use a string or a database, considering performance (page load time) and crash safety? Here are my two code examples:
1.
echo find("Mohammad");
function find($value) {
    $sql = mysql_query("SELECT * FROM `table` WHERE `name`='$value' LIMIT 1");
    $count = mysql_num_rows($sql);
    if ($count > 0) {
        $row = mysql_fetch_array($sql);
        return $row["family"];
    } else {
        return 'NULL';
    }
}
2.
$string = "Ali:Golam,Mohammad:Offer,Reza:Now,Saber:Yes";
echo find($string,"Mohammad");
function find($string, $value) {
    $array = explode(",", $string);
    foreach ($array as $into) {
        $new = explode(":", $into);
        if ($new[0] == $value) {
            return $new[1]; // return already exits the loop; no break needed
        }
    }
}
The database is almost certainly a good idea.
Databases are fast. They may not be as fast as basic string operations in PHP, but if there is a lot of data, the database will probably be faster. A basic SELECT query takes (on my current default desktop hardware) about 15 ms, cached less than 1 ms, and this is pretty much independent of the number of names in your table if the indexes are correct. So your site will always be fast.
Databases won't cause a stack overflow or an out-of-memory error and crash your site (though this depends a lot on your PHP settings and hardware).
Databases are more flexible. Imagine you want to add / remove / edit names after creating the first data set. It's very simple to modify the data with "INSERT INTO", "DELETE FROM" and "UPDATE"; better than editing a string with about 100,000 entries somewhere in your code.
Your Code
You definitely need to use MySQLi or PDO instead; the code could look like this:
$mysqli = new mysqli("host", "username", "password", "dbname");

$stmt = $mysqli->prepare('SELECT string FROM table WHERE name = ? LIMIT 1');
$stmt->bind_param("s", $value);
$stmt->execute();
$stmt->store_result();
$stmt->bind_result($string);
$stmt->fetch();
$stmt->close();
This uses MySQLi and prepared statements (for security) instead of the old MySQL extension, which is deprecated as of PHP 5.5.
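Wrapped into the question's find() helper, that might look like this (a sketch; it assumes the `table` / `name` / `family` schema from option 1):

function find(mysqli $mysqli, $value) {
    $stmt = $mysqli->prepare('SELECT `family` FROM `table` WHERE `name` = ? LIMIT 1');
    $stmt->bind_param('s', $value);
    $stmt->execute();
    $stmt->bind_result($family);
    $found = $stmt->fetch(); // true if a row matched
    $stmt->close();
    return $found ? $family : null;
}

echo find($mysqli, 'Mohammad');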

MySQLi loop query to variables

Good evening all.
I'm currently working on a small personal project. Its purpose is to retrieve numerous values from a database on my backend and store them as variables. These variables are then used to modify the appearance of some HTML5 Canvas objects (in this case, I'm using arcs).
Please note that the values in the database are Text, and thus my bind statements refer to that. The queries I'm calling (AVG, MIN, MAX) work fine with the values I've got, as the fields store numerical data (this is merely due to another script that deals with adding or updating the data; that's already running MySQLi, and using Text was the best solution for my situation).
Now, I achieved what I wanted with standard MySQL queries, but it's messy code and its performance could prove to be terrible as the database grows. For that reason, I want to use loops. I also feel that MySQLi's bind_param would be much better for security. The page doesn't accept ANY user input; it's merely for display, so injection is less of a concern, but at some point in the future I'll be looking to expand it to allow users to control what is displayed.
Here's a sample of my original MySQL PHP code:
$T0A = mysql_query('SELECT AVG(Temp0) FROM VTempStats'); // Average
$T0B = mysql_query('SELECT MIN(Temp0) FROM VTempStats'); // Bottom/MIN
$T0T = mysql_query('SELECT MAX(Temp0) FROM VTempStats'); // Top/MAX
$T1A = mysql_query('SELECT AVG(Temp1) FROM VTempStats'); // Average
$T1B = mysql_query('SELECT MIN(Temp1) FROM VTempStats'); // Bottom/MIN
$T1T = mysql_query('SELECT MAX(Temp1) FROM VTempStats'); // Top/MAX
$r_T0A = mysql_result($T0A, 0);
$r_T0T = mysql_result($T0T, 0);
$r_T0B = mysql_result($T0B, 0);
$r_T1A = mysql_result($T1A, 0);
$r_T1T = mysql_result($T1T, 0);
$r_T1B = mysql_result($T1B, 0);
if ($r_T0A == "" ) {$r_T0A = 0;}
if ($r_T1A == "" ) {$r_T1A = 0;}
if ($r_T0B == "" ) {$r_T0B = 0;}
if ($r_T1B == "" ) {$r_T1B = 0;}
if ($r_T0T == "" ) {$r_T0T = 0;}
if ($r_T1T == "" ) {$r_T1T = 0;}
That's shorter than the original, as there are 4x3 sets of queries (Temp0, Temp1, Temp2, Temp3, with min, max and avg for each). Note that the last six if statements are merely there to ensure that fields that are null are automatically set to 0 before my canvas script attempts to work with them (see below).
To show that value on the arc, I'd use this in my canvas script (for example):
var endAngle = startAngle + (<?= $r_T0A ?> / 36+0.02);
It worked for me, and what was displayed was exactly what I expected.
Now, in trying to clean up my code and move to a loop and MySQLi, I'm running into problems. Being very new to both SQL and PHP, I could use some assistance.
This is what I tried:
$q_avg = "SELECT AVG(Temp?) FROM VTempStats";

for ($i_avg = 0; $i_avg <= 3; ++$i_avg)
{
    if ($s_avg = $mysqli->prepare($q_avg))
    {
        // note: a ? placeholder cannot substitute part of a column name
        // (Temp0..Temp3), so this prepare/bind approach cannot select a
        // different column on each pass
        $s_avg->bind_param('s', $i_avg);
        $s_avg->execute();
        $s_avg->bind_result($avg);
        $s_avg->fetch();
        echo $avg;
    }
}
Note: $mysqli is the MySQLi connection. I've cut the code down to only show the AVG query loop, but the MIN and MAX loops are nearly identical.
Obviously, that won't work, as it's only assigning one variable for each set of queries instead of 4 variables for each loop (and, as the comment above notes, the placeholder can't stand in for the column name at all).
As you can imagine, what I want to do is assign all 12 values to individual variables so that I can work with them in my canvas script. I'm not entirely sure how I go about this, though.
I can echo individual values out through MySQLi, or I can query the database to change or add data through MySQLi, but making a loop that does what I intend with MySQLi (or even MySQL) is something I need help with.
From my reading of your code, you have a fixed number of columns and know their names, and you are applying the AVG(), MIN(), MAX() aggregates to the same table over the same aggregate group, with no WHERE clause applied. Therefore, they can all be done in one query from which you just need to fetch one single row.
SELECT
    AVG(Temp0) AS a0,
    MIN(Temp0) AS min0,
    MAX(Temp0) AS max0,
    AVG(Temp1) AS a1,
    MIN(Temp1) AS min1,
    MAX(Temp1) AS max1,
    AVG(Temp2) AS a2,
    MIN(Temp2) AS min2,
    MAX(Temp2) AS max2,
    AVG(Temp3) AS a3,
    MIN(Temp3) AS min3,
    MAX(Temp3) AS max3
FROM VTempStats
This can be done in a single call to $mysqli->query(), and no parameter binding is necessary so you don't need the overhead of prepare(). One call to fetch_assoc() is needed to retrieve a single row, with columns aliased like a0, min0, max0, etc... as I have done above.
// Fetch one row
$values = $result_resource->fetch_assoc();
print_r($values);
printf("Avg 0: %s, Min 0: %s, Max 0: %s... etc....", $values['a0'], $values['min0'], $values['max0']);
These can be pulled into the global scope with extract(), but I recommend against that. Keeping them in their $values array makes their source more explicit.
As you can imagine, what i want to do is assign all 12 values to individual variables so that i can work with them in my canvas script. I'm not entirely sure how i go about this though.
Understood. Here is what I would do.
<?php // RAY_temp_scottprichard.php
error_reporting(E_ALL);
echo '<pre>';

// RANGE OF TEMPS
$temps = range(0,3);

// RANGE OF VALUES
$funcs = array
( 'A' => 'AVG'
, 'B' => 'MIN'
, 'T' => 'MAX'
)
;

// CONSTRUCT THE QUERY STRING
$query = 'SELECT ';
foreach ($temps as $t)
{
    foreach ($funcs as $key => $func)
    {
        $query .= PHP_EOL
            . $func
            . '(Temp'
            . $t
            . ') AS '
            . 'T'
            . $t
            . $key
            . ', '
        ;
    }
}

// DROP THE UNWANTED TRAILING COMMA
$query = rtrim($query, ', ');

// ADD THE TABLE NAME
$query .= ' FROM VTempStats';

// ADD ANY ORDER, LIMIT, WHERE CLAUSES HERE
$query .= ' WHERE 1=1';

// SHOW THE WORK PRODUCT
var_dump($query);
See the output query string here: http://www.laprbass.com/RAY_temp_scottpritchard.php
When you run this query, you will fetch one row with mysqli_fetch_assoc() or equivalent, and it will have all the variables you want in that row, with named keys. Then you can use something like this to inject the variable names and values into your script: http://php.net/manual/en/function.extract.php
PHP extract() allows the use of a prefix, so you should be able to avoid having to make too many changes to your existing script.
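For instance (a hypothetical continuation of the script above, assuming $mysqli is the open connection):

// Run the generated query and pull the aliased columns into prefixed
// variables ($v_T0A, $v_T0B, $v_T0T, ...) via extract().
$result = $mysqli->query($query);
$values = $result->fetch_assoc();
extract($values, EXTR_PREFIX_ALL, 'v');
echo $v_T0A; // the AVG(Temp0) value, aliased as T0A by the query builder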
HTH, ~Ray

mysql multi_query intermittently fails

function cpanel_populate_database($dbname)
{
    // populate the database from the bundled dump
    // (note: as written, $mysqli is undefined inside this function's scope;
    // it would need to be passed in or pulled in via global)
    $sql = file_get_contents(dirname(__FILE__) . '/PHP-Point-Of-Sale/database/database.sql');
    $mysqli->multi_query($sql);
    $mysqli->close();
}
The SQL file is a direct export from phpMyAdmin, and about 95% of the time it runs without issue: all the tables are created and the data is inserted (I am creating a database from scratch).
The other 5% of the time, only the first table, or sometimes the first four tables, are created, but none of the other tables (there are 30).
I have decided NOT to use multi_query because it seems buggy, and to see whether the bug still occurs when running each statement (split on semicolons) through a plain mysql_query. Has anyone run into issues like this?
Fast and effective
system('mysql -h #hostname# -u #username# -p #database# < #dump_file#');
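If you go this route, it's worth escaping the arguments instead of interpolating them raw (a sketch; $host, $user, $pass, $dbname and $dumpFile are placeholders for your own values):

$cmd = sprintf(
    'mysql --host=%s --user=%s --password=%s %s < %s',
    escapeshellarg($host),
    escapeshellarg($user),
    escapeshellarg($pass),
    escapeshellarg($dbname),
    escapeshellarg($dumpFile)
);
system($cmd, $exitCode); // $exitCode is non-zero if the import failed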
I've seen similar issues when using multi_query with queries that can create or alter tables. In particular, I tend to get InnoDB 1005 errors that seem to be related to foreign keys; it's like MySQL doesn't completely finish one statement before moving on to the next, so the foreign keys lack a proper referent.
In one system, I split the problematic statements into their own files. In another, I have indeed run each command separately, splitting on semicolons:
function load_sql_file($basename, $db) {
    // Todo: Trim comments from the end of a line
    log_upgrade("Attempting to run the `$basename` upgrade.");
    $filename = dirname(__FILE__) . "/sql/$basename.sql";
    if (!file_exists($filename)) {
        log_upgrade("Upgrade file `$filename` does not exist.");
        return false;
    }
    $file_content = file($filename);
    $query = '';
    foreach ($file_content as $sql_line) {
        $tsl = trim($sql_line);
        // skip blank lines and comment lines ("--" or "#")
        if ($sql_line and (substr($tsl, 0, 2) != '--') and (substr($tsl, 0, 1) != '#')) {
            $query .= $sql_line;
            // a trailing semicolon ends the current statement
            if (substr($tsl, -1) == ';') {
                set_time_limit(300);
                $sql = trim($query, "\0.. ;");
                $result = $db->execute($sql);
                if (!$result) {
                    log_upgrade("Failure in `$basename` upgrade:\n$sql");
                    if ($error = $db->lastError()) {
                        log_upgrade("$error");
                    }
                    return false;
                }
                $query = '';
            }
        }
    }
    $remainder = trim($query);
    if ($remainder) {
        log_upgrade("Trailing text in `$basename` upgrade:\n$remainder");
        if (DEBUG) trigger_error('Trailing text in upgrade script: ' . $remainder, E_USER_WARNING);
        return false;
    }
    log_upgrade("`$basename` upgrade successful.");
    return true;
}
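A hypothetical call site for the function above ($db being the answerer's own wrapper object exposing execute() and lastError()):

if (!load_sql_file('database', $db)) {
    die('Import failed; check the upgrade log for the offending statement.');
}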
I have never resorted to multi_query. When I needed something like that, I moved over to mysqli. Also, if you do not need any results from the query, passing the script to mysql_query will also work. You'll also get those errors if the exports are in an incorrect order that clashes with the tables required by foreign keys, and so on.
I think the approach of breaking the SQL file into single queries would be a good idea, even if it's just for comparison purposes (to see if it solves the issue).
Also, I'm not sure how big your file is, but I've had a couple of cases where the file was incredibly big and splitting it into batches did the job.

Efficient way to look up value based on a key in php [duplicate]

With a list of around 100,000 key/value pairs (both strings, mostly around 5-20 characters each), I am looking for a way to efficiently find the value for a given key.
This needs to be done in a PHP website. I am familiar with hash tables in Java (which is probably what I would use if working in Java) but am new to PHP.
I am looking for tips on how I should store this list (in a text file or in a database?) and how I should search it.
The list would have to be updated occasionally, but I am mostly interested in lookup time.
You could do it as a straight PHP array, but Sqlite is going to be your best bet for speed and convenience if it is available.
PHP array
Just store everything in a php file like this:
<?php
return array(
    'key1' => 'value1',
    'key2' => 'value2',
    // snip
    'key100000' => 'value100000',
);
Then you can access it like this:
<?php
$s = microtime(true); // gets the start time for benchmarking
$data = require('data.php');
echo $data['key2'];
var_dump(microtime(true)-$s); // dumps the execution time
Not the most efficient thing in the world, but it's going to work. It takes 0.1 seconds on my machine.
Sqlite
PHP should come with sqlite enabled, which will work great for this kind of thing.
This script will create a database for you from start to finish with similar characteristics to the dataset you describe in the question:
<?php
// this will *create* data.sqlite if it does not exist. Make sure "/data"
// is writable and *not* publicly accessible.
// the ATTR_ERRMODE bit at the end is useful as it forces PDO to throw an
// exception when you make a mistake, rather than internally storing an
// error code and waiting for you to retrieve it.
$pdo = new PDO('sqlite:'.dirname(__FILE__).'/data/data.sqlite', null, null, array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));
// create the table if you need to
$pdo->exec("CREATE TABLE stuff(id TEXT PRIMARY KEY, value TEXT)");
// insert the data
$stmt = $pdo->prepare('INSERT INTO stuff(id, value) VALUES(:id, :value)');
$id = null;
$value = null;
// this binds the variables by reference so you can re-use the prepared statement
$stmt->bindParam(':id', $id);
$stmt->bindParam(':value', $value);
// insert some data (in this case it's just dummy data)
for ($i = 0; $i < 100000; $i++) {
    $id = $i;
    $value = 'value' . $i;
    $stmt->execute();
}
And then to use the values:
<?php
$s = microtime(true);
$pdo = new PDO('sqlite:'.dirname(__FILE__).'/data/data.sqlite', null, null, array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));
$stmt = $pdo->prepare("SELECT * FROM stuff WHERE id=:id");
$stmt->bindValue(':id', 5);
$stmt->execute();
$value = $stmt->fetchColumn(1);
var_dump($value);
// the number of seconds it took to do the lookup
var_dump(microtime(true)-$s);
This one is waaaay faster. 0.0009 seconds on my machine.
MySQL
You could also use MySQL for this instead of Sqlite, but if it's just one table with the characteristics you describe, it's probably going to be overkill. The above Sqlite example will work fine using MySQL if you have a MySQL server available to you. Just change the line that instantiates PDO to this:
$pdo = new PDO('mysql:host=your.host;dbname=your_db', 'user', 'password', array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));
The queries in the sqlite example should all work fine with MySQL, but please note that I haven't tested this.
Let's get a bit crazy: Filesystem madness
Not that the Sqlite solution is slow (0.0009 seconds!), but this is about four times faster on my machine. Also, Sqlite may not be available, setting up MySQL might be out of the question, etc.
In this case, you can also use the file system:
<?php
$s = microtime(true); // more hack benchmarking

class FileCache
{
    protected $basePath;

    public function __construct($basePath)
    {
        $this->basePath = $basePath;
    }

    public function add($key, $value)
    {
        $path = $this->getPath($key);
        file_put_contents($path, $value);
    }

    public function get($key)
    {
        $path = $this->getPath($key);
        return file_get_contents($path);
    }

    public function getPath($key)
    {
        $split = 3;
        $key = md5($key);
        if (!is_writable($this->basePath)) {
            throw new Exception("Base path '{$this->basePath}' was not writable");
        }
        // fan the files out across nested directories based on the first
        // $split characters of the hash, to keep each directory small
        $path = array();
        for ($i = 0; $i < $split; $i++) {
            $path[] = $key[$i];
        }
        $dir = $this->basePath . '/' . implode('/', $path);
        if (!file_exists($dir)) {
            mkdir($dir, 0777, true);
        }
        return $dir . '/' . substr($key, $split);
    }
}

$fc = new FileCache('/tmp/foo');

/*
// use this crap for generating a test example. it's slow to create though.
for ($i = 0; $i < 100000; $i++) {
    $fc->add('key' . $i, 'value' . $i);
}
//*/

echo $fc->get('key1');
var_dump(microtime(true) - $s);
This one takes 0.0002 seconds for a lookup on my machine. This also has the benefit of being reasonably constant regardless of the cache size.
It depends on how frequently you would access your array; think of it this way: how many users can access it at the same time? There are many advantages to storing it in a database, and here you have two options: MySQL and SQLite.
SQLite works more like a text file with SQL support. You can save a few milliseconds during queries, as it is located within reach of your application; its main disadvantage is that it can handle only one write at a time (much like a text file).
I would recommend SQLite for arrays with static content like GEO IP data, translations, etc.
MySQL is a more powerful solution, but it requires authentication and is located on a separate machine.
PHP arrays will do everything you need. But shouldn't that much data be stored in a database?
http://php.net/array
