Recently, I have been moving all of my MySQL-specific functions into a class, so that if I ever decide to switch database systems in the future, I won't have to correct all of my scripts, just one class.
My question is about efficiency and speed on large data.
For example, now in my database.class.php I have:
public function fetchAssociative($data) {
$assocarr = array();
while ($row = mysqli_fetch_assoc($data)){
array_push($assocarr, $row);
}
return $assocarr;
}
which then gets called in my script.
Compare that to when I just use
while ($row = mysqli_fetch_assoc($result)) {
//operate on returned data ONE ROW AT A TIME
}
right in my script.
Now, as I mentioned before, it is convenient to have all MySQL-specific functions in one class, but what about speed and efficiency? In this example my script has to wait for all the data to be returned before doing any operations, whereas before (using while ($row = mysqli_fetch_assoc($result))) operations were done one at a time on each row returned.
TL;DR: How are efficiency and speed affected if I push MySQL-specific functions back into a class?
My question is about efficiency and speed on large data.
Then you should worry about SQL queries and indexing.
Not PHP classes or PHP functions. 98% of performance issues with MySQL and big data (read: large datasets) are caused by inefficient SQL queries: subqueries, IN/EXISTS statements, missing or wrong indexes, joins without indexes, and not understanding the MySQL engine.
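For example, you can check whether a query actually uses an index with EXPLAIN. A minimal sketch (the table and column names here are made up):
$result = mysqli_query($link, "EXPLAIN SELECT * FROM hotels WHERE name LIKE 'abc%'");
while ($row = mysqli_fetch_assoc($result)) {
    print_r($row); // the 'key' column shows the index used; NULL means a full table scan
}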
You should have two functions, not one.
public function fetch($result, $method = MYSQLI_ASSOC) {
return mysqli_fetch_array($result, $method);
}
public function fetchAll($result, $method = MYSQLI_ASSOC) {
$ret = array();
while ($row = mysqli_fetch_array($result, $method)){
$ret[] = $row;
}
return $ret;
}
In case you need all your data at once, call fetchAll().
In case you need only one row, or there are too many rows to hold in memory, use fetch() once or in a while loop, respectively.
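For instance, hypothetical usage, assuming $db is an instance of the class above and $result came from mysqli_query():
// small result sets: grab everything at once
$all = $db->fetchAll($result);

// large result sets: process one row at a time
while ($row = $db->fetch($result)) {
    // operate on $row
}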
Background
I have been developing a Content Management System from scratch, primarily for the experience, but it will also be used down the road.
I developed a DatabaseManager class which extends PHP's mysqli class, and I am using it to run MySQL queries and manage connections. A method I have developed takes a parameterized SQL query string, followed by an array of the values to bind.
TL;DR
Long story short, I'm using a ReflectionClass to bind the parameters to the SQL query and execute it. I am curious whether this is a secure approach, or is there another method that would be a better fit?
The Method
This is the DatabaseManager::do_query() method:
function do_query($sql, $values){
    // Proceed only if the connection did not fail
    if(!isset($this->connect_error)){
        $num_vals = count($values);
        $i = 0;
        $type = "";
        // Build the bind_param() type string: "i" for integers, "s" for strings
        while($i < $num_vals){
            if(is_int($values[$i]))
                $type .= "i";
            elseif(is_string($values[$i]))
                $type .= "s";
            $i++;
        }
        // bind_param() requires its arguments by reference
        $i = 0;
        $values2 = array();
        while($i < $num_vals){
            $values2[$i] = &$values[$i];
            $i++;
        }
        // Prepend the type string to form the full bind_param() argument list
        $values2 = array_merge(array($type), $values2);
        $expr = $this->prepare($sql);
        if($expr != false){
            // Invoke mysqli_stmt::bind_param() with a variable-length argument list
            $ref = new ReflectionClass('mysqli_stmt');
            $method = $ref->getMethod("bind_param");
            $method->invokeArgs($expr, $values2);
            $expr->execute();
            return "Success";
        }else{
            $error_string = "Error: Query preparation resulted in an error. ";
            $error_string .= $this->error;
            __TGErrorHandler(TG_ERROR_DB_PREP);
            return $error_string;
        }
    }
}
I have not run into any direct errors while testing, and it seems to hold up against SQL injection, since it uses prepared statements. Are there any underlying issues with the structure of this method, though?
P.S. I am handling SELECT statements in a different way. This will handle DELETE and INSERT statements primarily.
First of all, you're on the right track. Only a few, maybe one out of a hundred, PHP users ever come up with the idea of using such a useful function.
Next, Reflection is superfluous for this task; call_user_func_array() is the way to go. Even though it's a little tricky, it's the straightforward approach, while Reflection is not.
Finally, on security: I would remove the automatic type detection and bind all the parameters as strings. It won't do any harm, while with your current approach there is a chance the database will return bizarre results if you happen to compare a number with a string from the database.
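For instance, the whole type-detection loop in do_query() could collapse to a single line (a sketch, untested against your class):
$type = str_repeat('s', count($values)); // bind every value as a string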
Error handling in this function is questionable too.
There's nothing inherently insecure about using reflection here, but it's completely unnecessary. You can accomplish the exact same thing without reflection using call_user_func_array:
if ($expr !== FALSE) {
call_user_func_array([$expr, "bind_param"], $values2);
$expr->execute();
...
Alternatively, consider using PDO instead of mysqli. PDOStatement::execute() can take an array of bind parameters as an argument. (Also, using PDO means your code may be portable to databases other than MySQL!)
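A rough PDO equivalent might look like this (the connection details and table are placeholders):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO products (id, name) VALUES (?, ?)');
$stmt->execute(array(42, 'Widget')); // values bound directly: no type string, no references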
After some research, I have been sold on the idea of generators (more generally, iterators) for lots of tasks that would normally buffer results into an array, since the memory usage is O(1) instead of O(n).
So I plan to use generators to handle database results queried via mysqli. I have 2 questions regarding this approach that I have not been able to find answers on, and I am hoping the community can give me some creative solutions:
Is there a way to release resources opened by a generator if the consuming code chooses not to fully iterate the results? Using an Iterator class, one might do this in the __destruct method. But, from my tests, a generator will simply not execute code following an iteration sequence if it does not conclude naturally. I am looking for workarounds that will avoid having to create an Iterator subclass. See the code below.
Does using a generator or iterator even provide any benefit for database results? Some of my snooping seemed to indicate that mysqli might be loading the result set into memory anyway (MYSQLI_STORE_RESULT), defeating the purpose of an iterator. If the results are not buffered, I am curious whether multiple queries can be executed while their result sets are being iterated (fetched) at the same time (think nested loops where you iterate over a set of items and then query for child items for each parent). It seems like the database cursor might lock during the entire iteration.
Example
The below is a simplification of what I mean by cleanup. From my tests, the result only gets freed if the entire result is iterated. If there is an exception or break in the consuming loop, the results never get freed. Perhaps I am overthinking this and the garbage collector is good enough?
function query($mysqli, $sql){
$result = $mysqli->query($sql);
foreach($result as $row){
yield $row;
}
$result->free(); //Never reached if break, exception, take first n rows, etc.
}
tl;dr: I am just curious how to free resources used by a generator, and subsequently whether generators for database access really save anything, or whether the results are buffered anyway.
UPDATE
It looks like PHP buffers queries by default (http://www.php.net/manual/en/mysqlinfo.concepts.buffering.php), possibly defeating the purpose of generators. Although the argument could be made that buffering only one array is better than creating a copy of the buffered array and then having two buffered sets.
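For what it's worth, that page also documents an unbuffered mode; a minimal sketch of how it would be requested (untested against my actual code):
$result = $mysqli->query($sql, MYSQLI_USE_RESULT); // rows stay on the server until fetched
while ($row = $result->fetch_assoc()) {
    // note: the connection is busy until this result set is fully read or freed
}
$result->free();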
I am looking for anyone with experience in the matter to weigh in. Your thoughts are appreciated!
I may be a little late to the party, but if you are using generators and need to clean up when finished (say you break out of your parent loop before you have iterated everything), you can just use a try/catch/finally with the cleanup in the finally block:
function query($mysqli, $sql) {
$result = $mysqli->query($sql);
try {
if ($result) {
foreach($result as $row) {
yield $row;
}
}
} catch (Exception $e) {
throw $e; // send this up the stack (or you could handle here)
    } finally {
        if ($result) {
            $result->free(); // clean up whether the loop finished or was interrupted
        }
    }
}
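Hypothetical usage showing that an early break still triggers the cleanup, since destroying the suspended generator runs its finally block:
foreach (query($mysqli, "SELECT * FROM products") as $row) { // table name is made up
    if (!process($row)) { // process() is a hypothetical consumer
        break; // the generator is destroyed here, and finally { $result->free(); } runs
    }
}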
Here's how to detect loop breaks, and how to handle or clean up after an interruption.
function generator()
{
$complete = false;
try {
while (($result = some_function())) {
yield $result;
}
$complete = true;
} finally {
if (!$complete) {
// cleanup when loop breaks
} else {
// cleanup when loop completes
}
}
// Do something only after loop completes
}
function query($mysqli, $sql){
$result = $mysqli->query($sql);
foreach($result as $i => $row) {
if ($i + 1 === $result->num_rows) {
$result->free(); // free just before yielding the final row
}
yield $row;
}
}
Which one of these two is better in my case?
While loop:
function search_hotel($searchterm)
{
$query = $this->db->order_by("id", "desc")->like('name', $searchterm)->get('hotel_submits');
$data = array();
while($row = mysql_fetch_array($query))
{
$data[] = $row->name;
}
return $data;
}
Foreach loop:
function search_hotel($searchterm)
{
$query = $this->db->order_by("id", "desc")->like('name', $searchterm)->get('hotel_submits');
$data = array();
foreach ($query->result() as $row)
{
$data[] = $row->name;
}
return $data;
//return mysql_query("select * from hotel_submits where name LIKE '".$searchterm."'");
}
while is technically more efficient than foreach, but it's not worth comparing: they're both pretty much identical in this case.
In your case, the result returned by the query is an array, which means you can use either a foreach statement or a while statement.
Just note that the foreach statement is optimized for working with arrays (and, as of PHP 5, objects as well) and is faster than a while statement. A while loop can be used to achieve the same effect, but it is not as efficient if you want to go through all elements of the array.
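If you want to verify that on your own setup, here is a quick-and-dirty benchmark sketch (absolute numbers will vary by PHP version and hardware):
$data = range(1, 1000000);

$t = microtime(true);
foreach ($data as $v) {}
echo 'foreach: ' . (microtime(true) - $t) . "\n";

$t = microtime(true);
$i = 0;
$n = count($data);
while ($i < $n) {
    $v = $data[$i];
    $i++;
}
echo 'while: ' . (microtime(true) - $t) . "\n";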
When using a framework and its custom DB adapter class, it seems pointless to switch back to PHP's built-in functions in the middle of a script, even if CI's adapter and PHP's mysql_* functions might be using the same underlying connection library (mysql).
I strongly recommend sticking with Code Igniter's version (foreach ($query->result() as $row)). From a performance point of view, there shouldn't be any noticeable difference. Regarding application architecture, it is certainly much cleaner not to mix access interfaces. Although it might work out, it might also cause problems.
It would appear you're mixing up mysqli and mysql syntax. The two libraries are NOT compatible internally. You cannot use a handle/statement in one and consume it in another. Both libraries maintain completely independent connections to the database.
That'd mean the first one will be faster, since mysql_fetch_array() will fail and the inner loop will never run. But faster doesn't mean "right".
I've got this code:
while ($aResult = mysql_fetch_array($result))
{
$sResult[$aResult['userID']] = $aResult;
}
but I want to know: is there a faster way to put everything from $aResult into the $sResult global?
It depends on the definition of "faster". If you want minimum CPU usage, you probably should use the function above or, as Col. said, try to avoid fetching everything at once and use pointers.
If you want less coding time, consider using a wrapper such as PEAR DB. With it you can just write $res = $db->getAll( $SQL );
http://pear.php.net/manual/en/package.database.db.db-common.getall.php
I am aware that all associated result memory is automatically freed at the end of the script's execution. But would you recommend using it if I run quite a lot of somewhat similar actions, as below?
$sql = "select * from products";
$result = mysql_query($sql);
if($result && mysql_num_rows($result) > 0) {
while($data = mysql_fetch_assoc($result)) {
$sql2 = "insert into another_table set product_id = '".$data['product_id']."'
, product_name = '".$data['product_name']."'
";
$result2 = mysql_query($sql2);
mysql_free_result($result2);
}
}
Thanks.
Quoting the documentation of mysql_free_result:
mysql_free_result() only needs to be called if you are concerned about how much memory is being used for queries that return large result sets. All associated result memory is automatically freed at the end of the script's execution.
So, if the documentation says it's generally not necessary to call that function, I would say it's not really necessary, nor good practice, to call it ;-)
And, just to say: I almost never call that function myself; memory is freed at the end of the script, and each script should not eat too much memory.
An exception could be long-running batches that have to deal with large amounts of data, though...
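In such a batch, the pattern would look something like this (a sketch; the table and ID list are hypothetical):
foreach ($product_ids as $id) { // a long list of IDs
    $result = mysql_query("SELECT * FROM products WHERE product_id = " . (int)$id);
    while ($data = mysql_fetch_assoc($result)) {
        // ... process the row ...
    }
    mysql_free_result($result); // keeps memory flat across iterations
}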
Yes, it is good practice to use mysql_free_result($result). The documentation quoted in the accepted answer is misleading. That is what the documentation says, but it doesn't make sense. Here is what it says:
mysql_free_result() only needs to be called if you are concerned about how much memory is being used for queries that return large result sets. All associated result memory is automatically freed at the end of the script's execution.
The first part of the first sentence is correct: you don't need to use it for reasons other than memory concerns; memory concerns are the only reason to use it. However, the second part of the first sentence doesn't make sense. The claim is that you would only be concerned about memory for queries that return large result sets. This is very misleading, as there are other common scenarios where memory is a concern and calling mysql_free_result() is very good practice.

Any time queries may be run an unknown number of times, more and more memory will be used up if you don't call mysql_free_result(). So if you run your query in a loop, or from a function or method, it is usually a good idea to call it. You just have to be careful not to free the result until it will no longer be used.

You can shield yourself from having to think about when and how to use it by making your own select() and ex() functions, so you are not working directly with result sets. (None of the code here is exactly the way I would actually write it; it is more illustrative. You may want to put these in a class or special namespace, throw a different Exception type, or take additional parameters like $class_name, etc.)
// call this for select queries that do not modify anything
function select($sql) {
$array= array();
$rs= query($sql);
while($o= mysql_fetch_object($rs))
$array[]= $o;
mysql_free_result($rs);
return $array;
}
// call this for queries that modify data
function ex($sql) {
query($sql);
return mysql_affected_rows();
}
function query($sql) {
$rs= mysql_query($sql);
if($rs === false) {
throw new Exception("MySQL query error - SQL: \"$sql\" - Error Number: "
.mysql_errno()." - Error Message: ".mysql_error());
}
return $rs;
}
Now, if you only call select() and ex(), you are just dealing with normal object variables and ordinary memory concerns instead of manual memory management. You still have to think about how much memory is in use by the array of objects, but after the variable goes out of scope, or you manually set it to null, it becomes available for garbage collection, so PHP takes care of that for you. You may still want to set it to null before it goes out of scope if your code does not use it anymore and there are operations following it that use up an unknown amount of memory, such as loops and other function calls.

I don't know how result sets and the functions operating on them are implemented under the hood (and even if I did, this could change with different or future versions of PHP and MySQL), so there is the possibility that the select() function approximately doubles the amount of memory used just before mysql_free_result($rs) is called. However, using select() still eliminates what is usually the primary concern: more and more memory being used during loops and various function calls. If you are concerned about this potential for double memory usage, and you are only working with one row at a time over a single iteration, you can make an each_row() function that will not double your memory usage and will still shield you from thinking about mysql_free_result():
// Named each_row() rather than each() to avoid a clash with PHP's built-in each()
function each_row($sql, $fun) {
    $rs = query($sql);
    while($o = mysql_fetch_object($rs))
        $fun($o);
    mysql_free_result($rs);
}
You can use it like this:
each("SELECT * FROM users", function($user) {
echo $user->username."<BR>";
});
Another advantage of using each_row() is that it does not return anything, so you don't have to think about whether or not to set the return value to null later.
The answer is of course YES for mysqli.
Take a look at PHP mysqli_free_result documentation:
You should always free your result with mysqli_free_result(), when your result object is not needed anymore.
I tested it with the memory_get_usage() function:
echo '<br>before mysqli free result: '.memory_get_usage();
mysqli_free_result($query[1]);
echo '<br>after mysqli free result: '.memory_get_usage();
And this is the result:
before mysqli free result:2110088
after mysqli free result:1958744
And here we are talking about 151,344 bytes of memory for only 1,000 rows of a MySQL table.
What about a million rows, and what about large projects?
mysqli_free_result() is not only for large amounts of data; it is also good practice for small projects.
It depends on how large your queries are or how many queries you run.
PHP frees the memory at the end of the script automatically, but not during the run. So if you have a large amount of data coming from a query, it is better to free the result manually.
I would say: YES, it is good practice, because you care about memory during the development of your scripts, and that is what makes a good developer :-)